FAQ
Q: What do the different types of annotations such as Single-label Text Classification and Entity Annotation on the Pundi AI Data Platform mean?
A: These are common data annotation tasks used in machine learning, particularly in supervised learning, to prepare labeled datasets for training models.
Single-label Text Classification:
Assigns a single label to a given text input.
Example: Categorizing an email as "Spam" or "Not Spam."
Multi-label Text Classification:
Allows assigning multiple labels to a single text input.
Example: Tagging a movie review as both "Comedy" and "Romance."
Entity Annotation:
Involves identifying and labeling specific entities in a text, such as names, dates, or locations.
Example: Marking "John" as a person and "New York" as a location in "John visited New York."
Single-label Image Classification:
Assigns a single label to an image.
Example: Classifying an image as either "Dog" or "Cat."
Multi-label Image Classification:
Allows assigning multiple labels to a single image.
Example: Tagging an image as both "Beach" and "Sunset."
Bounding Box Annotation:
Involves drawing rectangular boxes around objects in an image and labeling them.
Example: Marking a car in a street image with a bounding box labeled "Car."
Image OCR Annotation:
Focuses on recognizing and extracting text from images.
Example: Extracting the text "Sale 50% Off" from a billboard photo.
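The output of each task type can be pictured as a small structured record. Below is a minimal, hypothetical Python sketch of how results for a few of these annotation types might be represented; the dataclass and field names are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types -- field names are illustrative, not the platform schema.

@dataclass
class SingleLabelText:
    text: str
    label: str                      # exactly one label, e.g. "Spam"

@dataclass
class MultiLabelText:
    text: str
    labels: List[str]               # any number of labels, e.g. ["Comedy", "Romance"]

@dataclass
class EntitySpan:
    start: int                      # character offset where the entity begins
    end: int                        # character offset where the entity ends (exclusive)
    entity_type: str                # e.g. "PERSON", "LOCATION"

@dataclass
class EntityAnnotation:
    text: str
    entities: List[EntitySpan] = field(default_factory=list)

@dataclass
class BoundingBox:
    x: float                        # top-left corner of the box
    y: float
    width: float
    height: float
    label: str                      # e.g. "Car"

# Example: entity annotation for "John visited New York."
sample = EntityAnnotation(
    text="John visited New York.",
    entities=[
        EntitySpan(start=0, end=4, entity_type="PERSON"),
        EntitySpan(start=13, end=21, entity_type="LOCATION"),
    ],
)
```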
Q: Do tasks go through multiple iterations of data labeling?
A: Yes, tasks go through multiple iterations of data labeling, where they are reviewed or annotated by several individuals or AI agents to ensure the highest levels of accuracy and consistency. Typically, tasks are labeled by one or more annotators, enabling the comparison of results and the identification of any discrepancies. This practice of cross-labeling helps to minimize bias and errors that may arise from individual interpretation.
Once the initial labeling is completed, tasks enter a review phase, where they are further evaluated by data verifiers or reviewers. This additional review step makes the labeling process more reliable and ensures that the final results align with project goals.
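As a rough illustration of the cross-labeling idea, the sketch below compares labels from multiple annotators and flags tasks where they disagree so a reviewer can resolve them. The function and the agreement threshold are simplifying assumptions, not the platform's actual review logic.

```python
from collections import Counter
from typing import Dict, List, Optional

def resolve_task(labels: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the majority label if enough annotators agree, else None (needs review)."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None

# Labels submitted by several annotators for each task (hypothetical data).
submissions: Dict[str, List[str]] = {
    "task-1": ["Spam", "Spam", "Spam"],
    "task-2": ["Spam", "Not Spam", "Spam"],
    "task-3": ["Spam", "Not Spam"],          # tie -> escalate to the review phase
}

for task_id, labels in submissions.items():
    decision = resolve_task(labels)
    if decision is None:
        print(f"{task_id}: disagreement, send to review phase")
    else:
        print(f"{task_id}: accepted as '{decision}'")
```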
Q: Is there any incentive for reviewers to review labeled data correctly? For example, what is stopping a reviewer from rapidly accepting or denying a number of labeled tasks to try and earn as much as possible?
A: The labeled data will be reviewed and verified by data verifiers, who must approve the results before rewards are distributed. This process helps minimize the involvement of "bad actors" in the annotation process and ensures that the results meet quality standards.
There is also an incentive for reviewers: they are only paid once their decision is confirmed by the other parties and consensus mechanics involved in verification, so rapidly rubber-stamping tasks does not earn rewards.
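To make the incentive concrete, the sketch below pays a reviewer only when their verdict matches the group outcome, assuming the ⅔ consensus threshold mentioned later in this FAQ. The function and reward values are hypothetical and are not the platform's actual payout rules.

```python
from typing import List

CONSENSUS_THRESHOLD = 2 / 3   # assumed from the platform's 2/3 consensus mechanism

def reviewer_reward(reviewer_verdict: bool, all_verdicts: List[bool],
                    reward: float = 1.0) -> float:
    """Pay a reviewer only if their accept/reject verdict matches the 2/3 consensus."""
    approvals = sum(all_verdicts)
    if approvals / len(all_verdicts) >= CONSENSUS_THRESHOLD:
        consensus = True
    elif (len(all_verdicts) - approvals) / len(all_verdicts) >= CONSENSUS_THRESHOLD:
        consensus = False
    else:
        return 0.0                 # no consensus reached -> nobody is paid yet
    return reward if reviewer_verdict == consensus else 0.0

# A reviewer who rubber-stamps "accept" earns nothing when the group rejects the work.
print(reviewer_reward(True, [True, True, True]))    # 1.0 -- agrees with consensus
print(reviewer_reward(True, [False, False, True]))  # 0.0 -- disagrees with consensus
print(reviewer_reward(True, [True, False]))         # 0.0 -- no consensus yet
```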
Q: Where are you currently sourcing the data from?
A: Currently, our data is sourced from a variety of providers, including internally curated datasets and medical datasets supplied by our client as part of the healthcare project. For text datasets, we are conducting experimental tests on data annotation using content generated by AI agents (Truth Terminal). These posts are used in their original, unfiltered form as provided by the agents, which may explain the presence of certain errors or inappropriate language.
Q: How do you expect you'll be able to provide a consistent flow of labeled data to protocols?
A: By addressing two sides of the equation, the demand side (buyers) and the supply side (annotators and reviewers), in the following ways:
Demand Side: Driving Task Publishers (Buyers)
Quality Speaks for Itself:
Core Proposition: Emphasize the high-quality datasets produced by the Pundi AI Data platform, a result of its robust multi-layered review system (peer, AI, and expert validation).
Data Validation Assurance:
Market the platform as offering ready-to-use datasets for AI model training and validation.
Highlight the ⅔ consensus mechanism and expert validation to assure buyers of accuracy and reliability.
Partnership with Other AI Projects:
Collaborate with AI initiatives that rely on high-quality datasets.
Examples: Autonomous vehicles, NLP, robotics, and medical AI applications.
API Integration: Allow seamless integration for direct dataset uploads or requests to/from other AI platforms (a hypothetical request sketch follows this section).
Hugging Face Partnership:
Strategic Synergy:
Hugging Face's vast community of developers and data contributors aligns perfectly with Pundi AI Data's value proposition.
Enable Hugging Face contributors to earn extra income by participating in PUNDI’s ecosystem as annotators or reviewers.
How It Works:
Set up a reward-sharing program where datasets contributed to Hugging Face are tokenized and made available for monetization on PUNDI.
Use dual-platform recognition (e.g., co-branding datasets).
Customizable Incentives for Publishers:
Task publishers have full control over reward distribution and task structuring, making the platform flexible for diverse AI projects.
Offer dynamic pricing models for tasks based on complexity and urgency.
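For the API integration point above, here is a sketch of what a dataset request from a partner platform might look like. The base URL, endpoint path, payload fields, and function name are purely illustrative assumptions; the platform's actual API may differ.

```python
import requests  # third-party HTTP client; install with `pip install requests`

# Hypothetical endpoint and payload -- not the documented Pundi AI Data API.
BASE_URL = "https://api.example-pundi-data.io/v1"

def request_dataset(task_type: str, size: int, api_key: str) -> dict:
    """Ask the platform for a validated, ready-to-use dataset of a given task type."""
    response = requests.post(
        f"{BASE_URL}/datasets/requests",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "task_type": task_type,        # e.g. "bounding_box", "entity_annotation"
            "size": size,                  # number of labeled examples requested
            "validation": "2-of-3-consensus-plus-expert",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example: a robotics project requesting 10,000 expert-validated bounding boxes.
# dataset = request_dataset("bounding_box", 10_000, api_key="...")
```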
Supply Side: Incentivizing Annotators and Reviewers
Gamified Incentive Models:
Introduce achievement levels or badges for annotators and reviewers based on:
Number of tasks completed.
Quality of work (peer and AI-reviewed).
Unlock higher-paying tasks for top performers.
Income Potential:
Market the platform as a side-income opportunity for contributors, with examples like:
Annotators who complete X tasks in Y time earn Z tokens.
Reviewers with top-tier reputations earning bonuses per task.
Demonstrate earning potential compared to similar platforms.
Skill Building:
Provide free training materials or tutorials to help contributors improve their skills (e.g., annotating medical images or reviewing NLP datasets).
Offer certifications for expert reviewers, which can further boost their earnings.
Broad Accessibility:
Ensure tasks are available to contributors of all skill levels:
Entry-Level Tasks: Simpler annotations with lower rewards.
Advanced Tasks: Higher rewards for tasks requiring domain expertise.
Integration with Web3 Ecosystem:
Incentivize contributors by:
Providing staking opportunities with rewards linked to platform tokens.
Offering NFTs or badges with monetary or platform-specific value to top-performing contributors.
Community-Building Initiatives:
Host data challenges or hackathons with token prizes to attract and engage contributors.
Encourage contributors to share their work on social media for extra rewards, growing the platform's visibility.