Five years ago, “training data” meant a stack of CSVs and a Mechanical Turk budget. Today it is a multi-vendor market with specialist firms for image, video, RLHF, expert review, synthetic generation, and red-team adversarial sets. Buyers routinely under-budget — not because the unit prices are surprising, but because they miss the second-order costs that dominate the total.
Below are the rates we see in 2026, the hidden costs that always show up, and the cost-quality trade-offs that decide whether the data is actually usable.
Unit prices by data type
These are typical price ranges from labelling vendors in 2026, quoted in USD per unit (per item unless the task column says otherwise). Discounts of 20–40% typically apply at scale (100k+ items).
| Data type | Task | Typical price |
|---|---|---|
| Image | Bounding box (single class) | $0.04–$0.12 |
| Image | Polygon segmentation | $0.30–$0.80 |
| Image | Instance segmentation, complex scene | $1.50–$4.00 |
| Video | Frame-by-frame bounding box (per sec) | $0.20–$0.60 |
| 3D / LiDAR | Cuboid annotation, point cloud | $1.20–$3.50 |
| Text | Single-label classification | $0.05–$0.15 |
| Text | Named entity recognition (NER) | $0.15–$0.40 |
| Text | Multi-step reasoning judgment | $0.80–$3.00 |
| Audio | Transcription, per minute | $0.80–$2.50 |
| Audio | Speaker diarisation, per minute | $2.00–$5.00 |
| RLHF | Preference pair (response A vs B) | $0.40–$2.00 |
| RLHF | Expert preference (medical, legal, code) | $5.00–$20.00 |
| Synthetic data | Per high-quality sample (generated + filtered) | $0.02–$0.20 |
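To make the arithmetic concrete, here is a minimal sketch of a headline-cost estimate from the table. The function name, the 30% discount, and the rule that the discount applies to every item once volume passes 100k are our assumptions for illustration; real vendor contracts often tier discounts differently.

```python
def headline_cost(items: int, unit_price: float, discount: float = 0.0,
                  discount_threshold: int = 100_000) -> float:
    """Headline labelling cost in USD, before QA, SME review, or other overhead.

    Assumption: the discount applies to every item once volume reaches the
    threshold. Tiered or negotiated discounts would need a different rule.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    rate = unit_price * (1 - discount) if items >= discount_threshold else unit_price
    return items * rate


# 250,000 single-class bounding boxes at the table's low and high prices,
# with an assumed 30% volume discount (inside the quoted 20-40% band).
low = headline_cost(250_000, 0.04, discount=0.30)
high = headline_cost(250_000, 0.12, discount=0.30)
print(f"Headline cost: ${low:,.0f} to ${high:,.0f}")
```

Treat the result as the starting point only. It is the number most budgets stop at.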
What the unit prices do not show you
The unit price is only part of the total cost, typically well under half of it. Here is where the rest goes:
Schema design and pilot iteration (10–20% of budget)
Every labelling project that succeeds starts with a small pilot — a few hundred items labelled by 2–3 annotators, reviewed, refined, re-labelled. The schema usually changes twice before it stabilises. This work happens before the “real” budget kicks in and is the most common reason projects run over.
QA and adjudication (15–25% of budget)
Single-pass labels are rarely reliable enough for training. Best-practice flows use 2–3 independent labellers per item, automated agreement scoring, adjudication of disagreements by a senior reviewer, and a recurring sample review by your own SME. Budget at least 1.5× the headline labelling cost to get a usable dataset.
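The agreement-and-adjudication step can be sketched in a few lines. This is a simplified illustration, not a vendor's actual pipeline: the function name `consolidate`, the item IDs, and the 0.66 agreement threshold are all hypothetical, and production flows usually add chance-corrected agreement metrics on top of a raw vote share.

```python
from collections import Counter


def consolidate(labels_by_item: dict[str, list[str]], min_agreement: float = 0.66):
    """Accept the majority label when enough annotators agree; queue the rest.

    Returns (accepted, needs_adjudication): a dict of item_id -> label and a
    list of item_ids that a senior reviewer should resolve.
    """
    accepted, needs_adjudication = {}, []
    for item_id, labels in labels_by_item.items():
        if not labels:  # nothing to vote on, so a human has to look at it
            needs_adjudication.append(item_id)
            continue
        top_label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = top_label
        else:
            needs_adjudication.append(item_id)
    return accepted, needs_adjudication


# Three annotators per item (hypothetical data).
batch = {
    "img-1": ["cat", "cat", "cat"],   # unanimous
    "img-2": ["cat", "dog", "cat"],   # 2 of 3 agree
    "img-3": ["cat", "dog", "bird"],  # no majority
}
accepted, queue = consolidate(batch)
print(accepted, queue)
```

The size of the adjudication queue is worth tracking per batch: a rising share of queued items usually points to an ambiguous schema rather than careless annotators.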
Subject-matter expert review (often the largest single line)
For regulated or specialist domains — medical, legal, finance, code, scientific — generalist annotators cannot produce defensible labels. Expert hourly rates in 2026 sit at $80–$300, depending on jurisdiction and specialty. A 50,000-item medical annotation project with appropriate expert oversight will spend more on SME time than on the labels themselves.
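A quick worked example shows how SME time overtakes the labels. Only the 50,000-item size and the $80–$300 hourly band come from the text above; the $1.00 unit price, the 20% review share, the 3 minutes per item, and the $150 rate are assumptions chosen for illustration.

```python
def sme_review_cost(items: int, review_fraction: float,
                    minutes_per_item: float, hourly_rate: float) -> float:
    """Cost in USD of having an expert review a fraction of the items."""
    hours = items * review_fraction * minutes_per_item / 60
    return hours * hourly_rate


ITEMS = 50_000
label_cost = ITEMS * 1.00  # assumed $1.00 per label
sme_cost = sme_review_cost(ITEMS, review_fraction=0.20,
                           minutes_per_item=3, hourly_rate=150)
print(f"Labels: ${label_cost:,.0f}  SME review: ${sme_cost:,.0f}")
```

Under these assumptions the expert reviews only one item in five and still costs more than the labelling itself.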
Data security, sovereignty, and on-shore handling
If your data contains PII, PHI, or other regulated content, vendor selection narrows sharply. SOC 2, ISO 27001, on-shore data residency, and background-checked workforces add 30–80% to base rates. Tools that allow annotation without raw data leaving your environment (federated annotation, screen-share-only flows) carry similar premiums.
Edge-case enrichment
The first 80% of any dataset is cheap. The long tail of edge cases — rare classes, ambiguous boundaries, adversarial examples — is where models actually fail in production. Targeted edge-case labelling typically costs 5–10× the headline unit price because it requires active learning loops, synthetic generation, or human-curated query construction.
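The effect on a budget is easy to underestimate, so here is a rough sketch. The 80/20 split mirrors the paragraph above, and the 5–10× multipliers are the ones quoted; the $0.10 base price and 100,000-item volume are hypothetical.

```python
def blended_cost(items: int, base_price: float,
                 edge_fraction: float, edge_multiplier: float) -> float:
    """Total labelling cost in USD when a share of items is priced as edge cases."""
    edge_items = items * edge_fraction
    routine_items = items - edge_items
    return routine_items * base_price + edge_items * base_price * edge_multiplier


ITEMS, BASE = 100_000, 0.10  # assumed volume and base unit price
low = blended_cost(ITEMS, BASE, edge_fraction=0.20, edge_multiplier=5)
high = blended_cost(ITEMS, BASE, edge_fraction=0.20, edge_multiplier=10)
flat = ITEMS * BASE  # what a naive budget would assume
print(f"Naive: ${flat:,.0f}  With edge cases: ${low:,.0f} to ${high:,.0f}")
```

With these inputs the 20% tail costs more than the other 80% of the dataset combined, even at the low end of the multiplier range.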
RLHF and preference data: a separate market
Preference data for RLHF, DPO, or instruction tuning is its own economy. Pricing is dominated by who is doing the labelling.
Generalist preference labelling (“which of these two responses is better?”) at $0.40–$2.00 per pair works for general assistant tuning. The moment the domain narrows — medical reasoning, legal accuracy, code correctness, safety-critical outputs — the cost climbs fast because the labellers need domain credentials and meaningful capacity to evaluate the responses.
Three takeaways for budgeting RLHF:
- Plan for 5,000–20,000 preference pairs for a meaningful tune. Below 1,000, the signal is too noisy.
- For domain-specific preference data, expert costs dominate; expect $40,000–$200,000 for a serious tune.
- Reuse the same labellers across rounds: consistent judgment from round to round matters more than any individual label.
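As a sanity check on these figures, the sketch below multiplies the pair counts by the per-pair prices from the unit-price table. The helper name `pair_budget` is ours, and the calculation covers pair pricing only; it ignores QA, tooling, and rework.

```python
def pair_budget(pairs_range: tuple[int, int], price_range: tuple[float, float]):
    """Cheapest and most expensive pair-labelling spend in USD for the given ranges."""
    (min_pairs, max_pairs), (min_price, max_price) = pairs_range, price_range
    return min_pairs * min_price, max_pairs * max_price


PAIRS = (5_000, 20_000)                          # pair counts from the takeaways above
generalist = pair_budget(PAIRS, (0.40, 2.00))    # generalist price per pair
expert = pair_budget(PAIRS, (5.00, 20.00))       # expert price per pair

print(f"Generalist: ${generalist[0]:,.0f} to ${generalist[1]:,.0f}")
print(f"Expert:     ${expert[0]:,.0f} to ${expert[1]:,.0f}")
```

The expert envelope is wider than the $40,000–$200,000 quoted above, which sits inside it. The extremes pair the smallest job with the cheapest experts and the largest job with the most expensive ones, and most real projects land in between.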
Synthetic data: cheap to generate, expensive to validate
Synthetic data generation has become a mainstream tactic — particularly for rare classes, privacy-sensitive domains, and adversarial scenarios. Per sample, synthetic data is cheap: $0.02–$0.20 for high-quality generated instances after filtering.
The cost shifts to validation. A synthetic dataset that has not been carefully validated against a real-world holdout will silently teach your model the shape of the generator, not the shape of the real distribution. Plan to reinvest roughly half of what you save on generation in validation infrastructure: real-data anchors, distribution-shift tests, and downstream performance comparisons.
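One simple form a distribution-shift test can take is a two-sample Kolmogorov-Smirnov statistic on a single numeric feature, comparing a real holdout against the synthetic set. This is our choice of test for illustration, not the only option. The feature (document length in tokens), the sample values, and the 0.2 alert threshold are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real: list[float], synthetic: list[float]) -> float:
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real, synthetic = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a sample point.
    for x in real + synthetic:
        cdf_real = bisect_right(real, x) / len(real)
        cdf_synth = bisect_right(synthetic, x) / len(synthetic)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


# Hypothetical feature: document length in tokens.
real_lengths = [120, 180, 240, 310, 450, 520, 610, 800]
synthetic_lengths = [300, 310, 320, 330, 340, 350, 360, 370]

shift = ks_statistic(real_lengths, synthetic_lengths)
ALERT_THRESHOLD = 0.2  # assumed cut-off; tune it on data you trust
print(f"KS statistic: {shift:.2f}", "-> investigate" if shift > ALERT_THRESHOLD else "-> ok")
```

A single-feature check like this catches gross mismatches cheaply. It does not replace the downstream comparison: train with and without the synthetic set and evaluate both on real held-out data.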
Vendor selection in 2026
The market has consolidated around three vendor archetypes:
- Scaled platforms (Scale, Surge, Sama, Labelbox). Best for high-volume, mainstream tasks (image, text, RLHF). Mature tools, predictable quality, predictable price. The wrong choice for highly regulated or boutique-domain work.
- Expert-network firms (newer entrants, including Mercor-style marketplaces). Direct access to credentialed SMEs for medical, legal, code, and scientific labelling. More expensive per hour, but markedly better quality in specialist domains. The right choice when you need defensibility.
- In-house labelling teams. Increasingly common for long-running programs with stable schemas and IP-sensitive data. Higher upfront cost (tooling, hiring, management), but the unit cost can fall below vendor rates within 12–18 months at sufficient volume.
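Whether an in-house team pays back within that 12–18 month window depends on volume and on the gap between vendor and in-house per-item cost. Here is a minimal break-even sketch; every input below (upfront cost, monthly volume, per-item costs) is a hypothetical figure, and the model assumes volume and costs stay flat.

```python
def break_even_month(upfront_usd: int, monthly_items: int,
                     vendor_cents: int, inhouse_cents: int):
    """First month in which cumulative in-house spend is no higher than vendor spend.

    Per-item costs are integer cents so the arithmetic stays exact.
    Returns None when in-house labelling is not cheaper per item.
    """
    saving_per_month = (vendor_cents - inhouse_cents) * monthly_items
    if saving_per_month <= 0:
        return None
    upfront_cents = upfront_usd * 100
    return -(-upfront_cents // saving_per_month)  # ceiling division


# Assumed: $300k for tooling and hiring, 200k items a month,
# vendor at $0.30 per item versus $0.20 per item in-house.
months = break_even_month(upfront_usd=300_000, monthly_items=200_000,
                          vendor_cents=30, inhouse_cents=20)
print(f"Break-even after {months} months")
```

Halve the monthly volume in this example and the payback period doubles, which is why the in-house option only makes sense for long-running programs.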
The honest total-cost framework
For any serious training-data engagement in 2026, budget against this framework:
- Labelling unit cost: 35–50% of total
- QA, adjudication, and SME review: 25–35%
- Tooling, integration, security: 10–15%
- Schema design, pilot, and rework: 10–15%
- Active-learning loops and edge-case targeting: 5–10%
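The framework can be turned into a quick calculator. The sketch below works backwards from a labelling quote to an implied total, then splits a total across the five lines. The shares are the ones listed above; the function names and the $100,000 quote are ours. Because the shares are ranges, the low and high ends do not each sum to 100%.

```python
# (low share, high share) of total budget, taken from the framework above
FRAMEWORK = {
    "Labelling unit cost": (0.35, 0.50),
    "QA, adjudication, and SME review": (0.25, 0.35),
    "Tooling, integration, security": (0.10, 0.15),
    "Schema design, pilot, and rework": (0.10, 0.15),
    "Active-learning loops and edge-case targeting": (0.05, 0.10),
}


def implied_total(labelling_quote: float) -> tuple[float, float]:
    """Range of total budgets consistent with a given labelling quote."""
    low_share, high_share = FRAMEWORK["Labelling unit cost"]
    return labelling_quote / high_share, labelling_quote / low_share


def allocate(total: float) -> dict[str, tuple[float, float]]:
    """Low and high dollar range for each framework line at a given total."""
    return {line: (total * lo, total * hi) for line, (lo, hi) in FRAMEWORK.items()}


total_low, total_high = implied_total(100_000)  # hypothetical $100k labelling quote
print(f"Implied total: ${total_low:,.0f} to ${total_high:,.0f}")
for line, (lo, hi) in allocate(total_low).items():
    print(f"  {line}: ${lo:,.0f} to ${hi:,.0f}")
```

In other words, a $100,000 labelling quote implies a total program cost of roughly two to three times that figure.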
A budget that allocates only the first line item — “labelling cost” — and treats the rest as overhead is the budget most likely to blow up. Plan properly, and labelling becomes the most predictable line item in an AI program. Plan poorly, and it becomes the most expensive.