OpenAI Embedding Pricing: text-embedding-3-small, 3-large & ada-002 (May 2026)
Everything about OpenAI's embedding model pricing, verified monthly. Standard tier, Batch API discounts, Azure differences, and worked examples at 100M, 1B, and 10B tokens.
Current Pricing
| Model | Standard $/M | Batch $/M | Savings | Dims | MTEB |
|---|---|---|---|---|---|
| text-embedding-3-small (Recommended) | $0.02 | $0.010 | 50% off | 1,536 (min 512) | 62.3 |
| text-embedding-3-large (High accuracy) | $0.13 | $0.065 | 50% off | 3,072 (min 256) | 64.6 |
| text-embedding-ada-002 (Legacy - migrate) | $0.10 | N/A | - | 1,536 | 60.5 |
Source: OpenAI pricing page, verified May 2026. Batch API uses async processing with up to 24-hour turnaround.
text-embedding-3-small
The recommended model for most RAG applications. At $0.02/M tokens standard ($0.01/M batch), it offers the best price-to-performance ratio in OpenAI's lineup. Default output is 1,536 dimensions, with Matryoshka support allowing reduction to as few as 512 dimensions via the dimensions parameter.
| Volume | Standard cost | Batch cost | Batch saving |
|---|---|---|---|
| 100M tokens | $2 | $1 | $1 saved |
| 1B tokens | $20 | $10 | $10 saved |
| 10B tokens | $200 | $100 | $100 saved |
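The table rows are plain per-million arithmetic, which a short helper can reproduce for any volume. This is a minimal sketch: the rates are copied from the pricing table in this article, and the names `PRICES_PER_MILLION` and `embedding_cost` are illustrative, not part of any OpenAI library.

```python
# USD per 1M tokens, copied from the pricing table in this article.
PRICES_PER_MILLION = {
    "text-embedding-3-small": {"standard": 0.02, "batch": 0.01},
    "text-embedding-3-large": {"standard": 0.13, "batch": 0.065},
    "text-embedding-ada-002": {"standard": 0.10},  # no batch rate listed
}


def embedding_cost(tokens: int, model: str, tier: str = "standard") -> float:
    """Estimate the cost in USD of embedding `tokens` tokens."""
    rates = PRICES_PER_MILLION[model]
    if tier not in rates:
        raise ValueError(f"no {tier} rate listed for {model}")
    return tokens / 1_000_000 * rates[tier]


if __name__ == "__main__":
    for volume in (100_000_000, 1_000_000_000, 10_000_000_000):
        std = embedding_cost(volume, "text-embedding-3-small")
        batch = embedding_cost(volume, "text-embedding-3-small", "batch")
        print(f"{volume:>14,} tokens: ${std:,.2f} standard, ${batch:,.2f} batch")
```

Swapping the model name gives the equivalent figures for text-embedding-3-large. Update the rates if the pricing page changes.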
text-embedding-3-large
At $0.13/M tokens standard, text-embedding-3-large is 6.5x more expensive than small. The MTEB Retrieval benchmark advantage is roughly 2-3 percentage points. In practical RAG testing, this translates to marginally better recall on domain-specific technical content and non-English text - rarely worth the 6.5x price premium for most applications.
Where large pays off: extremely accuracy-sensitive applications (medical, legal, financial), very long documents where the 3,072-dimension space captures more nuance, and multilingual corpora with rare language pairs. Matryoshka (MRL) dimension reduction is more valuable here: reducing from 3,072 to 1,536 dimensions halves your vector storage cost with negligible quality loss.
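The storage effect of reducing dimensions can be sketched as follows. The article describes reduction via the `dimensions` request parameter. The client-side alternative shown here (keep the leading components, then rescale to unit length) is an assumption based on how Matryoshka-style embeddings are generally shortened, so check it against the current OpenAI documentation before relying on it. The 4 bytes per value (float32) is also an assumption about your vector store, and both function names are hypothetical.

```python
import math


def shorten_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, ignoring index overhead (assumes float32 values)."""
    return num_vectors * dims * bytes_per_value


if __name__ == "__main__":
    full = index_size_bytes(1_000_000, 3072)
    half = index_size_bytes(1_000_000, 1536)
    print(f"3072 dims: {full / 1e9:.2f} GB, 1536 dims: {half / 1e9:.2f} GB")
```

The size estimate covers raw vectors only. Real indexes add overhead for graph or metadata structures, so treat it as a lower bound.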
text-embedding-ada-002 (Legacy)
Migrate away from ada-002. It costs $0.10/M tokens - 5x more expensive than text-embedding-3-small, which also outperforms it on MTEB benchmarks. ada-002 has no batch API support and no MRL dimensionality control.
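If you have an existing index built with ada-002 and want to migrate, the re-embedding cost is your corpus token count at the 3-small rate of $0.02/M. For 1B tokens, that is $20 for the migration, and your rate on new embeddings drops from $0.10/M to $0.02/M. Every million tokens embedded afterwards saves $0.08, so the $20 is recovered after 250M tokens of ongoing volume. How many months that takes depends on how much you embed per month.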
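A break-even estimate for the migration can be sketched as below. The rates come from this article. The monthly volume is an input you have to supply, and `migration_break_even` is an illustrative name.

```python
def migration_break_even(
    corpus_tokens: int,
    monthly_tokens: int,
    old_rate: float = 0.10,  # ada-002, USD per 1M tokens
    new_rate: float = 0.02,  # text-embedding-3-small, USD per 1M tokens
) -> tuple[float, float, float]:
    """Return (migration cost, monthly saving, months to break even)."""
    migration_cost = corpus_tokens / 1_000_000 * new_rate
    monthly_saving = monthly_tokens / 1_000_000 * (old_rate - new_rate)
    if monthly_saving <= 0:
        return migration_cost, monthly_saving, float("inf")
    return migration_cost, monthly_saving, migration_cost / monthly_saving


if __name__ == "__main__":
    cost, saving, months = migration_break_even(1_000_000_000, 100_000_000)
    print(f"migration ${cost:.2f}, saving ${saving:.2f}/month, "
          f"break-even in {months:.1f} months")
```

The estimate only covers API spend. Engineering time and any downtime during re-indexing are not included.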
Batch API: 50% Off for Indexing Workloads
The OpenAI Batch API processes embedding requests asynchronously with a 24-hour completion window. In exchange, you pay 50% of the standard rate. This makes it ideal for initial corpus indexing, nightly re-indexing of updated documents, and any embedding job where you can tolerate overnight latency.
Use the Batch API for:
- Initial corpus indexing (one-time)
- Nightly batch re-indexing
- Large document archive processing
- Any job where 24-hour latency is fine

Stay on the standard API for:
- Real-time query embedding
- New document ingestion pipelines
- Anything user-facing
- Low-latency search applications
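For an indexing job, the Batch API takes a JSONL file with one request per line. The sketch below builds those lines for the embeddings endpoint. The field names (`custom_id`, `method`, `url`, `body`) follow the Batch API request format as I understand it, so verify them against the current reference. `build_batch_lines` and the `doc-{i}` ID scheme are illustrative choices.

```python
import json


def build_batch_lines(texts: list[str], model: str = "text-embedding-3-small") -> list[str]:
    """Build one JSONL request line per text for a batch embedding job."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"doc-{i}",  # hypothetical ID scheme, used to match results
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


if __name__ == "__main__":
    lines = build_batch_lines(["first document", "second document"])
    with open("embedding_batch.jsonl", "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    print(f"wrote {len(lines)} requests")
```

The resulting file is then uploaded and submitted as a batch with a 24-hour completion window. Results come back in a separate output file and are matched to inputs by `custom_id`, not by line order.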
Azure OpenAI Embedding Pricing
Azure OpenAI matches OpenAI direct per-token rates for embedding models as of April 2026. Key differences from using OpenAI direct:
- No Batch API discount on Azure. The 50% batch rate is only available through OpenAI direct API, not Azure OpenAI.
- Provisioned Throughput Units (PTU) available for reserved capacity at enterprise scale.
- Regional pricing variation. Prices may differ slightly by Azure region. Check the Azure pricing calculator for your deployment region.
- Enterprise agreements. Volume discounts available via Microsoft Enterprise Agreements - relevant for large organizations already on Microsoft contracts.
Gotchas That Inflate Your Bill
A 25% overlap on 500-token chunks means 125 tokens of every chunk repeat text already covered by a neighbouring chunk. For 1M chunks, that is 125M extra tokens - an extra $2.50 at the text-embedding-3-small standard rate. Use tiktoken to count tokens before you send.
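The overlap overhead is simple to estimate before indexing. This sketch reproduces the arithmetic above. It assumes every chunk carries the same overlap, and `overlap_overhead` is an illustrative name.

```python
def overlap_overhead(
    num_chunks: int,
    chunk_tokens: int,
    overlap_fraction: float,
    price_per_million: float,
) -> tuple[int, float]:
    """Return (extra tokens embedded, extra cost in USD) caused by overlap."""
    extra_per_chunk = round(chunk_tokens * overlap_fraction)
    extra_tokens = num_chunks * extra_per_chunk
    extra_cost = extra_tokens / 1_000_000 * price_per_million
    return extra_tokens, extra_cost


if __name__ == "__main__":
    tokens, cost = overlap_overhead(1_000_000, 500, 0.25, 0.02)
    print(f"{tokens:,} extra tokens, ${cost:.2f} extra")
```

In practice the chunk count should come from real tokenizer counts rather than an estimate, since character-based guesses can be far off for code or non-English text.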
Hitting rate limits triggers backoff and retry loops. If your retry logic is naive, you can send the same tokens 3-5 times before getting a 200. Retry with capped exponential backoff, make the retry idempotent, and deduplicate inputs so text that has already been embedded is never sent again.
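One way to keep retries from re-sending work is sketched below. It deduplicates at the text level by content hash (a simpler stand-in for the token-level deduplication mentioned above) and retries only on a rate-limit error. `RateLimited`, `send`, and `cache` are hypothetical: `send` stands for whatever function calls the embeddings endpoint in your code, and `cache` can be any dict-like store.

```python
import hashlib
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by `send` when the API answers HTTP 429."""


def embed_with_retry(texts, send, cache, max_attempts=5, base_delay=1.0,
                     sleep=time.sleep):
    """Embed `texts`, sending each unique uncached text at most once per attempt."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

    # Collect unique texts that are not cached yet (dict preserves order).
    pending = {}
    for key, text in zip(keys, texts):
        if key not in cache:
            pending[key] = text

    if pending:
        pending_keys = list(pending)
        batch = [pending[k] for k in pending_keys]
        for attempt in range(max_attempts):
            try:
                vectors = send(batch)
                break
            except RateLimited:
                if attempt == max_attempts - 1:
                    raise
                # Exponential backoff with jitter before the next attempt.
                sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        cache.update(zip(pending_keys, vectors))

    return [cache[k] for k in keys]
```

Because results are cached by content hash, a crashed or restarted job resumes without paying again for text it has already embedded, provided the cache is persistent.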
Switching from ada-002 to 3-small or from 3-small to 3-large requires re-embedding your entire corpus. Factor this one-time cost into your model-selection decision. For 10B tokens, a re-embedding to 3-small costs $200.
New accounts get $5 in free credits - roughly 250M tokens on 3-small. These expire after 3 months. If you are prototyping slowly, set a reminder to use them before expiry.