Independent resource. Not affiliated with any provider. Always verify pricing on provider sites.
$embeddingcost

OpenAI Embedding Pricing: text-embedding-3-small, 3-large & ada-002 (May 2026)

Everything about OpenAI's embedding model pricing, verified monthly. Standard tier, Batch API discounts, Azure differences, and worked examples at 100M, 1B, and 10B tokens.

Verified May 2026

Current Pricing

ModelStandard $/MBatch $/MSavingsDimsMTEB
text-embedding-3-small
Recommended
$0.02$0.01050% off1,536 (min 512)62.3
text-embedding-3-large
High accuracy
$0.13$0.06550% off3,072 (min 256)64.6
text-embedding-ada-002
Legacy - migrate
$0.10N/A-1,53660.5

Source: OpenAI pricing page, verified May 2026. Batch API uses async processing with up to 24-hour turnaround.

text-embedding-3-small

The recommended model for most RAG applications. At $0.02/M tokens standard ($0.01/M batch), it offers the best price-to-performance ratio in OpenAI's lineup. Default output is 1,536 dimensions, with Matryoshka support allowing reduction to as few as 512 dimensions via the dimensions parameter.

VolumeStandard costBatch costBatch saving
100M tokens$2$1$1 saved
1B tokens$20$10$10 saved
10B tokens$200$100$100 saved

text-embedding-3-large

At $0.13/M tokens standard, text-embedding-3-large is 6.5x more expensive than small. The MTEB Retrieval benchmark advantage is roughly 2-3 percentage points. In practical RAG testing, this translates to marginally better recall on domain-specific technical content and non-English text - rarely worth the 6.5x price premium for most applications.

Where large pays off: extremely accuracy-sensitive applications (medical, legal, financial), very long documents where the 3072-dimension space captures more nuance, and multilingual corpora with rare language pairs. The MRL feature is more valuable here - reduce to 1536 dims and halve your storage cost with negligible quality loss.

text-embedding-ada-002 (Legacy)

Migrate away from ada-002. It costs $0.10/M tokens - 5x more expensive than text-embedding-3-small, which also outperforms it on MTEB benchmarks. ada-002 has no batch API support and no MRL dimensionality control.

If you have an existing index built with ada-002 and want to migrate: the re-embedding cost is your current token count at $0.02/M using 3-small. For 1B tokens, that is $20 for the migration and your ongoing bill drops from $0.10/M to $0.02/M - paid back in 3 months at the same volume.

Batch API: 50% Off for Indexing Workloads

The OpenAI Batch API processes embedding requests asynchronously with a 24-hour completion window. In exchange, you pay 50% of the standard rate. This makes it ideal for initial corpus indexing, nightly re-indexing of updated documents, and any embedding job where you can tolerate overnight latency.

Good fit for Batch API
  • - Initial corpus indexing (one-time)
  • - Nightly batch re-indexing
  • - Large document archive processing
  • - Any job where 24-hour latency is fine
Not suitable for Batch API
  • - Real-time query embedding
  • - New document ingestion pipelines
  • - Anything user-facing
  • - Low-latency search applications

Azure OpenAI Embedding Pricing

Azure OpenAI matches OpenAI direct per-token rates for embedding models as of April 2026. Key differences from using OpenAI direct:

  • -No Batch API discount on Azure. The 50% batch rate is only available through OpenAI direct API, not Azure OpenAI.
  • -Provisioned Throughput Units (PTU) available for reserved capacity at enterprise scale.
  • -Regional pricing variation. Prices may differ slightly by Azure region. Check the Azure pricing calculator for your deployment region.
  • -Enterprise agreements. Volume discounts available via Microsoft Enterprise Agreements - relevant for large organizations already on Microsoft contracts.

Gotchas That Inflate Your Bill

Chunk overlap

A 25% overlap on 500-token chunks means you embed 125 extra tokens per chunk. For 1M chunks, that is 125M extra tokens - an extra $2.50 at small standard rates. Use tiktoken to count before you send.

Rate limit retries

Hitting rate limits causes exponential backoff and retry loops. If your retry logic is naive, you can send the same tokens 3-5 times before getting a 200. Implement proper idempotent retry with token-level deduplication.

Re-embedding on model upgrade

Switching from ada-002 to 3-small or from 3-small to 3-large requires re-embedding your entire corpus. Factor this one-time cost into your model-selection decision. For 10B tokens, a re-embedding to 3-small costs $200.

Free credit expiry

New accounts get $5 in free credits - roughly 250M tokens on 3-small. These expire after 3 months. If you are prototyping slowly, set a reminder to use them before expiry.

Frequently Asked Questions

How much does OpenAI text-embedding-3-small cost?
text-embedding-3-small costs $0.02 per million tokens on the standard API. Using the Batch API, the price drops to $0.01 per million tokens - a 50% discount in exchange for up to 24-hour processing time.
What is the difference between text-embedding-3-small and text-embedding-3-large?
text-embedding-3-small produces 1536-dimension vectors and costs $0.02 per million tokens. text-embedding-3-large produces 3072-dimension vectors and costs $0.13 per million tokens - 6.5x more expensive. The MTEB score difference is roughly 2-3 percentage points. For most RAG applications, the small model is the better choice.
Should I still use text-embedding-ada-002?
No. ada-002 costs $0.10 per million tokens - 5x more expensive than text-embedding-3-small, which also outperforms it on MTEB benchmarks. The only reason to keep ada-002 is if you already have a large index and want to avoid a re-embedding pass. Otherwise, migrate to text-embedding-3-small.
Can I use OpenAI's Batch API for embeddings?
Yes. The Batch API offers a 50% discount on embedding costs in exchange for up to 24-hour processing time. Ideal for bulk indexing operations where latency is not critical. Batch requests are submitted as JSONL files and results are retrieved when complete.
How does Azure OpenAI embedding pricing compare to OpenAI direct?
Azure OpenAI matches OpenAI direct per-token rates as of April 2026. Key differences: the Batch API discount is not available on Azure, Azure offers Provisioned Throughput Units for reserved capacity, and pricing can vary by Azure region.
What is Matryoshka Representation Learning and how does it affect cost?
MRL allows you to request truncated embedding vectors. For text-embedding-3-large, you can request 1536 dimensions instead of 3072 using the 'dimensions' parameter. The API price per token is unchanged, but you store half the bytes per vector, halving downstream storage cost. Quality loss is roughly 1-2 MTEB points.
How do I count tokens for OpenAI embeddings?
Use the tiktoken library. For English text, roughly 1 token per 4 characters or 0.75 tokens per word. A 500-word chunk is approximately 375-400 tokens. Add 10-25% for chunk overlap in RAG pipelines.
What are OpenAI's rate limits for embedding models?
Tier 1 (after first $5 spend): 500 requests per minute, 1 million tokens per minute. Tier 2 (after $50 spend, 7+ days): 5,000 RPM, 5M TPM. Tier 5 (after $500+ spend): 10,000 RPM, 10M TPM. Batch API has separate quotas.
Full cost calculator
Compare OpenAI vs all providers with storage
Compare all models
Side-by-side vs Cohere, Voyage, Google
Cost optimization
Batch API, MRL, chunking tips
Disclaimer: This site is independent and not affiliated with OpenAI. Pricing is compiled from OpenAI's public pricing page and verified monthly. Always confirm current pricing at openai.com/api/pricing before committing to architecture decisions.

Updated May 2026