OpenAI Embedding Pricing: text-embedding-3-small, 3-large & ada-002 (June 2026)

Everything about OpenAI's embedding model pricing, verified monthly. Standard tier, Batch API discounts, Azure differences, and worked examples at 100M, 1B, and 10B tokens.

Verified June 2026

How much do OpenAI embeddings cost?

OpenAI embeddings cost $0.02 per million tokens for text-embedding-3-small and $0.13 per million tokens for text-embedding-3-large. The legacy text-embedding-ada-002 is $0.10/M. The Batch API halves the price of the v3 models (24-hour turnaround); Azure OpenAI matches the standard per-token rates but has no batch discount.

3-small: $0.02/M ($0.01/M batch)

3-large: $0.13/M ($0.065/M batch)

ada-002: $0.10/M (no batch tier)

Current Pricing

Model	Standard $/M	Batch $/M	Savings	Dims	MTEB	Notes
text-embedding-3-small	$0.02	$0.010	50% off	1,536 (min 512)	62.3	Recommended
text-embedding-3-large	$0.13	$0.065	50% off	3,072 (min 256)	64.6	High accuracy
text-embedding-ada-002	$0.10	N/A	-	1,536	61.0	Legacy - migrate

Source: OpenAI pricing page, verified June 2026. Batch API uses async processing with up to 24-hour turnaround.

text-embedding-3-small

The recommended model for most RAG applications. At $0.02/M tokens standard ($0.01/M batch), it offers the best price-to-performance ratio in OpenAI's lineup. Default output is 1,536 dimensions, with Matryoshka support allowing reduction to as few as 512 dimensions via the dimensions parameter.

Volume	Standard cost	Batch cost	Batch saving
100M tokens	$2	$1	$1 saved
1B tokens	$20	$10	$10 saved
10B tokens	$200	$100	$100 saved
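
The figures in the table are plain arithmetic on the per-million rates. A minimal sketch, assuming the prices listed on this page; `PRICES` and `embedding_cost` are illustrative names, not part of any OpenAI library:

```python
# USD per million tokens, copied from the pricing table on this page.
PRICES = {
    "text-embedding-3-small": {"standard": 0.02, "batch": 0.01},
    "text-embedding-3-large": {"standard": 0.13, "batch": 0.065},
    "text-embedding-ada-002": {"standard": 0.10},  # this page lists no batch tier
}


def embedding_cost(tokens: int, model: str, tier: str = "standard") -> float:
    """Return the USD cost of embedding `tokens` tokens, rounded to cents."""
    rate = PRICES[model][tier]  # raises KeyError for an unlisted model or tier
    return round(tokens / 1_000_000 * rate, 2)


for volume in (100_000_000, 1_000_000_000, 10_000_000_000):
    standard = embedding_cost(volume, "text-embedding-3-small")
    batch = embedding_cost(volume, "text-embedding-3-small", "batch")
    print(f"{volume:>14,} tokens: ${standard:,.2f} standard, ${batch:,.2f} batch")
```

Swapping in "text-embedding-3-large" gives the matching figures for the larger model.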

text-embedding-3-large

At $0.13/M tokens standard, text-embedding-3-large costs 6.5x as much as 3-small. Its MTEB advantage is about 2.3 points (64.6 vs 62.3). In practical RAG testing, this translates to marginally better recall on domain-specific technical content and non-English text - rarely worth the 6.5x price for most applications.

Where large pays off: extremely accuracy-sensitive applications (medical, legal, financial), very long documents where the 3072-dimension space captures more nuance, and multilingual corpora with rare language pairs. Matryoshka Representation Learning (MRL) truncation is most valuable here - request 1536 dimensions via the dimensions parameter and halve your vector storage with negligible quality loss.
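
The storage effect of reducing dimensions can be estimated directly. The sketch below assumes vectors are stored as float32 (4 bytes per value) and ignores index overhead; both function names are hypothetical. The second helper shows client-side truncation with re-normalization, one common way to shorten an MRL vector you have already stored. Treat it as an assumption to check against your own retrieval tests rather than a guaranteed equivalent of requesting fewer dimensions from the API.

```python
import math


def vector_storage_gb(n_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GB (10**9 bytes), excluding index overhead."""
    return n_vectors * dims * bytes_per_value / 1e9


def truncate_and_normalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


full = vector_storage_gb(10_000_000, 3072)
half = vector_storage_gb(10_000_000, 1536)
print(f"3072 dims: {full:.2f} GB, 1536 dims: {half:.2f} GB")
```

For 10 million vectors this comes to 122.88 GB at 3072 dimensions and 61.44 GB at 1536.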

text-embedding-ada-002 (Legacy)

Migrate away from ada-002. It costs $0.10/M tokens - 5x the price of text-embedding-3-small, which also outperforms it on MTEB benchmarks. ada-002 has no batch API support and no MRL dimensionality control.

If you have an existing index built with ada-002 and want to migrate: the re-embedding cost is your current token count at $0.02/M using 3-small. For 1B tokens, that is $20 for the migration, and your ongoing rate drops from $0.10/M to $0.02/M. Each million new tokens then saves $0.08, so the $20 is recovered after about 250M tokens of new embedding traffic.
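
How quickly a migration pays for itself depends on how many new tokens you embed afterwards. A rough sketch of the break-even calculation, using the per-million rates quoted on this page; `breakeven_tokens` is an illustrative helper, not an official tool:

```python
def breakeven_tokens(corpus_tokens: int, old_rate: float, new_rate: float) -> float:
    """Tokens of new traffic needed before per-token savings repay re-embedding.

    Rates are in USD per million tokens.
    """
    migration_cost = corpus_tokens / 1_000_000 * new_rate  # one-time re-embed
    saving_per_million = old_rate - new_rate               # saved on new traffic
    return migration_cost / saving_per_million * 1_000_000


tokens = breakeven_tokens(1_000_000_000, old_rate=0.10, new_rate=0.02)
print(f"Break-even after about {tokens / 1e6:,.0f}M new tokens")
```

Divide the result by your monthly embedding volume to turn it into a payback period.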

Batch API: 50% Off for Indexing Workloads

The OpenAI Batch API processes embedding requests asynchronously with a 24-hour completion window. In exchange, you pay 50% of the standard rate. This makes it ideal for initial corpus indexing, nightly re-indexing of updated documents, and any embedding job where you can tolerate overnight latency.

Good fit for Batch API

- Initial corpus indexing (one-time)
- Nightly batch re-indexing
- Large document archive processing
- Any job where 24-hour latency is fine

Not suitable for Batch API

- Real-time query embedding
- New document ingestion pipelines
- Anything user-facing
- Low-latency search applications
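
Batch jobs are submitted as a JSONL file with one request per line. The sketch below builds those lines for the embeddings endpoint; the `chunk-N` IDs and the `build_batch_lines` helper are illustrative choices, and the request shape should be checked against OpenAI's current Batch API documentation before use.

```python
import json


def build_batch_lines(texts: list[str], model: str = "text-embedding-3-small") -> list[str]:
    """Return one JSON-encoded Batch API request per input text."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"chunk-{i}",  # your own ID, used to match results to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


for line in build_batch_lines(["first chunk of text", "second chunk of text"]):
    print(line)
```

Write the lines to a `.jsonl` file, upload it, and create a batch against `/v1/embeddings` with a 24-hour completion window. Results arrive as a JSONL file keyed by `custom_id`, so keep a mapping from those IDs back to your documents.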

Azure OpenAI Embedding Pricing

Azure OpenAI matches OpenAI direct per-token rates for embedding models as of June 2026. Key differences from using OpenAI direct:

- No Batch API discount on Azure. The 50% batch rate is only available through OpenAI direct API, not Azure OpenAI.
- Provisioned Throughput Units (PTU) available for reserved capacity at enterprise scale.
- Regional pricing variation. Prices may differ slightly by Azure region. Check the Azure pricing calculator for your deployment region.
- Enterprise agreements. Volume discounts available via Microsoft Enterprise Agreements - relevant for large organizations already on Microsoft contracts.

Gotchas That Inflate Your Bill

Chunk overlap

A 25% overlap on 500-token chunks means you embed 125 extra tokens per chunk. For 1M chunks, that is 125M extra tokens - an extra $2.50 at small standard rates. Use tiktoken to count before you send.
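
The overlap overhead scales linearly with chunk count, so it is easy to estimate before indexing. A small sketch of the arithmetic above; `overlap_overhead` is a hypothetical helper and the rate is in USD per million tokens:

```python
def overlap_overhead(n_chunks: int, chunk_tokens: int, overlap: float,
                     rate_per_million: float) -> tuple[int, float]:
    """Return (extra tokens, extra USD) caused by chunk overlap."""
    extra_tokens = int(n_chunks * chunk_tokens * overlap)
    extra_cost = round(extra_tokens / 1_000_000 * rate_per_million, 2)
    return extra_tokens, extra_cost


extra_tokens, extra_cost = overlap_overhead(1_000_000, 500, 0.25, 0.02)
print(f"{extra_tokens:,} extra tokens, ${extra_cost:.2f} extra")
```

At the 3-large standard rate of $0.13/M the same overlap costs $16.25.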

Rate limit retries

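Hitting rate limits triggers retries. If your retry logic is naive (for example, resending a whole batch when only part of it failed or timed out), you can send the same tokens 3-5 times before getting a 200. Use exponential backoff, and skip any input whose embedding you have already stored so that retries never resend completed work.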
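
One way to keep retries from resending work is to back off exponentially and cache results by content hash. In the sketch below, `embed_fn` and `RateLimited` are hypothetical stand-ins for your client call and the error it raises on HTTP 429. The cache is a plain dict here; a real pipeline would likely use a persistent store.

```python
import hashlib
import random
import time


class RateLimited(Exception):
    """Stand-in for whatever error your client raises on HTTP 429."""


def embed_with_retry(texts, embed_fn, cache, max_retries=5, base_delay=1.0,
                     sleep=time.sleep):
    """Embed texts, sending each distinct uncached text at most once per attempt."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    pending = {}  # content hash -> text, for inputs not yet embedded
    for key, text in zip(keys, texts):
        if key not in cache:
            pending.setdefault(key, text)
    if pending:
        for attempt in range(max_retries + 1):
            try:
                vectors = embed_fn(list(pending.values()))
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                # Exponential backoff with jitter: about 1s, 2s, 4s, ...
                sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        cache.update(zip(pending.keys(), vectors))
    return [cache[key] for key in keys]
```

Because duplicates and previously embedded inputs are filtered out before the request, a retry only ever resends the texts that are still missing.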

Re-embedding on model upgrade

Switching from ada-002 to 3-small or from 3-small to 3-large requires re-embedding your entire corpus. Factor this one-time cost into your model-selection decision. For 10B tokens, a re-embedding to 3-small costs $200.

Free credit expiry

New accounts get $5 in free credits - roughly 250M tokens on 3-small. These expire after 3 months. If you are prototyping slowly, set a reminder to use them before expiry.

Frequently Asked Questions

How much does OpenAI text-embedding-3-small cost?

text-embedding-3-small costs $0.02 per million tokens on the standard API. Using the Batch API, the price drops to $0.01 per million tokens - a 50% discount in exchange for up to 24-hour processing time.

What is the difference between text-embedding-3-small and text-embedding-3-large?

text-embedding-3-small produces 1536-dimension vectors and costs $0.02 per million tokens. text-embedding-3-large produces 3072-dimension vectors and costs $0.13 per million tokens - 6.5x the price. The MTEB score difference is about 2.3 points (64.6 vs 62.3). For most RAG applications, the small model is the better choice.

Should I still use text-embedding-ada-002?

No. ada-002 costs $0.10 per million tokens - 5x the price of text-embedding-3-small, which also outperforms it on MTEB benchmarks. The only reason to keep ada-002 is if you already have a large index and want to avoid a re-embedding pass. Otherwise, migrate to text-embedding-3-small.

Can I use OpenAI's Batch API for embeddings?

Yes. The Batch API offers a 50% discount on embedding costs in exchange for up to 24-hour processing time. Ideal for bulk indexing operations where latency is not critical. Batch requests are submitted as JSONL files and results are retrieved when complete.

How does Azure OpenAI embedding pricing compare to OpenAI direct?

Azure OpenAI matches OpenAI direct per-token rates as of June 2026. Key differences: the Batch API discount is not available on Azure, Azure offers Provisioned Throughput Units for reserved capacity, and pricing can vary by Azure region.

What is Matryoshka Representation Learning and how does it affect cost?

MRL allows you to request truncated embedding vectors. For text-embedding-3-large, you can request 1536 dimensions instead of 3072 using the 'dimensions' parameter. The API price per token is unchanged, but you store half the bytes per vector, roughly halving downstream vector storage cost. Quality loss is small: OpenAI's published figures show 64.1 MTEB at 1,024 dimensions against 64.6 at the full 3,072.

How do I count tokens for OpenAI embeddings?

Use the tiktoken library. For English text, roughly 1 token per 4 characters, or about 0.75 words per token (1.33 tokens per word). A 500-word chunk is approximately 650-700 tokens. Add 10-25% for chunk overlap in RAG pipelines.
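
A minimal sketch of both approaches: an exact count with tiktoken when it is installed, and the 4-characters-per-token rule of thumb as a fallback. The `cl100k_base` encoding name is an assumption for the embedding models discussed here, so confirm it against tiktoken's model mapping. Both function names are illustrative.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rule-of-thumb estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Exact count via tiktoken if it is usable, otherwise the heuristic."""
    try:
        import tiktoken  # optional dependency: pip install tiktoken
        return len(tiktoken.get_encoding(encoding_name).encode(text))
    except Exception:
        # tiktoken is missing or its encoding files could not be loaded.
        return estimate_tokens(text)


sample = "OpenAI embeddings are billed per input token."
print(count_tokens(sample), estimate_tokens(sample))
```

The heuristic is adequate for budgeting. Use exact counts when enforcing per-request input limits.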

What are OpenAI's rate limits for embedding models?

Rate limits depend on your usage tier. Tier 1 (after first $5 spend): 500 requests per minute, 1 million tokens per minute. Tier 2 (after $50 spend, 7+ days): 5,000 RPM, 5M TPM. Tier 5 (after $500+ spend): 10,000 RPM, 10M TPM. The Batch API has separate quotas.

Disclaimer: This site is independent and not affiliated with OpenAI. Pricing is compiled from OpenAI's public pricing page and verified monthly. Always confirm current pricing at openai.com/api/pricing before committing to architecture decisions.