Azure OpenAI Embedding Pricing 2026 (May 2026)
Per-token pricing for text-embedding-3-small, 3-large, and ada-002 on Azure OpenAI Service, plus how PTU reserved capacity changes the math and the key differences against OpenAI direct.
Current Pricing (Pay-As-You-Go)
| Model | Standard ($/M tokens) | Batch ($/M tokens) | Dimensions | MTEB score |
|---|---|---|---|---|
| text-embedding-3-small (Recommended) | $0.02 | not offered | 1536 | 62.3 |
| text-embedding-3-large (High accuracy) | $0.13 | not offered | 3072 | 64.6 |
| text-embedding-ada-002 (Legacy, migrate) | $0.10 | not offered | 1536 | 60.5 |
Pay-as-you-go rates verified May 2026. The Batch API 50% discount available on OpenAI direct is not offered on Azure OpenAI for embedding models.
Cost at scale (pay-as-you-go)
| Model | 100M tokens | 1B tokens | 10B tokens |
|---|---|---|---|
| text-embedding-3-small | $2 | $20 | $200 |
| text-embedding-3-large | $13 | $130 | $1,300 |
| text-embedding-ada-002 | $10 | $100 | $1,000 |
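The table above is plain multiplication of the per-million rate by token volume. A minimal sketch for reproducing it at other volumes follows; the dictionary and function names are illustrative, and the rates are copied from the pay-as-you-go table in this article.

```python
# Pay-as-you-go list rates in USD per 1M tokens, copied from the pricing table
# in this article. Names here are illustrative, not part of any Azure SDK.
RATES_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def payg_cost(model: str, tokens: int) -> float:
    """Return the pay-as-you-go cost in USD for embedding `tokens` tokens."""
    return round(RATES_PER_MILLION[model] * tokens / 1_000_000, 2)


if __name__ == "__main__":
    volumes = (100_000_000, 1_000_000_000, 10_000_000_000)
    for model in RATES_PER_MILLION:
        print(model, [payg_cost(model, n) for n in volumes])
```

Swap in your own monthly token count to estimate spend before any Enterprise Agreement discount.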
Provisioned Throughput Units (PTUs)
PTUs are Azure's reserved-capacity pricing model. You commit to a fixed throughput allocation for a monthly or annual term in exchange for predictable pricing, guaranteed capacity, and protection against pay-as-you-go list-price changes.
Embedding workloads typically need fewer PTUs than chat-completion workloads because embedding requests are short and high-throughput. The exact discount versus pay-as-you-go depends on region, term length, and any Enterprise Agreement-level negotiation. Based on public list prices, PTU has historically broken even with pay-as-you-go at roughly 200-300 million tokens per month for embeddings. Microsoft does not publish a single canonical PTU price for embeddings, however, so treat that range as a rough guide and request a quote.
When PTU makes sense: sustained high-throughput indexing (catalogue ingestion, search-index refresh), latency-sensitive online retrieval that needs guaranteed capacity, or regulated workloads where pay-as-you-go variance is a procurement blocker. For variable or bursty workloads, pay-as-you-go is almost always cheaper.
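Once you have a quote, the PTU decision reduces to a break-even comparison. The sketch below assumes the quote can be treated as a flat monthly cost and ignores throughput ceilings, term discounts, and overage. The function names and the `quoted_ptu_monthly_usd` parameter are hypothetical, and no real PTU price is implied.

```python
def payg_monthly_cost(monthly_tokens: int, rate_per_million: float) -> float:
    """Pay-as-you-go spend in USD for one month at the given $/M-token rate."""
    return monthly_tokens * rate_per_million / 1_000_000


def breakeven_tokens(quoted_ptu_monthly_usd: float, rate_per_million: float) -> float:
    """Monthly token volume at which a flat PTU quote equals pay-as-you-go spend.

    Assumption: the PTU commitment is a flat monthly figure from your own quote.
    """
    if rate_per_million <= 0:
        raise ValueError("rate_per_million must be positive")
    return quoted_ptu_monthly_usd / rate_per_million * 1_000_000


def ptu_is_cheaper(monthly_tokens: int,
                   quoted_ptu_monthly_usd: float,
                   rate_per_million: float) -> bool:
    """True when the quoted PTU cost undercuts pay-as-you-go at this volume."""
    return quoted_ptu_monthly_usd < payg_monthly_cost(monthly_tokens, rate_per_million)
```

Pass the figure from your Microsoft account team as `quoted_ptu_monthly_usd` and the pay-as-you-go rate of the model you plan to use. If your real volume sits well below the break-even point, pay-as-you-go stays the cheaper option.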
Azure OpenAI versus OpenAI direct: key differences
| Aspect | OpenAI direct | Azure OpenAI |
|---|---|---|
| Pay-as-you-go $/M tokens | $0.02 - $0.13 | $0.02 - $0.13 (matches) |
| Batch API 50% discount | Yes | No |
| Reserved capacity | Not offered for embeddings | PTU (monthly / annual term) |
| Regional control | Global only | 25+ regions, data residency |
| Enterprise compliance | SOC 2, GDPR | SOC 2, GDPR, HIPAA, FedRAMP, more via inheritance |
| Private networking | Public endpoint only | Private Endpoints, VNet |
| Volume discounts | Limited (Enterprise tier) | EA-level discounts on Azure commit |
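Because list rates match, the Batch API row is what drives the pay-as-you-go cost gap. As a rough sketch, the blended cost on OpenAI direct depends on what share of tokens can go through the Batch API. The function names and the `batch_share` parameter are illustrative; the 50% discount and the matching rates come from the comparison above.

```python
BATCH_DISCOUNT = 0.50  # Batch API discount on OpenAI direct, per the table above


def openai_direct_cost(tokens: int, rate_per_million: float, batch_share: float) -> float:
    """Blended USD cost when `batch_share` (0..1) of tokens use the Batch API."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    full_price_tokens = tokens * (1 - batch_share)
    discounted_tokens = tokens * batch_share * (1 - BATCH_DISCOUNT)
    return round((full_price_tokens + discounted_tokens) * rate_per_million / 1_000_000, 2)


def azure_payg_cost(tokens: int, rate_per_million: float) -> float:
    """USD cost on Azure OpenAI pay-as-you-go, where no batch discount applies."""
    return round(tokens * rate_per_million / 1_000_000, 2)
```

For a workload that is entirely bulk indexing (`batch_share=1.0`), the OpenAI direct figure is half the Azure figure. For a fully online workload (`batch_share=0.0`), the two are equal.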
Choosing between Azure and OpenAI direct
Pick OpenAI direct if your workload is mostly bulk indexing (Batch API saves 50%), you have no data-residency requirement, and you want the lowest absolute cost.
Pick Azure OpenAI if you need EU / UK / US data residency, you are already on an Azure Enterprise Agreement (committed Azure spend reduces effective embedding cost), you need Private Endpoints / VNet, or you require HIPAA / FedRAMP compliance.
Pick Azure PTU if you have sustained, predictable embedding throughput above roughly 200-300 million tokens per month and your latency requirements rule out the Batch API anyway. Get a quote from your Microsoft account team before assuming PTU is cheaper; the answer depends on region and term length.
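The three rules of thumb above can be sketched as a small helper. The `Workload` fields, the order in which the rules are checked, and the use of 200 million tokens as the PTU floor (the lower end of the rough range quoted above) are all assumptions made for illustration, not a definitive decision procedure.

```python
from dataclasses import dataclass

# Lower end of the rough 200-300M tokens/month range; an assumption, not a hard rule.
PTU_FLOOR_TOKENS = 200_000_000


@dataclass
class Workload:
    monthly_tokens: int
    sustained: bool                 # steady, predictable throughput rather than bursty
    batch_tolerant: bool            # can wait for Batch API turnaround
    needs_data_residency: bool = False
    needs_private_networking: bool = False
    needs_hipaa_or_fedramp: bool = False
    has_azure_ea: bool = False


def recommend(w: Workload) -> str:
    """Return a starting-point recommendation; always confirm with real quotes."""
    if w.sustained and not w.batch_tolerant and w.monthly_tokens >= PTU_FLOOR_TOKENS:
        return "Azure PTU (request a quote first)"
    if (w.needs_data_residency or w.needs_private_networking
            or w.needs_hipaa_or_fedramp or w.has_azure_ea):
        return "Azure OpenAI pay-as-you-go"
    return "OpenAI direct"
```

For example, `recommend(Workload(monthly_tokens=50_000_000, sustained=False, batch_tolerant=True))` returns `"OpenAI direct"`, since no Azure-specific requirement applies and the workload can use the Batch API.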