How much does Azure OpenAI text-embedding-3-small cost?

Azure OpenAI text-embedding-3-small costs $0.02 per million tokens on pay-as-you-go (standard) pricing, matching OpenAI direct as of the verification date on this page. Provisioned Throughput Units (PTUs) offer a discount in exchange for reserved capacity and a minimum monthly commitment.

Does Azure OpenAI offer the Batch API discount on embeddings?

No. As of 2026 the 50% Batch API discount available on OpenAI direct is not offered for embedding models on Azure OpenAI. This is the single biggest pricing difference between the two platforms. Bulk indexing workloads see a meaningful cost premium on Azure unless they qualify for PTU reserved-capacity pricing.

What are Provisioned Throughput Units (PTUs) for Azure OpenAI embeddings?

PTUs are Azure's reserved-capacity pricing model. You commit to a fixed throughput allocation for a monthly or yearly term in exchange for predictable pricing and guaranteed capacity. Embedding workloads typically need fewer PTUs than chat completion workloads because embedding requests are short and high-throughput. Microsoft does not publish a single canonical PTU price for embedding models, so the pay-as-you-go break-even point depends on the quoted rate, region, term length, and any Enterprise Agreement-level negotiation. Request a quote from your Microsoft account team before assuming PTU is cheaper.

Which Azure regions support text-embedding-3-small and text-embedding-3-large?

Embedding model availability varies by Azure region and changes regularly. Microsoft publishes a live model availability page that lists current regions for each embedding model - always check that page before architecting a deployment, because regional gaps cause real engineering pain when discovered after the fact. Tier-1 commercial regions in the US and Europe tend to lead in model availability; data-residency-restricted regions tend to lag.

Does Azure OpenAI charge for data egress?

Azure OpenAI itself charges per token, with no separate egress fee for API responses. However, if your embedding service sits behind a VNet and traffic routes through Private Endpoints or Azure Firewall, those services have their own per-GB costs that apply to the response payload (which is small for embeddings - typically 6KB per vector at 1536 dimensions in float32). Self-hosted vector databases pulling embeddings across regions also incur standard Azure inter-region transfer fees.

Is Azure OpenAI cheaper than OpenAI direct for embeddings?

Per-token pay-as-you-go pricing is identical between the two as of 2026. OpenAI direct is cheaper in absolute terms because of the 50% Batch API discount, which Azure does not offer. Azure becomes cheaper only if you commit to PTUs and run high consistent volume. For most teams under 1 billion tokens per month, OpenAI direct on the Batch API is the lowest-cost option.

Does Azure OpenAI support Matryoshka dimension truncation for text-embedding-3-large?

Yes. The 'dimensions' parameter that lets you request 1536-dimension output from text-embedding-3-large (instead of the default 3072) works identically on Azure. The per-token price is unchanged but you halve your downstream storage cost. Quality loss is minimal - 1-2 MTEB points.

What is the deployment model for Azure OpenAI embeddings?

You deploy each embedding model as a 'deployment' within an Azure OpenAI resource. Each deployment has its own endpoint URL and capacity allocation. You can create multiple deployments of the same model to isolate workloads, apply different rate limits, or split costs across cost centres. Deployments are free to create - you only pay for tokens consumed.

Azure OpenAI Embedding Pricing 2026 (June 2026)

Per-token pricing for text-embedding-3-small, 3-large, and ada-002 on Azure OpenAI Service, plus how PTU reserved capacity changes the math and the key differences against OpenAI direct.

Verified June 2026

Direct answer

Azure OpenAI embedding pricing in 2026 is pay-as-you-go and matches OpenAI direct per token: text-embedding-3-small costs $0.02 per million tokens, text-embedding-3-large $0.13 per million, and the legacy text-embedding-ada-002 $0.10 per million. The one pricing difference that matters: Azure does not offer the 50% Batch API discount that OpenAI direct gives on embeddings, so bulk indexing is cheaper on OpenAI direct unless you commit to Azure PTU reserved capacity.

$0.02

text-embedding-3-small

per million tokens

$0.13

text-embedding-3-large

per million tokens

$0.10

text-embedding-ada-002

per million tokens

Current Pricing (Pay-As-You-Go)

Model	Standard $/M	Batch $/M	Dims	MTEB
text-embedding-3-small Recommended	$0.02	not offered	1536	62.3
text-embedding-3-large High accuracy	$0.13	not offered	3072	64.6
text-embedding-ada-002 Legacy - migrate	$0.10	not offered	1536	60.5

Pay-as-you-go rates verified June 2026. The Batch API 50% discount available on OpenAI direct is not offered on Azure OpenAI for embedding models.

Cost at scale (pay-as-you-go)

Model	100M tokens	1B tokens	10B tokens
text-embedding-3-small	$2	$20	$200
text-embedding-3-large	$13	$130	$1,300
text-embedding-ada-002	$10	$100	$1,000

Provisioned Throughput Units (PTUs)

PTUs are Azure's reserved-capacity pricing model. You commit to a fixed throughput allocation for a monthly or annual term in exchange for predictable pricing, guaranteed capacity, and protection against pay-as-you-go list-price changes.

Embedding workloads typically need fewer PTUs than chat-completion workloads because embedding requests are short and high-throughput. The exact discount versus pay-as-you-go depends on region, term length, and any Enterprise Agreement-level negotiation. Microsoft does not publish a single canonical PTU price for embedding models, so the break-even point against pay-as-you-go varies by quote and is not something we can publish a definitive cross-over number for. Request a quote from your Microsoft account team for your specific region and committed volume.

When PTU makes sense: sustained high-throughput indexing (catalogue ingestion, search-index refresh), latency-sensitive online retrieval that needs guaranteed capacity, or regulated workloads where pay-as-you-go variance is a procurement blocker. For variable or bursty workloads, pay-as-you-go is almost always cheaper.

Azure OpenAI versus OpenAI direct: key differences

Aspect	OpenAI direct	Azure OpenAI
Pay-as-you-go $/M tokens	$0.02 - $0.13	$0.02 - $0.13 (matches)
Batch API 50% discount	Yes	No
Reserved capacity	Not offered for embeddings	PTU (monthly / annual term)
Regional control	Global only	Many Azure regions, data residency
Enterprise compliance	SOC 2, GDPR	SOC 2, GDPR, HIPAA, FedRAMP, more via inheritance
Private networking	Public endpoint only	Private Endpoints, VNet
Volume discounts	Limited (Enterprise tier)	EA-level discounts on Azure commit

Choosing between Azure and OpenAI direct

Pick OpenAI direct if your workload is mostly bulk indexing (Batch API saves 50%), you have no data-residency requirement, and you want the lowest absolute cost.

Pick Azure OpenAI if you need EU / UK / US data residency, you are already on an Azure Enterprise Agreement (committed Azure spend reduces effective embedding cost), you need Private Endpoints / VNet, or you require HIPAA / FedRAMP compliance.

Pick Azure PTU if you have sustained, predictable, high-volume embedding throughput and your latency requirements rule out the Batch API anyway. Get a Microsoft account team quote before assuming PTU is cheaper - Microsoft does not publish a canonical embedding PTU price, so the break-even is quote-dependent.

OpenAI direct pricing All embedding providers Cost calculator Methodology