Independent resource. Not affiliated with any provider. Always verify pricing on provider sites.
$embeddingcost

Embedding Cost: Real Prices from OpenAI, Cohere, Voyage, Google & AWS

The independent, engineer-grade reference for embedding model pricing. Not a vendor page, not a blog post. A calculator, comparison tables, and real-world RAG worked examples.

Prices verified May 2026

Quick Cost Estimate

Live

~20,000 documents at 500 tokens each

cheapestBGE-M3 (self-hosted)Self-Hosted
$0.0100
text-embedding-3-smallOpenAI
$0.2000
voyage-3.5Voyage AI
$0.6000
embed-v4Cohere
$1.00
text-embedding-3-largeOpenAI
$1.30
gemini-embedding-001Google
$1.50
gemini-embedding-2-previewGoogle
$2.00
Titan Text Embeddings V2Amazon Bedrock
$2.00

Cheapest option for 10.0M tokens: BGE-M3 (self-hosted) at $0.001/M tokens.Full calculator with storage

Key Pricing Landmarks

OpenAI small (batch)
Cheapest commercial API
$0.01/M
Voyage 3.5
Best accuracy-to-price ratio
$0.06/M
Self-hosted BGE-M3
Cheapest above ~15M tokens/mo
$0.001/M
Pinecone storage
Recurring forever - plan for it
$0.33/GB/mo

Quick Decision Guide

-
Budget-first, small corpus: OpenAI small or Voyage lite
-
Best accuracy for RAG: Voyage 3.5 or voyage-3-large
-
Multilingual content: Cohere embed-v4
-
AWS/enterprise compliance: Amazon Titan V2 via Bedrock
-
Over 15M tokens/month: Self-hosted BGE-M3
Featured deep-dive
OpenAI Embedding Pricing: text-embedding-3-small, 3-large, ada-002

Per-model rates ($0.02/M small, $0.13/M large, $0.10/M ada-002), 50% Batch API discount math, Azure OpenAI differences, Matryoshka dimensions, and migration paths.

Read →
ModelProvider$/M tokensBatch $/MDimsMTEB
text-embedding-3-smallOpenAI$0.020$0.0101,53662.3Details
text-embedding-3-largeOpenAI$0.130$0.0653,07264.6Details
embed-v4Cohere$0.100-1,02455Details
voyage-3.5Voyage AI$0.060$0.0401,02467.1Details
voyage-3-largeVoyage AI$0.180$0.1202,04868.9Details
voyage-3-liteVoyage AI$0.020$0.01351261.7Details
gemini-embedding-2-previewGoogle$0.200-3,07268Details
gemini-embedding-001Google$0.150-3,07265.4Details
Titan Text Embeddings V2Amazon Bedrock$0.200-1,02462.8Details
BGE-M3 (self-hosted)Self-Hosted$0.001$0.0011,02466.5Details

Prices as of May 2026. Green = cheapest tier, red = most expensive tier. MTEB Retrieval average scores where publicly available.

How Embedding Pricing Works

Embedding models convert text into numeric vectors. Providers charge for the tokens you send in (input-only pricing) - there are no output tokens to pay for. This makes embeddings 10-100x cheaper than chat completions at similar token volumes.

The formula is simple: cost = (tokens / 1,000,000) x rate. Where engineers get surprised is the Batch API discount (50% off for OpenAI, 33% off for Voyage) which can halve your indexing bill overnight.

The other surprise is storage cost. Generating 1B embeddings with OpenAI small costs $20 once. Storing those vectors in a managed vector DB can cost $200+/month forever. For long-lived corpora, the recurring storage bill dwarfs the one-time generation cost within a few months.

Chunk overlap also inflates token counts 10-25% over the raw text size. A 1GB corpus is typically 750M raw tokens but 830-940M billed tokens after overlapping chunking. Plan for it.

Cost at Common Scales

ScaleOAI smallVoyage 3.5Self-hosted
10M tokens$0.20$0.60$0.01
100M tokens$2$6$0.10
1B tokens$20$60$1
10B tokens$200$600$10

Self-hosted cost is estimated for BGE-M3 on a spot A100. Excludes DevOps overhead. Break-even analysis

Go Deeper

Frequently Asked Questions

How much do OpenAI embeddings cost?
OpenAI text-embedding-3-small costs $0.02 per million tokens (standard) or $0.01 per million via the Batch API. text-embedding-3-large costs $0.13/M standard, $0.065/M batch. For 50 million tokens, that is $1.00 on small standard or $0.50 on batch. The legacy ada-002 model costs $0.10/M with no batch discount.
Which embedding model is cheapest?
Self-hosted BGE-M3 on a spot A100 GPU costs roughly $0.001 per million tokens - cheapest overall, but requires infrastructure management. Among commercial APIs, OpenAI text-embedding-3-small batch at $0.01/M is the cheapest option. Voyage-3-lite at $0.02/M standard and OpenAI small standard at $0.02/M are the next cheapest commercial options.
Does OpenAI charge for output tokens on embeddings?
No. Embedding models only charge for input tokens - the text you send in. There are no output tokens because embeddings return a numeric vector, not generated text. This is why embeddings are 10 to 100 times cheaper per token than chat completions.
What does it cost to store embeddings?
Storage cost follows the formula: vectors x dimensions x 4 bytes per vector. For 1 million 1,536-dimension vectors, that is about 5.7 GB. On Pinecone serverless, expect roughly $1.90 per month for storage alone. For large corpora, the monthly storage bill often exceeds the one-time embedding generation cost within a few months. Use pgvector on an existing Postgres database for near-zero marginal storage cost.
How do I reduce my embedding bill?
The highest-impact techniques are: (1) Use the Batch API - 50% off OpenAI, 33% off Voyage for bulk indexing. (2) Pick the smaller model - OpenAI small vs large is 6.5x cheaper for roughly 3% quality reduction. (3) Use Matryoshka dimensions - reduce from 3,072 to 1,536 dims on text-embedding-3-large to halve storage cost. (4) Cache query embeddings - many queries repeat. (5) De-duplicate before embedding. See the full optimization guide.
Is it cheaper to self-host embedding models?
Yes, above roughly 15 million tokens per month. A spot A100 GPU at $1.50 per hour can process about 8,000 tokens per second with BGE-M3, costing roughly $0.001 per million tokens. Self-hosting has fixed overhead costs though - a full-time GPU month costs $1,080 at spot rates even with zero usage. The break-even point depends on your specific GPU rate and utilization.
What is a word embedding and how is it different from document embedding?
Word embeddings (Word2Vec, GloVe) represent individual words as vectors. Modern embedding models from OpenAI, Cohere, and Voyage embed entire sentences, paragraphs, or documents into a single vector capturing semantic meaning. The pricing question for AI engineers is almost always about document or chunk embeddings for RAG, not word embeddings - which are typically open-source and free to run.
How does Azure OpenAI embedding pricing differ from OpenAI direct?
Azure OpenAI matches OpenAI direct per-token rates for embedding models as of April 2026. The key differences: Azure adds Provisioned Throughput Units (PTU) for reserved capacity at enterprise scale, the Batch API discount is not available on Azure, and pricing may vary by Azure region.
Disclaimer: This site is independent and not affiliated with OpenAI, Cohere, Voyage AI, Anthropic, Google, Amazon Web Services, or any other provider listed. Pricing is compiled from public sources and verified monthly. Always confirm current pricing on the provider's own site before committing to architecture decisions.

Updated May 2026