How do I calculate embedding costs?

Embedding cost = (total tokens / 1,000,000) x (price per million tokens). For example: 10 million tokens on OpenAI text-embedding-3-small = 10 x $0.02 = $0.20. Remember that embeddings charge input tokens only - there is no output token charge.

What is the cheapest way to build a RAG application?

For a small knowledge base under 100k documents with under 50 queries per day: OpenAI text-embedding-3-small plus pgvector on an existing Postgres database costs under $5 per month. The embedding pass for 100k documents at 500 tokens each is only $1.00. See our customer support bot scenario for a detailed worked example.

Do Cohere and Voyage have free tiers?

Cohere offers 100 API calls per minute on the free tier, suitable for development and prototyping but not production workloads. Voyage AI offers 200 million free tokens on the voyage-4 generation (plus voyage-context-3 and voyage-code-3) - enough to embed a substantial corpus before spending anything. The older voyage-3.x models no longer include the free allocation.

Embedding Cost: Real Prices from OpenAI, Cohere, Voyage, Google & AWS

Q: What does it cost to store embeddings?

Storage cost = vectors x dimensions x 4 bytes per vector. For 1 million 1536-dimension vectors, that is about 5.7 GB. On Pinecone serverless, expect $1.90 per month for storage alone. pgvector on an existing Postgres database is near-zero marginal cost. Managed vector DBs can exceed the one-time embedding generation cost within a few months.

Q: Is it cheaper to self-host embedding models?

Yes, above roughly 15 million embeddings per month. A spot A100 GPU at $1.50 per hour can process about 8,000 tokens per second with BGE-M3, giving roughly $0.001 per million tokens. At that rate, self-hosting breaks even with OpenAI small around 15M tokens per month. One real case study saved $85,000 per year by switching from API to self-hosted at 5-billion-token-per-month scale.

The independent, engineer-grade reference for embedding model pricing. Not a vendor page, not a blog post. A calculator, comparison tables, and real-world RAG worked examples.

Prices verified June 2026

Quick Cost Estimate

Live

Tokens to embed

~20,000 documents at 500 tokens each

Use Batch API (50% off for OpenAI, 33% off for Voyage)

cheapestBGE-M3 (self-hosted)Self-Hosted

$0.0100

text-embedding-3-smallOpenAI

$0.2000

Titan Text Embeddings V2Amazon Bedrock

$0.2000

voyage-4Voyage AI

$0.6000

embed-v4Cohere

$1.20

text-embedding-3-largeOpenAI

$1.30

gemini-embedding-001Google

$1.50

gemini-embedding-2-previewGoogle

$2.00

Cheapest option for 10.0M tokens: BGE-M3 (self-hosted) at $0.001/M tokens.Full calculator with storage

Key Pricing Landmarks

OpenAI small (batch)

Cheapest commercial API

$0.01/M

Voyage 4

Best accuracy-to-price ratio

$0.06/M

Self-hosted BGE-M3

Cheapest above ~15M tokens/mo

$0.001/M

Pinecone storage

Recurring forever - plan for it

$0.33/GB/mo

Quick Decision Guide

Budget-first, small corpus: OpenAI small or voyage-4-lite

Best accuracy for RAG: voyage-4 or voyage-4-large

Multilingual content: Cohere embed-v4

AWS/enterprise compliance: Amazon Titan V2 via Bedrock

Over 15M tokens/month: Self-hosted BGE-M3

Featured deep-dive

OpenAI Embedding Pricing: text-embedding-3-small, 3-large, ada-002

Per-model rates ($0.02/M small, $0.13/M large, $0.10/M ada-002), 50% Batch API discount math, Azure OpenAI differences, Matryoshka dimensions, and migration paths.

Read →

Pricing Cheat Sheet

Full comparison with MTEB scores

Model	Provider	$/M tokens	Batch $/M	Dims	MTEB
text-embedding-3-small	OpenAI	$0.020	$0.010	1,536	62.3	Details
text-embedding-3-large	OpenAI	$0.130	$0.065	3,072	64.6	Details
embed-v4	Cohere	$0.120	-	1,536	55	Details
voyage-4-large	Voyage AI	$0.120	$0.080	1,024	-	Details
voyage-4	Voyage AI	$0.060	$0.040	1,024	-	Details
voyage-4-lite	Voyage AI	$0.020	$0.013	1,024	-	Details
gemini-embedding-2-preview	Google	$0.200	-	3,072	68	Details
gemini-embedding-001	Google	$0.150	-	3,072	65.4	Details
Titan Text Embeddings V2	Amazon Bedrock	$0.020	-	1,024	62.8	Details
BGE-M3 (self-hosted)	Self-Hosted	$0.001	$0.001	1,024	66.5	Details

Prices as of June 2026. Green = cheapest tier, red = most expensive tier. MTEB Retrieval average scores where publicly available.

How Embedding Pricing Works

Embedding models convert text into numeric vectors. Providers charge for the tokens you send in (input-only pricing) - there are no output tokens to pay for. This makes embeddings 10-100x cheaper than chat completions at similar token volumes.

The formula is simple: cost = (tokens / 1,000,000) x rate. Where engineers get surprised is the Batch API discount (50% off for OpenAI, 33% off for Voyage) which can halve your indexing bill overnight.

The other surprise is storage cost. Generating 1B embeddings with OpenAI small costs $20 once. Storing those vectors in a managed vector DB can cost $200+/month forever. For long-lived corpora, the recurring storage bill dwarfs the one-time generation cost within a few months.

Chunk overlap also inflates token counts 10-25% over the raw text size. A 1GB corpus is typically 750M raw tokens but 830-940M billed tokens after overlapping chunking. Plan for it.

Cost at Common Scales

Scale	OAI small	Voyage 4	Self-hosted
10M tokens	$0.20	$0.60	$0.01
100M tokens	$2	$6	$0.10
1B tokens	$20	$60	$1
10B tokens	$200	$600	$10

Self-hosted cost is estimated for BGE-M3 on a spot A100. Excludes DevOps overhead. Break-even analysis

Go Deeper

Full Calculator

Multi-provider calculator with storage cost, batch toggle, and Year 1 total

Compare All Models

Side-by-side table with MTEB scores, context windows, and free tiers

Self-Hosted Break-Even

Interactive calculator: at what volume does BGE-M3 beat the APIs?

Vector DB Storage

Pinecone vs pgvector vs Qdrant at 1M / 10M / 100M vector scale

Matryoshka Dimensions

Truncate embedding dimensions to cut storage cost - the API token bill stays flat

Frequently Asked Questions

How much do OpenAI embeddings cost?

OpenAI text-embedding-3-small costs $0.02 per million tokens (standard) or $0.01 per million via the Batch API. text-embedding-3-large costs $0.13/M standard, $0.065/M batch. For 50 million tokens, that is $1.00 on small standard or $0.50 on batch. The legacy ada-002 model costs $0.10/M with no batch discount.

Which embedding model is cheapest?

Self-hosted BGE-M3 on a spot A100 GPU costs roughly $0.001 per million tokens - cheapest overall, but requires infrastructure management. Among commercial APIs, OpenAI text-embedding-3-small batch at $0.01/M is the cheapest option. voyage-4-lite at $0.02/M standard and OpenAI small standard at $0.02/M are the next cheapest commercial options.

Does OpenAI charge for output tokens on embeddings?

No. Embedding models only charge for input tokens - the text you send in. There are no output tokens because embeddings return a numeric vector, not generated text. This is why embeddings are 10 to 100 times cheaper per token than chat completions.

What does it cost to store embeddings?

Storage cost follows the formula: vectors x dimensions x 4 bytes per vector. For 1 million 1,536-dimension vectors, that is about 5.7 GB. On Pinecone serverless, expect roughly $1.90 per month for storage alone. For large corpora, the monthly storage bill often exceeds the one-time embedding generation cost within a few months. Use pgvector on an existing Postgres database for near-zero marginal storage cost.

How do I reduce my embedding bill?

The highest-impact techniques are: (1) Use the Batch API - 50% off OpenAI, 33% off Voyage for bulk indexing. (2) Pick the smaller model - OpenAI small vs large is 6.5x cheaper for roughly 3% quality reduction. (3) Use Matryoshka dimensions - reduce from 3,072 to 1,536 dims on text-embedding-3-large to halve storage cost. (4) Cache query embeddings - many queries repeat. (5) De-duplicate before embedding. See the full optimization guide.

Is it cheaper to self-host embedding models?

Yes, above roughly 15 million tokens per month. A spot A100 GPU at $1.50 per hour can process about 8,000 tokens per second with BGE-M3, costing roughly $0.001 per million tokens. Self-hosting has fixed overhead costs though - a full-time GPU month costs $1,080 at spot rates even with zero usage. The break-even point depends on your specific GPU rate and utilization.

What is a word embedding and how is it different from document embedding?

Word embeddings (Word2Vec, GloVe) represent individual words as vectors. Modern embedding models from OpenAI, Cohere, and Voyage embed entire sentences, paragraphs, or documents into a single vector capturing semantic meaning. The pricing question for AI engineers is almost always about document or chunk embeddings for RAG, not word embeddings - which are typically open-source and free to run.

How does Azure OpenAI embedding pricing differ from OpenAI direct?

Azure OpenAI matches OpenAI direct per-token rates for embedding models as of June 2026 (text-embedding-3-small $0.02/M, 3-large $0.13/M). The key differences: Azure adds Provisioned Throughput Units (PTU) for reserved capacity at enterprise scale, the Batch API discount is not available on Azure, and pricing may vary by Azure region.

Disclaimer: This site is independent and not affiliated with OpenAI, Cohere, Voyage AI, Anthropic, Google, Amazon Web Services, or any other provider listed. Pricing is compiled from public sources and verified monthly. Always confirm current pricing on the provider's own site before committing to architecture decisions.