Independent resource. Not affiliated with any provider. Always verify pricing on provider sites.

Real-World RAG Embedding Cost Examples: 5 Production Scenarios (April 2026)

Pricing pages tell you $0.02 per million tokens. Here is what that means for five actual RAG applications, with full cost breakdowns: one-time embedding, monthly query cost, monthly storage, and Year 1 total.

Verified April 2026

Customer Support Bot

50k historical tickets, 2k new queries/day, citation-required answers

Corpus to index: 50k tickets x 500 tokens = 25M tokens to embed
Monthly queries: 2k queries/day x 30 tokens x 30 days = 1.8M tokens/month
Storage: 50k vectors x 1536 dims x 4 bytes = 307 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + pgvector (cheapest) | $0.50 | $0.04 | $0.0070 | $1.02 |
| Voyage 3.5 + pgvector | $1.50 | $0.11 | $0.0070 | $2.88 |
| OpenAI small + Pinecone SL | $0.50 | $0.04 | $0.10 | $2.13 |
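Every table on this page follows the same four-term model: one-time corpus embedding, plus twelve months of query embedding and vector storage. A minimal sketch in Python, using the OpenAI small + pgvector figures from the row above ($0.02 per million tokens, ~$0.007/month storage):

```python
def year1_cost(corpus_tokens, query_tokens_per_month, price_per_m_tokens,
               storage_per_month, months=12):
    """One-time embedding plus N months of query embedding and storage."""
    embed_once = corpus_tokens / 1e6 * price_per_m_tokens
    query_mo = query_tokens_per_month / 1e6 * price_per_m_tokens
    return embed_once + months * (query_mo + storage_per_month)

# Customer support bot: 25M corpus tokens, 1.8M query tokens/month,
# $0.02/M tokens (text-embedding-3-small), ~$0.007/mo pgvector storage.
total = year1_cost(25e6, 1.8e6, 0.02, 0.007)
print(f"${total:.2f}")  # ≈ $1.02
```

Swap in your own token counts and provider rates to reproduce any row in the scenarios below.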

Semantic Search Engine

1M documents, 10k queries/day, fast response required

Corpus to index: 1M docs x 500 tokens = 500M tokens to embed
Monthly queries: 10k queries/day x 30 tokens x 30 days = 9M tokens/month
Storage: 1M vectors x 1536 dims x 4 bytes = 6.14 GB (5.72 GiB)

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + Pinecone pod | $10 | $0.18 | $70 | $852 |
| Voyage 3.5 + Qdrant Cloud | $30 | $0.54 | $0.69 | $45 |
| OpenAI small batch + pgvector (cheapest) | $5.00 | $0.18 | $0.13 | $8.72 |
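The storage footprints in these scenarios are raw float32 payload: vector count x dimensions x 4 bytes. A quick sketch, which also shows why the same index can be quoted as 6.14 GB (decimal) or 5.72 GiB (binary) depending on the provider's dashboard:

```python
def index_size_bytes(n_vectors, dims, bytes_per_dim=4):
    """Raw float32 vector payload; real indexes add graph/metadata overhead."""
    return n_vectors * dims * bytes_per_dim

size = index_size_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB / {size / 2**30:.2f} GiB")  # 6.14 GB / 5.72 GiB
```

Real deployments need more than this: HNSW graph links, metadata, and replicas can add 1.5-2x on top of the raw vectors, so treat this as a floor.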

Internal Knowledge Base

10k Notion pages, 500 employees, general Q&A

Corpus to index: 10k pages x 400 tokens = 4M tokens to embed
Monthly queries: 500 queries/day x 25 tokens x 30 days = 375k tokens/month
Storage: 10k vectors x 1536 dims x 4 bytes = 61 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + pgvector (cheapest) | $0.08 | $0.0075 | $0.0010 | $0.18 |
| Voyage 3.5 + pgvector | $0.24 | $0.02 | $0.0010 | $0.52 |

E-Commerce Product Search

500k SKUs, 100k queries/day, multilingual

Corpus to index: 500k SKUs x 200 tokens = 100M tokens to embed
Monthly queries: 100k queries/day x 20 tokens x 30 days = 60M tokens/month
Storage: 500k vectors x 1024 dims x 4 bytes = 2.05 GB (1.91 GiB)

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| Cohere v4 + Qdrant Cloud | $10 | $6.00 | $0.23 | $85 |
| OpenAI small + pgvector (cheapest) | $2.00 | $1.20 | $0.04 | $17 |

Legal Document RAG

100k contracts, accuracy-critical retrieval

Corpus to index: 100k docs x 3000 tokens (long documents) = 300M tokens to embed
Monthly queries: 500 queries/day x 50 tokens x 30 days = 750k tokens/month
Storage: 100k vectors x 1536 dims x 4 bytes = 614 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| Voyage 3-large + Qdrant Cloud | $54 | $0.14 | $0.07 | $56 |
| Voyage law-2 + Qdrant Cloud | $36 | $0.09 | $0.07 | $38 |
| OpenAI large batch + Qdrant (cheapest) | $20 | $0.10 | $0.07 | $21 |
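The "batch" rows in these tables assume OpenAI's Batch API, which prices jobs at half the synchronous rate in exchange for up-to-24-hour turnaround. For a one-time backfill like the 300M-token legal corpus, the discount applies to the largest single line item. A sketch using the published $0.13/M synchronous rate for text-embedding-3-large (verify current rates before budgeting):

```python
SYNC_PRICE_PER_M = 0.13   # text-embedding-3-large, synchronous rate
BATCH_DISCOUNT = 0.5      # Batch API: 50% off, results within 24 hours

corpus_tokens = 300e6     # 100k contracts x 3000 tokens
sync_cost = corpus_tokens / 1e6 * SYNC_PRICE_PER_M
batch_cost = sync_cost * BATCH_DISCOUNT
print(f"sync ${sync_cost:.2f} vs batch ${batch_cost:.2f}")  # sync $39.00 vs batch $19.50
```

Batch only helps indexing; live user queries still pay the synchronous rate, which is why the Query/mo column is unchanged between batch and non-batch rows.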

Frequently Asked Questions

How much does a RAG application cost?
A small knowledge base (10k documents, 100 queries/day) can cost under $2/month on OpenAI small + pgvector. A large semantic search engine (1M documents, 10k queries/day) runs from under $1/month in ongoing embedding and storage spend on self-managed pgvector to roughly $70/month when a dedicated Pinecone pod is required. LLM generation cost for answers is usually larger than the embedding cost.
Is embedding cost or LLM generation cost higher in RAG?
LLM generation is almost always higher. Embedding a 30-token query costs about $0.0000006 on OpenAI small. Generating a 500-token answer with Claude 3.5 Sonnet costs about $0.0075 in output tokens alone - roughly 12,500x more per request, before counting input tokens for the prompt and retrieved context. Embedding optimization matters most for large indexing workloads.
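To make the gap concrete, here is the per-request arithmetic, assuming OpenAI small at $0.02 per million tokens and Claude 3.5 Sonnet's published $15 per million output tokens (using the $3/M input rate instead would give a smaller, 2,500x gap):

```python
embed = 30 / 1e6 * 0.02      # 30-token query on text-embedding-3-small
generate = 500 / 1e6 * 15.0  # 500-token answer at $15/M output tokens
print(f"embed ${embed:.7f}, generate ${generate:.4f}, ratio {generate / embed:,.0f}x")
# embed $0.0000006, generate $0.0075, ratio 12,500x
```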
What is the cheapest way to build a RAG application?
OpenAI text-embedding-3-small for embeddings + pgvector on an existing Postgres database. Embedding a 100k-document knowledge base (about 50M tokens at 500 tokens per document) costs roughly $1 one-time, and 100 daily queries add well under a cent per month - comfortably under $2/month all-in.
How often do I need to re-embed documents?
Only when documents change or when you upgrade embedding models. Static document sets rarely need re-embedding. Frequently updated content should be re-embedded on update. The re-embedding cost is usually small relative to initial indexing.
Does the vector database or the embedding API cost more?
For large corpora, vector database storage usually exceeds embedding generation cost within a few months. Embedding 1B tokens with OpenAI small costs $20 once. Storing the resulting vectors in Pinecone serverless costs about $63/month. Use pgvector or self-hosted Qdrant to minimize storage cost.
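A quick way to sanity-check any stack is to divide the one-time embedding spend by the monthly storage bill to see how fast recurring storage overtakes it. The sketch below reuses the $20 one-time / $63-per-month figures from the answer above:

```python
embed_once = 20.0         # one-time: 1B tokens at $0.02/M on OpenAI small
storage_per_month = 63.0  # recurring: managed serverless storage estimate
months_to_overtake = embed_once / storage_per_month
print(f"storage passes embedding cost in {months_to_overtake:.1f} months")
# storage passes embedding cost in 0.3 months
```

When that number is well under 12, storage pricing, not the embedding API, is the lever worth negotiating or self-hosting around.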
Disclaimer: Scenario costs use public pricing as of April 2026. Actual costs depend on token counting, chunk strategy, and specific infrastructure choices. LLM generation costs (for answers) are not included - those are typically the largest component.