Independent resource. Not affiliated with any provider. Always verify pricing on provider sites.

Real-World RAG Embedding Cost Examples: 5 Production Scenarios (April 2026)

Pricing pages tell you $0.02 per million tokens. Here is what that means for five actual RAG applications, with full cost breakdowns: one-time embedding, monthly query cost, monthly storage, and Year 1 total.

Verified April 2026

Customer Support Bot

50k historical tickets, 2k new queries/day, citation-required answers

Corpus to index: 50k tickets x 500 tokens = 25M tokens to embed
Monthly queries: 2k queries/day x 30 tokens x 30 days = 1.8M tokens/month
Storage: 50k vectors x 1536 dims x 4 bytes = 307 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + pgvector (cheapest) | $0.50 | $0.04 | $0.0070 | $1.02 |
| Voyage 3.5 + pgvector | $1.50 | $0.11 | $0.0070 | $2.88 |
| OpenAI small + Pinecone SL | $0.50 | $0.04 | $0.10 | $2.13 |
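Every table on this page follows the same four-term model: one-time corpus embedding, plus twelve months of query embedding and vector storage. A minimal sketch in Python, using the OpenAI small + pgvector figures from the row above ($0.02 per million tokens, ~$0.007/month storage):

```python
def year1_cost(corpus_tokens, query_tokens_per_month, price_per_m_tokens,
               storage_per_month, months=12):
    """One-time embedding plus N months of query embedding and storage."""
    embed_once = corpus_tokens / 1e6 * price_per_m_tokens
    query_mo = query_tokens_per_month / 1e6 * price_per_m_tokens
    return embed_once + months * (query_mo + storage_per_month)

# Customer support bot: 25M corpus tokens, 1.8M query tokens/month,
# $0.02/M tokens (text-embedding-3-small), ~$0.007/mo pgvector storage.
total = year1_cost(25e6, 1.8e6, 0.02, 0.007)
print(f"${total:.2f}")  # ≈ $1.02
```

Swap in your own token counts and provider rates to reproduce any row in the scenarios below.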

Semantic Search Engine

1M documents, 10k queries/day, fast response required

Corpus to index: 1M docs x 500 tokens = 500M tokens to embed
Monthly queries: 10k queries/day x 30 tokens x 30 days = 9M tokens/month
Storage: 1M vectors x 1536 dims x 4 bytes = 6.14 GB (5.72 GiB)

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + Pinecone pod | $10 | $0.18 | $70 | $852 |
| Voyage 3.5 + Qdrant Cloud | $30 | $0.54 | $0.69 | $45 |
| OpenAI small batch + pgvector (cheapest) | $5.00 | $0.18 | $0.13 | $8.72 |
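The storage footprints in these scenarios are raw float32 payload: vector count x dimensions x 4 bytes. A quick sketch, which also shows why the same index can be quoted as 6.14 GB (decimal) or 5.72 GiB (binary) depending on the provider's dashboard:

```python
def index_size_bytes(n_vectors, dims, bytes_per_dim=4):
    """Raw float32 vector payload; real indexes add graph/metadata overhead."""
    return n_vectors * dims * bytes_per_dim

size = index_size_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB / {size / 2**30:.2f} GiB")  # 6.14 GB / 5.72 GiB
```

Real deployments need more than this: HNSW graph links, metadata, and replicas can add 1.5-2x on top of the raw vectors, so treat this as a floor.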

Internal Knowledge Base

10k Notion pages, 500 employees, general Q&A

Corpus to index: 10k pages x 400 tokens = 4M tokens to embed
Monthly queries: 500 queries/day x 25 tokens x 30 days = 375k tokens/month
Storage: 10k vectors x 1536 dims x 4 bytes = 61 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| OpenAI small + pgvector (cheapest) | $0.08 | $0.0075 | $0.0010 | $0.18 |
| Voyage 3.5 + pgvector | $0.24 | $0.02 | $0.0010 | $0.52 |

E-Commerce Product Search

500k SKUs, 100k queries/day, multilingual

Corpus to index: 500k SKUs x 200 tokens = 100M tokens to embed
Monthly queries: 100k queries/day x 20 tokens x 30 days = 60M tokens/month
Storage: 500k vectors x 1024 dims x 4 bytes = 2.05 GB (1.91 GiB)

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| Cohere v4 + Qdrant Cloud | $10 | $6.00 | $0.23 | $85 |
| OpenAI small + pgvector (cheapest) | $2.00 | $1.20 | $0.04 | $17 |

Legal Document RAG

100k contracts, accuracy-critical retrieval

Corpus to index: 100k docs x 3000 tokens (long documents) = 300M tokens to embed
Monthly queries: 500 queries/day x 50 tokens x 30 days = 750k tokens/month
Storage: 100k vectors x 1536 dims x 4 bytes = 614 MB

| Stack | Embed (once) | Query/mo | Storage/mo | Year 1 |
|---|---|---|---|---|
| Voyage 3-large + Qdrant Cloud | $54 | $0.14 | $0.07 | $56 |
| Voyage law-2 + Qdrant Cloud | $36 | $0.09 | $0.07 | $38 |
| OpenAI large batch + Qdrant (cheapest) | $20 | $0.10 | $0.07 | $21 |
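The "batch" rows in these tables assume OpenAI's Batch API, which prices jobs at half the synchronous rate in exchange for up-to-24-hour turnaround. For a one-time backfill like the 300M-token legal corpus, the discount applies to the largest single line item. A sketch using the published $0.13/M synchronous rate for text-embedding-3-large (verify current rates before budgeting):

```python
SYNC_PRICE_PER_M = 0.13   # text-embedding-3-large, synchronous rate
BATCH_DISCOUNT = 0.5      # Batch API: 50% off, results within 24 hours

corpus_tokens = 300e6     # 100k contracts x 3000 tokens
sync_cost = corpus_tokens / 1e6 * SYNC_PRICE_PER_M
batch_cost = sync_cost * BATCH_DISCOUNT
print(f"sync ${sync_cost:.2f} vs batch ${batch_cost:.2f}")  # sync $39.00 vs batch $19.50
```

Batch only helps indexing; live user queries still pay the synchronous rate, which is why the Query/mo column is unchanged between batch and non-batch rows.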

Frequently Asked Questions

How much does a RAG application cost?
A small knowledge base (10k documents, 100 queries/day) can cost under $2/month on OpenAI small + pgvector. A large semantic search engine (1M documents, 10k queries/day) runs from under $1/month in ongoing embedding and storage spend on self-managed pgvector to roughly $70/month when a dedicated Pinecone pod is required. LLM generation cost for answers is usually larger than the embedding cost.
Is embedding cost or LLM generation cost higher in RAG?
LLM generation is almost always higher. Embedding a 30-token query costs about $0.0000006 on OpenAI small. Generating a 500-token answer with Claude 3.5 Sonnet costs about $0.0075 in output tokens alone - roughly 12,500x more per request, before counting input tokens for the prompt and retrieved context. Embedding optimization matters most for large indexing workloads.
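To make the gap concrete, here is the per-request arithmetic, assuming OpenAI small at $0.02 per million tokens and Claude 3.5 Sonnet's published $15 per million output tokens (using the $3/M input rate instead would give a smaller, 2,500x gap):

```python
embed = 30 / 1e6 * 0.02      # 30-token query on text-embedding-3-small
generate = 500 / 1e6 * 15.0  # 500-token answer at $15/M output tokens
print(f"embed ${embed:.7f}, generate ${generate:.4f}, ratio {generate / embed:,.0f}x")
# embed $0.0000006, generate $0.0075, ratio 12,500x
```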
What is the cheapest way to build a RAG application?
OpenAI text-embedding-3-small for embeddings + pgvector on an existing Postgres database. Embedding a 100k-document knowledge base (about 50M tokens at 500 tokens per document) costs roughly $1 one-time, and 100 daily queries add well under a cent per month - comfortably under $2/month all-in.
How often do I need to re-embed documents?
Only when documents change or when you upgrade embedding models. Static document sets rarely need re-embedding. Frequently updated content should be re-embedded on update. The re-embedding cost is usually small relative to initial indexing.
Does the vector database or the embedding API cost more?
For large corpora, vector database storage usually exceeds embedding generation cost within a few months. Embedding 1B tokens with OpenAI small costs $20 once. Storing the resulting vectors in Pinecone serverless costs about $63/month. Use pgvector or self-hosted Qdrant to minimize storage cost.
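A quick way to sanity-check any stack is to divide the one-time embedding spend by the monthly storage bill to see how fast recurring storage overtakes it. The sketch below reuses the $20 one-time / $63-per-month figures from the answer above:

```python
embed_once = 20.0         # one-time: 1B tokens at $0.02/M on OpenAI small
storage_per_month = 63.0  # recurring: managed serverless storage estimate
months_to_overtake = embed_once / storage_per_month
print(f"storage passes embedding cost in {months_to_overtake:.1f} months")
# storage passes embedding cost in 0.3 months
```

When that number is well under 12, storage pricing, not the embedding API, is the lever worth negotiating or self-hosting around.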
Disclaimer: Scenario costs use public pricing as of April 2026. Actual costs depend on token counting, chunk strategy, and specific infrastructure choices. LLM generation costs (for answers) are not included - those are typically the largest component.