Independent resource. Not affiliated with any provider. Always verify pricing on provider sites.

Methodology

How EmbeddingCost.com sources and verifies AI embedding pricing: provider URLs, calculator assumptions, MTEB benchmark provenance, refresh cadence, and the corrections process. The single-source verification date is currently May 2026.

Prices verified May 2026

1. Provider sources

Every per-million-token rate on this site is traceable to the provider's own public pricing page. Vector database rates are taken from the operator's published pricing. Where a provider has multiple surfaces (Google Gemini API vs Vertex AI; Amazon Bedrock vs SageMaker JumpStart) the page that surfaces the rate is noted on the relevant per-provider page.

Provider | Source URL | Refresh
OpenAI | openai.com/api/pricing
Per-million-token rates for text-embedding-3-small, 3-large, and ada-002. The Batch API discount is documented on OpenAI's Batch API page; the 50 percent rate is taken directly from the public Batch tier.
Monthly
Anthropic | www.anthropic.com/pricing
Anthropic does not currently offer an embedding model. The Claude API supports text generation and tool use; embedding workloads route to OpenAI, Cohere, Voyage, Google, AWS Bedrock, or a self-hosted model. See claudeapipricing.com for the chat side.
Reference only
Cohere | cohere.com/pricing
embed-v4 text and image rates. Note: Cohere does not currently publish a batch-tier discount for embeddings; the standard rate is the only rate.
Monthly
Voyage AI | docs.voyageai.com/docs/pricing
voyage-3-lite, voyage-3.5, voyage-3-large, and the domain-specific models (voyage-code-3, voyage-law-2, voyage-finance-2). 33 percent batch discount and the 200M lifetime free tier are taken from Voyage's documentation.
Monthly
Google | ai.google.dev/gemini-api/docs/pricing
gemini-embedding-001 (GA) and gemini-embedding-2-preview rates. Vertex AI surface is cross-checked against cloud.google.com/vertex-ai/generative-ai/pricing; rates currently match.
Monthly
Amazon Bedrock | aws.amazon.com/bedrock/pricing/
Amazon Titan Text Embeddings V2 (and Cohere / Voyage models surfaced through Bedrock). Rates quoted are US East 1; other AWS regions can run 10 to 20 percent higher and are explicitly flagged on the /aws-bedrock page.
Monthly (US East 1)
Pinecone | www.pinecone.io/pricing/
Serverless storage rate ($0.33/GB/month), read/write unit rates, and pod-based pricing (p1.x1 floor at $70/month). Cold-start latency notes are taken from Pinecone's serverless documentation.
Quarterly
Qdrant Cloud | cloud.qdrant.io/
Storage and RAM per-GB rates. Free tier (1M vectors) verified from the Qdrant pricing calculator.
Quarterly
Weaviate Cloud | weaviate.io/pricing
$0.095/GB/month managed storage rate. Self-hosted Weaviate is excluded from managed rates.
Quarterly
Zilliz Cloud / Milvus | zilliz.com/pricing
Per-GB storage rate, billion-scale tier notes. Self-hosted Milvus is excluded from managed rates.
Quarterly
pgvector | github.com/pgvector/pgvector
Open-source Postgres extension. Cost reference assumes an existing Postgres deployment; raw storage at $0.023/GB/month is taken from typical managed-Postgres rates (AWS RDS, GCP Cloud SQL, Supabase Pro), not from pgvector itself.
On change

2. What is in scope

3. What is out of scope

4. Calculator assumptions

Tokenizer

OpenAI rates use the tiktoken cl100k_base tokenizer for token estimates. Other providers use their published tokenizers where available (Voyage publishes a tokenizer; Cohere does not currently). For mixed-provider comparison, calculator inputs assume a 1.0 to 1.05 tokens-per-input-word multiplier consistent with English-language prose.
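The word-to-token planning assumption above can be sketched as a small helper. This is an illustrative function, not part of the calculator's published code; real counts come from the provider tokenizer (e.g. tiktoken cl100k_base for OpenAI) where one exists.

```typescript
// Estimate billed tokens from an English word count using the
// calculator's 1.0 to 1.05 tokens-per-word planning range.
// Illustrative sketch only; prefer the provider's own tokenizer.
function estimateTokens(wordCount: number, tokensPerWord = 1.05): number {
  return Math.round(wordCount * tokensPerWord);
}

estimateTokens(10_000, 1.0); // 10000 tokens at the lower bound
estimateTokens(10_000);      // 10500 tokens at the upper bound
```

Corpora heavy in code, markup, or non-English text typically tokenize above this range, which is why published per-provider tokenizers are preferred when available.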

Chunk overlap

The default in the calculator and worked examples is 25 percent overlap, a common production standard for RAG. This inflates billed tokens by roughly 25 percent over raw corpus tokens. Pipelines that can tolerate 10 percent overlap pay correspondingly less; aggressive overlap up to 40 percent is rare, and the calculator does not surface it by default.
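The overlap inflation reduces to a one-line multiplier. A minimal sketch, with an illustrative function name:

```typescript
// Billed tokens after chunking with overlap: 25 percent overlap
// re-embeds roughly a quarter of the corpus, so billed tokens run
// about 1.25x raw corpus tokens.
function billedTokens(corpusTokens: number, overlap = 0.25): number {
  return Math.round(corpusTokens * (1 + overlap));
}

billedTokens(1_000_000);       // 1250000 at the default 25 percent
billedTokens(1_000_000, 0.1);  // 1100000 at a leaner 10 percent
```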

Storage formula (float32)

vectors x dimensions x 4 bytes. For 1M 1536-dimension vectors that is 5.72 GB before HNSW index overhead.
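As a worked check of the 5.72 GB figure (illustrative helper, reporting GiB as 2^30 bytes):

```typescript
// Raw float32 footprint: vectors x dimensions x 4 bytes,
// before any index overhead.
function float32StorageGiB(vectors: number, dims: number): number {
  return (vectors * dims * 4) / 2 ** 30;
}

float32StorageGiB(1_000_000, 1536).toFixed(2); // "5.72"
```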

Storage formula (binary)

vectors x dimensions / 8 bytes (one bit per dimension). Only Amazon Titan V2 currently supports binary quantization in a production-grade API. Binary storage cuts raw bytes by 32x, at a 5 to 10 percent retrieval accuracy reduction.
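The binary formula in the same style (illustrative helper; the 32x cut follows from float32's 32 bits per dimension versus 1 bit here):

```typescript
// Binary quantization keeps one bit per dimension:
// vectors x dimensions / 8 bytes of raw storage.
function binaryStorageGiB(vectors: number, dims: number): number {
  return (vectors * dims) / 8 / 2 ** 30;
}

binaryStorageGiB(1_000_000, 1536); // ≈ 0.179 GiB (192 MB raw), 1/32 of float32
```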

Index overhead

HNSW indexes add roughly 20 to 40 percent on top of raw vector storage. Calculator outputs show raw vector storage; pricing tables for managed vector DBs include 30 percent overhead as a planning assumption.
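The planning assumption can be expressed as a multiplier on raw storage. A sketch, with the 30 percent midpoint from the 20 to 40 percent range above as the default (function name is illustrative):

```typescript
// Planned storage for a managed vector DB: raw vector bytes
// plus HNSW index overhead.
function plannedStorageGiB(rawGiB: number, overhead = 0.3): number {
  return rawGiB * (1 + overhead);
}

plannedStorageGiB(5.72); // ≈ 7.44 GiB for 1M 1536-dim float32 vectors
```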

Self-hosted throughput

BGE-M3 on a single A100 GPU is estimated at 8,000 tokens per second. Spot pricing at $1.50 per GPU hour gives roughly $0.052 per million tokens at full utilisation. Real utilisation depends on batching, queue depth, and ops overhead.
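The per-million-token figure follows from dividing the hourly rate by throughput. A sketch under the stated assumptions (8,000 tokens per second, $1.50 per GPU hour, full utilisation; function name is illustrative):

```typescript
// Spot-GPU embedding cost per million tokens at full utilisation:
// hourly rate divided by millions of tokens processed per hour.
function costPerMillionTokensUsd(gpuHourlyUsd: number, tokensPerSecond: number): number {
  const millionTokensPerHour = (tokensPerSecond * 3600) / 1_000_000; // 28.8 for 8,000 tok/s
  return gpuHourlyUsd / millionTokensPerHour;
}

costPerMillionTokensUsd(1.5, 8000); // ≈ 0.052
```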

Year 1 total

One-time indexing cost plus 12 months of (monthly query cost plus monthly storage cost). Re-indexing passes are added separately on the /scenarios pages where they apply.
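The Year 1 formula in code, with illustrative inputs (re-indexing passes, where a scenario needs them, are added on top):

```typescript
// Year 1 total: one-time indexing plus 12 months of recurring
// query and storage cost.
function yearOneTotalUsd(indexing: number, monthlyQuery: number, monthlyStorage: number): number {
  return indexing + 12 * (monthlyQuery + monthlyStorage);
}

yearOneTotalUsd(20, 5, 2); // 20 + 12 * 7 = 104
```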

5. MTEB benchmark provenance

The MTEB (Massive Text Embedding Benchmark) Retrieval average is the most cited cross-provider quality metric on this site. Numbers are taken from the public MTEB leaderboard hosted on Hugging Face at huggingface.co/spaces/mteb/leaderboard at the time of each verification pass. The leaderboard refreshes continuously as new models are submitted; this site captures a quarterly snapshot.

Where a provider has not submitted to MTEB Retrieval (Cohere embed-v4 image embeddings, voyage-code-3 when used outside code corpora) the relevant column shows a dash rather than an interpolated number. Vendor-published comparisons ("our model beats X by Y points") are explicitly out of scope; only the public leaderboard counts.

MTEB itself measures retrieval quality on a curated benchmark suite (BEIR plus extensions). Real-world retrieval quality on your domain corpus can diverge from MTEB averages by several points. The numbers published here are a coarse ordering, not a substitute for domain-specific evaluation.

6. Refresh cadence

Provider pricing pages are re-verified on the first business week of each month. The pass walks each source URL in section 1, captures the per-million-token rate, batch discount where applicable, and free-tier terms. Differences against the previous month are reconciled in the src/data/pricing.ts source-of-truth file; both the human-readable label and the ISO date constants in src/lib/schema.ts are bumped in the same commit.

MTEB benchmark scores are refreshed quarterly because the leaderboard moves more slowly than vendor pricing. Vector database storage rates are refreshed quarterly for the same reason.

The verification date is held in one constant (LAST_VERIFIED_DATE) imported by every page. Footer text, schema dateModified, page H1 timestamps, and the calculator legend all read from that single source so a cosmetic refresh of one element without an underlying verification pass is structurally prevented.
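A minimal sketch of that single-source pattern; the real constants live in src/lib/schema.ts, the ISO value here is assumed from the May 2026 label, and the label-deriving helper is illustrative:

```typescript
// Single source of truth for the verification date. Every consumer
// (footer, schema dateModified, H1 timestamps, calculator legend)
// reads from this constant or a value derived from it.
const LAST_VERIFIED_DATE = "2026-05-01"; // ISO date, assumed for illustration

// Derive the human-readable label from the same constant so the
// footer and the schema can never disagree.
function verifiedLabel(iso: string = LAST_VERIFIED_DATE): string {
  const [year, month] = iso.split("-").map(Number);
  const names = ["January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"];
  return `${names[month - 1]} ${year}`;
}

verifiedLabel(); // "May 2026"
```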

7. Corrections process

Send the provider pricing page URL and the specific disagreement to [email protected]. Substantiated corrections are re-verified against the provider page and shipped within five business days. Changes that shift a price tier, alter a calculator coefficient, or change a MTEB score get a dated entry on the page where the figure is shown so the change history is visible to anyone using the site for an architecture decision.

Disagreements about scope ("you should cover X provider", "you should expand on Y use case") are queued and surfaced in the next site-wide quarterly review. Coverage decisions are editorial and not commercial.

8. Why no affiliate links

Outbound vendor URLs in the data layer (Pinecone, Qdrant, Weaviate, Zilliz, the embedding API providers) are plain unaffiliated URLs. There is no UTM tag, no referral code, no revenue share. The site is run as an editorial reference, not a lead-generation funnel. Where a vendor has an obvious advantage on a comparison, the comparison says so; where they are functionally equivalent, the comparison says that too. The lack of affiliate revenue is what makes both calls cheap to make.

9. Limitations

Pricing changes between verification windows. Vendors sometimes adjust rates with minimal notice. The monthly cadence here is a planning assumption, not a real-time price feed. Always confirm current pricing on the provider's own pricing page before committing to architecture decisions on a corpus large enough that a 10 to 20 percent rate move matters.

The calculator outputs are estimates based on the assumptions in section 4. Real production cost depends on rate-limit overhead, retry overhead, tokenizer differences on your specific corpus, chunk strategy, regional surcharges, and any negotiated enterprise terms. Treat calculator output as the order of magnitude, not the invoice.

MTEB Retrieval average is a coarse cross-provider quality ordering. It does not capture multilingual quality (which is where Cohere has an edge), code retrieval quality (voyage-code-3), or domain-specific accuracy. Run an evaluation on a sample of your real corpus before committing to a provider on accuracy grounds.

Disclaimer: EmbeddingCost.com is independent. Pricing is compiled from public sources and verified monthly per the cadence in section 6. Always confirm current pricing on the provider's own pricing page before committing to architecture decisions.

Updated May 2026