Methodology
How EmbeddingCost.com sources and verifies AI embedding pricing: provider URLs, calculator assumptions, MTEB benchmark provenance, refresh cadence, and the corrections process. The site-wide verification date is held in a single constant and bumped only with each verification pass (see section 6).
1. Provider sources
Every per-million-token rate on this site is traceable to the provider's own public pricing page. Vector database rates are taken from the operator's published pricing. Where a provider has multiple surfaces (Google Gemini API vs Vertex AI; Amazon Bedrock vs SageMaker JumpStart) the page that surfaces the rate is noted on the relevant per-provider page.
| Provider | Source URL | Refresh |
|---|---|---|
| OpenAI | openai.com/api/pricing Per-million-token rates for text-embedding-3-small, 3-large, and ada-002. The Batch API discount is documented separately on OpenAI's batch API page; the 50 percent rate is taken directly from the public Batch tier. | Monthly |
| Anthropic | www.anthropic.com/pricing Anthropic does not currently offer an embedding model. The Claude API supports text generation and tool use; embedding workloads route to OpenAI, Cohere, Voyage, Google, AWS Bedrock, or a self-hosted model. See claudeapipricing.com for the chat side. | Reference only |
| Cohere | cohere.com/pricing embed-v4 text and image rates. Note: Cohere does not currently publish a batch-tier discount for embeddings; the standard rate is the only rate. | Monthly |
| Voyage AI | docs.voyageai.com/docs/pricing voyage-3-lite, voyage-3.5, voyage-3-large, and the domain-specific models (voyage-code-3, voyage-law-2, voyage-finance-2). 33 percent batch discount and the 200M lifetime free tier are taken from Voyage's documentation. | Monthly |
| Google Gemini | ai.google.dev/gemini-api/docs/pricing gemini-embedding-001 (GA) and gemini-embedding-2-preview rates. The Vertex AI surface is cross-checked against cloud.google.com/vertex-ai/generative-ai/pricing; rates currently match. | Monthly |
| Amazon Bedrock | aws.amazon.com/bedrock/pricing/ Amazon Titan Text Embeddings V2 (and Cohere / Voyage models surfaced through Bedrock). Rates quoted are US East 1; other AWS regions can run 10 to 20 percent higher and are explicitly flagged on the /aws-bedrock page. | Monthly (US East 1) |
| Pinecone | www.pinecone.io/pricing/ Serverless storage rate ($0.33/GB/month), read/write unit rates, and pod-based pricing (p1.x1 floor at $70/month). Cold-start latency notes are taken from Pinecone's serverless documentation. | Quarterly |
| Qdrant Cloud | cloud.qdrant.io/ Storage and RAM per-GB rates. Free tier (1M vectors) verified from the Qdrant pricing calculator. | Quarterly |
| Weaviate Cloud | weaviate.io/pricing $0.095/GB/month managed storage rate. Self-hosted Weaviate is excluded from managed rates. | Quarterly |
| Zilliz Cloud / Milvus | zilliz.com/pricing Per-GB storage rate, billion-scale tier notes. Self-hosted Milvus is excluded from managed rates. | Quarterly |
| pgvector | github.com/pgvector/pgvector Open-source Postgres extension. Cost reference assumes an existing Postgres deployment; raw storage at $0.023/GB/month is taken from typical managed-Postgres rates (AWS RDS, GCP Cloud SQL, Supabase Pro), not from pgvector itself. | On change |
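To make the storage rates above comparable, here is a minimal sketch of the per-GB monthly cost arithmetic. The rates are the published figures from the table; the helper function and its names are illustrative, not site code.

```ts
// Published per-GB/month storage rates from the table above.
// pgvector uses the raw managed-Postgres storage rate, not a vector-DB rate.
const RATE_PER_GB_MONTH = {
  pinecone: 0.33,
  weaviate: 0.095,
  pgvectorRaw: 0.023,
} as const;

function monthlyStorageCost(
  storedGb: number,
  provider: keyof typeof RATE_PER_GB_MONTH,
): number {
  return storedGb * RATE_PER_GB_MONTH[provider];
}

// 1M 1536-dim float32 vectors ≈ 5.72 GB raw, ≈ 7.4 GB with the 30 percent
// HNSW planning overhead from section 4:
monthlyStorageCost(7.4, "pinecone");    // ≈ $2.44/month
monthlyStorageCost(7.4, "pgvectorRaw"); // ≈ $0.17/month
```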
2. What is in scope
- Per-million-token published rates for embedding models on standard public pricing tiers.
- Batch API discounts where the provider publishes them (OpenAI 50 percent, Voyage 33 percent).
- Free tier limits where the provider publishes them (Cohere rate limits, Voyage 200M lifetime tokens, Google AI Studio daily quotas).
- MTEB Retrieval average benchmark scores taken from the public MTEB leaderboard at verification time.
- Vector database managed-storage rates for Pinecone, Qdrant Cloud, Weaviate Cloud, Zilliz Cloud, and a pgvector cost reference.
- Self-hosted A100 break-even math at $1.50 per GPU hour spot pricing.
3. What is out of scope
- Enterprise-negotiated pricing. Volume commitments, contract minimums, and custom MSAs are deliberately excluded.
- Microsoft Azure OpenAI regional pricing surcharges. Azure pricing currently matches OpenAI on standard rates; PTU and reserved-capacity commitments are not in scope.
- AWS Bedrock cross-region pricing. The /aws-bedrock page uses US East 1; other regions may be 10 to 20 percent higher.
- Fine-tuning fees, provisioned throughput unit pricing, dedicated capacity commitments.
- Vector database cold-start latency premiums beyond the noted Pinecone serverless caveat.
- LLM generation cost. Embedding is one half of a RAG pipeline; chat generation cost is covered on contextcost.com, claudeapipricing.com, and geminipricing.com.
4. Calculator assumptions
OpenAI rates use the tiktoken cl100k_base tokenizer for token estimates. Other providers use their published tokenizers where available (Voyage publishes one; Cohere currently does not). For mixed-provider comparison, calculator inputs assume a 1.0 to 1.05 tokens-per-input-word multiplier consistent with English-language prose.
The calculator and worked examples default to 25 percent chunk overlap, a common production choice for RAG chunking. This inflates billed tokens by roughly 25 percent over raw corpus tokens. Leaner pipelines use 10 percent overlap and pay proportionally less; aggressive overlap up to 40 percent is rare, and the calculator does not surface it by default.
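A minimal sketch of how the tokenizer multiplier and the overlap assumption compose into a billed-token estimate. The constants mirror the stated defaults; the function name is illustrative, not the calculator's actual code.

```ts
// Section 4 defaults: English-prose multiplier and 25 percent chunk overlap.
const TOKENS_PER_WORD = 1.05; // upper end of the 1.0 to 1.05 range
const CHUNK_OVERLAP = 0.25;   // default; 0.10 for leaner pipelines

function estimateBilledTokens(wordCount: number, overlap = CHUNK_OVERLAP): number {
  const corpusTokens = wordCount * TOKENS_PER_WORD;
  return corpusTokens * (1 + overlap); // overlapped chunks re-embed ~25% of tokens
}

estimateBilledTokens(10_000_000); // 10M words → ≈ 13.1M billed tokens
```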
Raw float32 storage is vectors x dimensions x 4 bytes. For 1M 1536-dimension vectors that is 5.72 GB before HNSW index overhead.
Binary-quantized storage is vectors x dimensions / 8 bytes. Only Amazon Titan V2 currently supports binary quantization in a production-grade API. Binary storage cuts raw bytes by 32x with a 5 to 10 percent retrieval accuracy reduction.
HNSW indexes add roughly 20 to 40 percent on top of raw vector storage. Calculator outputs show raw vector storage; pricing tables for managed vector DBs include 30 percent overhead as a planning assumption.
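The three storage figures reduce to a few lines of arithmetic. A sketch under the stated assumptions, using binary GB (GiB), which is what makes 1M x 1536 x 4 bytes come out to 5.72:

```ts
const BYTES_PER_GB = 1024 ** 3; // binary GB (GiB), matching the 5.72 figure

// Raw float32 storage: vectors x dimensions x 4 bytes.
function float32StorageGb(vectors: number, dims: number): number {
  return (vectors * dims * 4) / BYTES_PER_GB;
}

// Binary-quantized storage: 1 bit per dimension, so vectors x dimensions / 8 bytes.
function binaryStorageGb(vectors: number, dims: number): number {
  return (vectors * dims) / 8 / BYTES_PER_GB;
}

const HNSW_OVERHEAD = 1.3; // 30 percent planning assumption

float32StorageGb(1_000_000, 1536);                 // ≈ 5.72 GB raw
float32StorageGb(1_000_000, 1536) * HNSW_OVERHEAD; // ≈ 7.44 GB indexed
binaryStorageGb(1_000_000, 1536);                  // ≈ 0.18 GB (32x smaller)
```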
BGE-M3 on a single A100 GPU is estimated at 8,000 tokens per second. Spot pricing at $1.50 per GPU hour gives roughly $0.052 per million tokens at full utilisation. Real utilisation depends on batching, queue depth, and ops overhead.
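The break-even arithmetic, spelled out; both inputs are the planning assumptions stated above, not measured figures:

```ts
const TOKENS_PER_SECOND = 8_000; // BGE-M3 on one A100, estimated
const GPU_HOUR_USD = 1.5;        // spot pricing assumption

const tokensPerHour = TOKENS_PER_SECOND * 3_600;                  // 28.8M tokens/hour
const usdPerMillionTokens = GPU_HOUR_USD / (tokensPerHour / 1e6); // ≈ $0.052
```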
The 12-month total cost of ownership is the one-time indexing cost plus 12 months of (monthly query cost plus monthly storage cost). Re-indexing passes are added separately on the /scenarios pages where they apply.
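As a function, with parameter names that are illustrative rather than the site's:

```ts
// 12-month total cost of ownership, excluding re-indexing passes.
function twelveMonthTotal(
  indexingOnce: number,
  monthlyQuery: number,
  monthlyStorage: number,
): number {
  return indexingOnce + 12 * (monthlyQuery + monthlyStorage);
}
```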
5. MTEB benchmark provenance
The MTEB (Massive Text Embedding Benchmark) Retrieval average is the most cited cross-provider quality metric on this site. Numbers are taken from the public MTEB leaderboard hosted on Hugging Face at huggingface.co/spaces/mteb/leaderboard at the time of each verification pass. The leaderboard refreshes continuously as new models are submitted; this site captures a quarterly snapshot.
Where a provider has not submitted to MTEB Retrieval (Cohere embed-v4 image embeddings, voyage-code-3 when used outside code corpora) the relevant column shows a dash rather than an interpolated number. Vendor-published comparisons ("our model beats X by Y points") are explicitly out of scope; only the public leaderboard counts.
MTEB itself measures retrieval quality on a curated benchmark suite (BEIR plus extensions). Real-world retrieval quality on your domain corpus can diverge from MTEB averages by several points. The numbers published here are a coarse ordering, not a substitute for domain-specific evaluation.
6. Refresh cadence
Provider pricing pages are re-verified in the first business week of each month. The pass walks each source URL in section 1 and captures the per-million-token rate, the batch discount where applicable, and free-tier terms. Differences against the previous month are reconciled in the src/data/pricing.ts source-of-truth file; both the human-readable label and the ISO date constants in src/lib/schema.ts are bumped in the same commit.
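The shape of a pricing.ts entry is not published; a hypothetical sketch of what one reconciled record could carry, with every field name illustrative:

```ts
// Hypothetical record shape; the real src/data/pricing.ts may differ.
interface PricingEntry {
  provider: string;       // e.g. "OpenAI"
  model: string;          // e.g. "text-embedding-3-small"
  usdPerMillionTokens: number;
  batchDiscount?: number; // 0.5 for OpenAI, 0.33 for Voyage, absent otherwise
  sourceUrl: string;      // the section 1 pricing page the rate was read from
  lastVerified: string;   // ISO date, bumped during the monthly pass
}
```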
MTEB benchmark scores are refreshed quarterly because the leaderboard moves more slowly than vendor pricing. Vector database storage rates are refreshed quarterly for the same reason.
The verification date is held in one constant (LAST_VERIFIED_DATE) imported by every page. Footer text, schema dateModified, page H1 timestamps, and the calculator legend all read from that single source so a cosmetic refresh of one element without an underlying verification pass is structurally prevented.
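A minimal sketch of the single-constant pattern; the placeholder value and comments are illustrative, not the contents of src/lib/schema.ts:

```ts
// src/lib/schema.ts (sketch; placeholder value). Bumped only as part of a
// full verification pass, never as a cosmetic edit.
export const LAST_VERIFIED_DATE = "1970-01-01"; // ISO 8601 date

// Every consumer imports the same constant, so the footer, schema
// dateModified, page H1 timestamps, and calculator legend cannot drift:
// import { LAST_VERIFIED_DATE } from "@/lib/schema";
```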
7. Corrections process
Send the provider pricing page URL and the specific disagreement to [email protected]. Substantiated corrections are re-verified against the provider page and shipped within five business days. Changes that shift a price tier, alter a calculator coefficient, or change an MTEB score get a dated entry on the page where the figure is shown, so the change history is visible to anyone using the site for an architecture decision.
Disagreements about scope ("you should cover X provider", "you should expand on Y use case") are queued and surface in the next site-wide quarterly review. Coverage decisions are editorial and not commercial.
8. Why no affiliate links
Outbound vendor URLs in the data layer (Pinecone, Qdrant, Weaviate, Zilliz, the embedding API providers) are plain unaffiliated URLs. There is no UTM tag, no referral code, no revenue share. The site is run as an editorial reference, not a lead-generation funnel. Where a vendor has an obvious advantage on a comparison, the comparison says so; where they are functionally equivalent, the comparison says that too. The lack of affiliate revenue is what makes both calls cheap to make.
9. Limitations
Pricing changes between verification windows. Vendors sometimes adjust rates with minimal notice. The monthly cadence here is a planning assumption, not a real-time price feed. Always confirm current pricing on the provider's own pricing page before committing to architecture decisions on a corpus large enough that a 10 to 20 percent rate move matters.
The calculator outputs are estimates based on the assumptions in section 4. Real production cost depends on rate-limit overhead, retry overhead, tokenizer differences on your specific corpus, chunk strategy, regional surcharges, and any negotiated enterprise terms. Treat calculator output as the order of magnitude, not the invoice.
MTEB Retrieval average is a coarse cross-provider quality ordering. It does not capture multilingual quality (which is where Cohere has an edge), code retrieval quality (voyage-code-3), or domain-specific accuracy. Run an evaluation on a sample of your real corpus before committing to a provider on accuracy grounds.