About EmbeddingCost.com
An independent reference for what AI embedding models actually cost in May 2026. Cross-provider calculator, per-model deep-dives, vector database storage analysis, optimization techniques, and worked RAG scenarios. No vendor relationships, no affiliate links, no quote forms.
Why this site exists
Provider pricing pages list per-million-token rates in isolation. They do not show batch API math, they do not surface the Matryoshka dimension storage knock-on, and they do not compare across the OpenAI, Cohere, Voyage, Google and Amazon Bedrock landscape. AI engineers sizing a retrieval-augmented generation build need one place to do the cross-provider math and see the storage tail.
The other constraint is freshness without theatre. Vendor pages change their pricing pages quietly. A cost-reference site that says "updated April 2026" in three different places, all of them hand-edited, drifts within weeks. This site solves that with a single verification-date constant imported by every page; when the verification pass happens, one file changes and the entire site reads the new label, including the JSON-LD schema dateModified fields.
The site also exists to take a position on the storage tail. Generating one billion embeddings with OpenAI text-embedding-3-small costs $20 once. Storing those vectors in a managed vector database can cost $200 per month forever. For long-lived corpora, the recurring storage bill dwarfs the one-time generation cost within a few months. Most pricing references stop at the per-token rate; this one treats storage as a peer concern.
Who builds this
Engineer-operator who builds and runs AI cost-reference sites and productised compliance tooling. Day job is autonomous AI development for SMEs (Digital Signet); this site is an editorial property, not a client deliverable. Contact at [email protected] or LinkedIn.
EmbeddingCost.com is published by Digital Signet, an independent UK consultancy that builds autonomous AI development systems. Digital Signet also operates sister AI-pricing references including claudeapipricing.com, contextcost.com, and geminipricing.com. Editorial is set by Oliver; numbers are validated by the practitioner network Digital Signet maintains.
Editorial position
EmbeddingCost.com is not a reseller, not a managed-services consultancy, not a vendor. It does not accept paid placements, sponsored sub-pages, or directory inclusion fees. Vendor URLs in our data (Pinecone, Qdrant, Weaviate, Zilliz, the embedding API providers) are plain unaffiliated URLs; clicking through does not benefit this site financially.
The site takes no editorial position on which model is best. The data is the position. Where one provider has a meaningful advantage (Voyage on MTEB Retrieval accuracy, OpenAI on per-token cost, Cohere on multilingual quality), the comparison surfaces it; where they are functionally interchangeable, the comparison says so. The intent is that an AI engineer landing here cold can reach a defensible architecture decision in under fifteen minutes without an email gate.
What this site covers
Cross-provider cheat sheet, scale tables and decision guide on the homepage.
text-embedding-3-small, 3-large, ada-002, Azure differences, Batch API math.
$0.10/M text rate, $0.47/M images, the 512-token context window caveat.
voyage-3.5, voyage-3-large, voyage-3-lite, batch math, 200M free tokens.
gemini-embedding-001 and 2-preview, MRL dimension reduction options.
Titan V2 cost on Bedrock, binary embeddings, OpenSearch Serverless math.
BGE-M3 break-even, GPU rate ladder, real $85k/year case study.
Price, dimensions, MTEB score, context window, batch discount, free tier.
Tokens, batch toggle, dimension, storage, Year 1 total across all providers.
Pinecone, Weaviate, Qdrant, Zilliz, pgvector at 1M, 10M, 100M vector scale.
Eight cost-reduction techniques with concrete savings math, not vague percentages.
Five real production RAG scenarios with full Year 1 cost breakdowns.
Provider sources, calculator assumptions, refresh cadence, corrections process.
Editorial principles
Every per-million-token rate published on this site traces back to a provider's own public pricing page (OpenAI, Cohere, Voyage AI, Google, Amazon Bedrock). Where a number is contested or under-documented, both ends of the range are shown.
There are no sponsored slots, no premium positioning, no pay-to-rank. Provider order in tables is determined by price or relevance to the page, not by any commercial relationship.
Outbound links to vector database providers (Pinecone, Qdrant, Weaviate, Zilliz) and embedding APIs are plain unaffiliated URLs. This site is a reference, not a lead-generation funnel.
Pricing is re-verified against the provider's own pricing page on the first business week of each month. The last verified label currently reads May 2026.
The verification date is held in one constant (LAST_VERIFIED_DATE) imported by every page. Footer text, schema dateModified, and visible headings all read from that single source so cosmetic refreshes are not possible.
MTEB Retrieval average scores are taken from the public MTEB leaderboard at the time of the latest provider verification. They are not internal numbers and they are not promotional.
Methodology summary
The calculator uses tiktoken cl100k_base for OpenAI token estimates, and provider-published tokenizers where available for Cohere, Voyage, Google and Amazon. Chunk overlap defaults to 25 percent in the indexing math, which is the standard for production RAG. Storage formula is vectors x dimensions x 4 bytes for float32 and vectors x dimensions / 8 for binary quantization (Amazon Titan V2). Self-hosted BGE-M3 numbers assume A100 spot pricing at $1.50 per GPU hour and 8,000 tokens per second throughput.
Out of scope: enterprise-negotiated pricing, Microsoft Azure regional surcharges, AWS reserved-capacity commitments, fine-tuning charges. Full assumptions and source URLs are documented on the methodology page.
Contact and corrections
Spotted a stale price or a wrong calculator coefficient? Send the provider pricing page URL and the disagreement to [email protected]. Substantiated corrections are re-verified and shipped within five business days, and the verification date is bumped on the next site-wide pass. Editorial decisions that change a calculator coefficient or shift a price tier are noted on the methodology page with the date the change applied.
Disclosures
- -No affiliate links or referral fees on any vendor URL on this site.
- -No email-gated downloads, quote forms, or sales redirects.
- -Not affiliated with OpenAI, Anthropic, Cohere, Voyage AI, Google, Amazon Web Services, Pinecone, Qdrant, Weaviate, Zilliz, or any other listed vendor.
- -Calculator outputs are estimates; production pricing depends on enterprise agreements, regional surcharges, and provisioned-throughput commitments not covered here.