About EmbeddingCost.com

An independent reference for what AI embedding models actually cost in May 2026. Cross-provider calculator, per-model deep-dives, vector database storage analysis, optimization techniques, and worked RAG scenarios. No vendor relationships, no affiliate links, no quote forms.

Prices verified May 2026

Why this site exists

Provider pricing pages list per-million-token rates in isolation. They do not show batch API math, they do not surface the Matryoshka dimension storage knock-on, and they do not compare across the OpenAI, Cohere, Voyage, Google and Amazon Bedrock landscape. AI engineers sizing a retrieval-augmented generation build need one place to do the cross-provider math and see the storage tail.

The other constraint is freshness without theatre. Vendor pages change their pricing pages quietly. A cost-reference site that says "updated April 2026" in three different places, all of them hand-edited, drifts within weeks. This site solves that with a single verification-date constant imported by every page; when the verification pass happens, one file changes and the entire site reads the new label, including the JSON-LD schema dateModified fields.

The site also exists to take a position on the storage tail. Generating one billion embeddings with OpenAI text-embedding-3-small costs $20 once. Storing those vectors in a managed vector database can cost $200 per month forever. For long-lived corpora, the recurring storage bill dwarfs the one-time generation cost within a few months. Most pricing references stop at the per-token rate; this one treats storage as a peer concern.

Who builds this

Oliver Wakefield-Smith

Founder, Digital Signet

Engineer-operator who builds and runs AI cost-reference sites and productised compliance tooling. Day job is autonomous AI development for SMEs (Digital Signet); this site is an editorial property, not a client deliverable. Contact at [email protected] or LinkedIn.

EmbeddingCost.com is published by Digital Signet, an independent UK consultancy that builds autonomous AI development systems. Digital Signet also operates sister AI-pricing references including claudeapipricing.com, contextcost.com, and geminipricing.com. Editorial is set by Oliver; numbers are validated by the practitioner network Digital Signet maintains.

Editorial position

EmbeddingCost.com is not a reseller, not a managed-services consultancy, not a vendor. It does not accept paid placements, sponsored sub-pages, or directory inclusion fees. Vendor URLs in our data (Pinecone, Qdrant, Weaviate, Zilliz, the embedding API providers) are plain unaffiliated URLs; clicking through does not benefit this site financially.

The site takes no editorial position on which model is best. The data is the position. Where one provider has a meaningful advantage (Voyage on MTEB Retrieval accuracy, OpenAI on per-token cost, Cohere on multilingual quality), the comparison surfaces it; where they are functionally interchangeable, the comparison says so. The intent is that an AI engineer landing here cold can reach a defensible architecture decision in under fifteen minutes without an email gate.

What this site covers

Multi-provider pricing

Cross-provider cheat sheet, scale tables and decision guide on the homepage.

OpenAI per-model

text-embedding-3-small, 3-large, ada-002, Azure differences, Batch API math.

Cohere embed-v4

$0.10/M text rate, $0.47/M images, the 512-token context window caveat.

Voyage AI

voyage-3.5, voyage-3-large, voyage-3-lite, batch math, 200M free tokens.

Google Gemini

gemini-embedding-001 and 2-preview, MRL dimension reduction options.

Amazon Bedrock

Titan V2 cost on Bedrock, binary embeddings, OpenSearch Serverless math.

Self-hosted vs API

BGE-M3 break-even, GPU rate ladder, real $85k/year case study.

Side-by-side compare

Price, dimensions, MTEB score, context window, batch discount, free tier.

Multi-provider calculator

Tokens, batch toggle, dimension, storage, Year 1 total across all providers.

Vector DB storage

Pinecone, Weaviate, Qdrant, Zilliz, pgvector at 1M, 10M, 100M vector scale.

Optimization techniques

Eight cost-reduction techniques with concrete savings math, not vague percentages.

RAG scenarios

Five real production RAG scenarios with full Year 1 cost breakdowns.

Methodology

Provider sources, calculator assumptions, refresh cadence, corrections process.

Editorial principles

Source pattern

Every per-million-token rate published on this site traces back to a provider's own public pricing page (OpenAI, Cohere, Voyage AI, Google, Amazon Bedrock). Where a number is contested or under-documented, both ends of the range are shown.

No paid placements

There are no sponsored slots, no premium positioning, no pay-to-rank. Provider order in tables is determined by price or relevance to the page, not by any commercial relationship.

No affiliate parameters

Outbound links to vector database providers (Pinecone, Qdrant, Weaviate, Zilliz) and embedding APIs are plain unaffiliated URLs. This site is a reference, not a lead-generation funnel.

Monthly verification

Pricing is re-verified against the provider's own pricing page on the first business week of each month. The last verified label currently reads May 2026.

Single-source freshness

The verification date is held in one constant (LAST_VERIFIED_DATE) imported by every page. Footer text, schema dateModified, and visible headings all read from that single source so cosmetic refreshes are not possible.

MTEB benchmark provenance

MTEB Retrieval average scores are taken from the public MTEB leaderboard at the time of the latest provider verification. They are not internal numbers and they are not promotional.

Methodology summary

The calculator uses tiktoken cl100k_base for OpenAI token estimates, and provider-published tokenizers where available for Cohere, Voyage, Google and Amazon. Chunk overlap defaults to 25 percent in the indexing math, which is the standard for production RAG. Storage formula is vectors x dimensions x 4 bytes for float32 and vectors x dimensions / 8 for binary quantization (Amazon Titan V2). Self-hosted BGE-M3 numbers assume A100 spot pricing at $1.50 per GPU hour and 8,000 tokens per second throughput.

Out of scope: enterprise-negotiated pricing, Microsoft Azure regional surcharges, AWS reserved-capacity commitments, fine-tuning charges. Full assumptions and source URLs are documented on the methodology page.

Contact and corrections

Spotted a stale price or a wrong calculator coefficient? Send the provider pricing page URL and the disagreement to [email protected]. Substantiated corrections are re-verified and shipped within five business days, and the verification date is bumped on the next site-wide pass. Editorial decisions that change a calculator coefficient or shift a price tier are noted on the methodology page with the date the change applied.

Disclosures

-No affiliate links or referral fees on any vendor URL on this site.
-No email-gated downloads, quote forms, or sales redirects.
-Not affiliated with OpenAI, Anthropic, Cohere, Voyage AI, Google, Amazon Web Services, Pinecone, Qdrant, Weaviate, Zilliz, or any other listed vendor.
-Calculator outputs are estimates; production pricing depends on enterprise agreements, regional surcharges, and provisioned-throughput commitments not covered here.

Independence note: EmbeddingCost.com is built and maintained by Digital Signet as an editorial property. It does not accept paid placements, vendor sponsorship, or affiliate referral fees from any AI provider, vector database operator, or cloud platform listed on this site. Pricing references are compiled from public sources and verified monthly. Always confirm current pricing on the provider's own pricing page before committing to architecture decisions.