Google Gemini Embedding Pricing: gemini-embedding-001, 2-preview & text-embedding-005 (April 2026)
Google's embedding pricing is spread across Gemini API and Vertex AI docs. This page consolidates the rates, explains Matryoshka dimension options, and clarifies the Gemini API vs Vertex AI billing differences.
Current Pricing
| Model | $/M tokens | Dims (MRL) | Context | Status |
|---|---|---|---|---|
| gemini-embedding-2-preview | $0.20 | 768/1536/3072 | 8,192 tokens | Preview |
| gemini-embedding-001 | $0.15 | 768/1536/3072 | 2,048 tokens | GA (stable) |
| text-embedding-005 | $0.15 | 768 | 2,048 tokens | Legacy |
All models are accessed via the Gemini API or Vertex AI. Preview pricing is subject to change at GA launch.
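To turn the per-million-token rates into a budget figure, multiply the total token count by the rate and divide by one million. The sketch below does only that arithmetic; the rates are copied from the table above, while the function name and the example workload (10 million documents averaging 500 tokens) are hypothetical. It ignores free tiers, trial credits, and any batch discounts.

```python
# Rates copied from the pricing table above (USD per 1M input tokens).
PRICE_PER_M_TOKENS = {
    "gemini-embedding-2-preview": 0.20,
    "gemini-embedding-001": 0.15,
    "text-embedding-005": 0.15,
}


def embedding_cost_usd(model: str, total_tokens: int) -> float:
    """Estimate the embedding cost in USD for a given number of input tokens."""
    return PRICE_PER_M_TOKENS[model] * total_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M documents averaging 500 tokens each.
    total_tokens = 10_000_000 * 500
    for model in PRICE_PER_M_TOKENS:
        cost = embedding_cost_usd(model, total_tokens)
        print(f"{model}: ${cost:,.2f}")
```

At the listed rates, the one-time cost of embedding a corpus scales linearly with token count, so halving average chunk length does not change the bill if the total text is the same.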
Matryoshka Dimensions: Storage Cost Impact
Both gemini-embedding-001 and gemini-embedding-2-preview support Matryoshka Representation Learning (MRL). You can request 768-, 1536-, or 3072-dimension vectors. The API token price is identical regardless of dimension count; the savings come from downstream storage. For 100 million vectors stored as float32 (4 bytes per dimension):
| Dimensions | Bytes/vector (float32) | GiB per 100M vectors | Storage ratio |
|---|---|---|---|
| 3,072 | 12,288 | 1,144 GiB | 4x (baseline) |
| 1,536 | 6,144 | 572 GiB | 2x |
| 768 | 3,072 | 286 GiB | 1x (cheapest) |
Using 768 dimensions instead of 3072 cuts raw vector storage by 4x. Quality loss on MTEB Retrieval is typically 2-5 points. That is a good trade-off for high-scale applications where storage costs dominate.
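The storage figures follow from simple arithmetic: vectors × dimensions × bytes per value. Here is a minimal sketch, assuming uncompressed float32 values (4 bytes each) and ignoring index overhead, metadata, and replication, all of which vary by vector database. The function name is illustrative.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as uncompressed float32


def storage_gib(num_vectors: int, dims: int,
                bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw vector storage in GiB (2**30 bytes), excluding index overhead."""
    return num_vectors * dims * bytes_per_value / 2**30


if __name__ == "__main__":
    num_vectors = 100_000_000
    for dims in (3072, 1536, 768):
        bytes_per_vector = dims * BYTES_PER_FLOAT32
        print(f"{dims:>5} dims: {bytes_per_vector:>6} B/vector, "
              f"{storage_gib(num_vectors, dims):,.0f} GiB")
```

Passing `bytes_per_value=2` or `1` gives a rough estimate for float16 or int8 storage, if your vector store supports those formats.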
Vertex AI vs Gemini API: Which to Use
Gemini API
- Free tier: ~1,500 requests/day
- Simple token-based billing
- Quick setup, great for prototypes
- Rate limits lower than Vertex AI
- Data processed in Google's infrastructure
Vertex AI
- $300 free trial credits (90 days)
- Enterprise quotas and SLAs
- VPC controls, data residency
- Google Cloud billing integration
- Best for production GCP workloads
gemini-embedding-2: The Multimodal Upgrade
The gemini-embedding-2-preview model is natively multimodal: it can embed text, images, and video in a shared vector space. At $0.20/M tokens (text), it is Google's premium embedding offering. Because the model is still in preview, pricing may change at general availability. The 8,192-token context window (vs 2,048 for gemini-embedding-001) is a significant upgrade for long-document RAG applications, since each document needs fewer chunks.
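To see what the larger context window means for long-document RAG, this rough sketch counts how many chunks a document needs under each window size. The 20,000-token document, the optional overlap parameter, and the function name are hypothetical. Real token counts depend on the model's tokenizer, and many pipelines deliberately chunk well below the maximum window.

```python
import math


def chunks_needed(doc_tokens: int, context_window: int, overlap: int = 0) -> int:
    """Number of chunks required to cover a document with fixed-size windows."""
    if not 0 <= overlap < context_window:
        raise ValueError("overlap must be in [0, context_window)")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= context_window:
        return 1
    stride = context_window - overlap  # new tokens covered by each extra chunk
    return 1 + math.ceil((doc_tokens - context_window) / stride)


if __name__ == "__main__":
    doc_tokens = 20_000  # hypothetical long document
    for model, window in (("gemini-embedding-001", 2_048),
                          ("gemini-embedding-2-preview", 8_192)):
        print(f"{model}: {chunks_needed(doc_tokens, window)} chunks")
```

Fewer chunks means fewer vectors to store and search per document, which partly offsets the higher per-token price.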