Fireworks launches prompt caching at 50% discount across all serverless models

New cached-token pricing tiers: $0.05/1M for sub-4B models, $0.10/1M for 4B-16B models, and $0.45/1M for 16B+ models. Cached rates include DeepSeek V3 at $0.28/1M, DeepSeek R1 at $0.68/1M, Kimi K2 at $0.30/1M, and Qwen3 Coder 480B at $0.23/1M.

12/15/2025 – 12/22/2025

Verified Changes

  • Price: Added prompt caching pricing across all serverless inference models at a 50% discount from standard input pricing:
    • <4B params: $0.05/1M cached tokens
    • 4B-16B params: $0.10/1M
    • >16B params: $0.45/1M
    • MoE 0B-56B: $0.25/1M
    • MoE 56.1B-176B: $0.60/1M
    • DeepSeek V3: $0.28/1M
    • DeepSeek R1: $0.68/1M
    • GLM-4.5/4.6: $0.28/1M
    • Kimi K2: $0.30/1M
    • Qwen3 Coder 480B: $0.23/1M
    • OpenAI gpt-oss-120b: $0.07/1M
    • OpenAI gpt-oss-20b: $0.04/1M
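Since the cached rate is a flat 50% discount from the standard input rate, the savings from caching are easy to estimate. Below is a minimal sketch of that arithmetic; the helper name `input_cost_usd` and the example prompt sizes are illustrative, and the $0.56/1M standard rate for DeepSeek V3 is inferred from the listed $0.28/1M cached rate at a 50% discount, not quoted from the source.

```python
# Sketch: estimating input-token cost under Fireworks-style prompt caching,
# assuming cached tokens bill at exactly half the standard input rate.

def input_cost_usd(total_tokens: int, cached_tokens: int,
                   standard_per_million: float) -> float:
    """Cost of one request's input, with cache hits at a 50% discount."""
    cached_rate = standard_per_million * 0.5   # cached tokens: half price
    uncached = total_tokens - cached_tokens    # cache misses: full price
    return (uncached * standard_per_million
            + cached_tokens * cached_rate) / 1_000_000

# Hypothetical example: a 100k-token prompt where 80k tokens hit the cache,
# at an assumed $0.56/1M standard input rate (implying $0.28/1M cached).
cost = input_cost_usd(100_000, 80_000, 0.56)
print(f"${cost:.4f}")  # 20k * $0.56/1M + 80k * $0.28/1M = $0.0336
```

In this sketch, caching 80% of the prompt cuts the input bill by 40% versus the uncached $0.056, which is why the discount matters most for long, repeated system prompts.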
