Fastest LLM APIs for Low-Latency Apps
A price comparison of LLM APIs optimized for fast inference and low latency, covering models available on fast inference providers such as Groq, Together, and Fireworks.
Full ranking — top 15 models
| # | Model | Provider | Input $/Mtok | Output $/Mtok | Blended $/Mtok | Context | |
|---|---|---|---|---|---|---|---|
| 1 | Granite 4.0 Micro | IBM | $0.017 | $0.112 | $0.065 | 128K | → |
| 2 | Llama 3.1 8B | Meta | $0.050 | $0.080 | $0.065 | 128K | → |
| 3 | LFM2 24B A2B | Together | $0.030 | $0.120 | $0.075 | 128K | → |
| 4 | Nova Micro | Amazon | $0.035 | $0.140 | $0.088 | 128K | → |
| 5 | Ministral 3 3B | Mistral | $0.100 | $0.100 | $0.100 | 128K | → |
| 6 | Reka Edge | Reka | $0.100 | $0.100 | $0.100 | 66K | → |
| 7 | Qwen-Turbo | Alibaba | $0.050 | $0.200 | $0.125 | 1M | → |
| 8 | Mistral Small 3.2 24B | Mistral | $0.080 | $0.200 | $0.140 | 128K | → |
| 9 | Gemini 2.5 Flash | Google | $0.075 | $0.300 | $0.188 | 1M | → |
| 10 | DeepSeek V4 Flash | DeepSeek | $0.140 | $0.280 | $0.210 | 1M | → |
| 11 | DeepSeek V4 Flash | Fireworks | $0.140 | $0.280 | $0.210 | 1M | → |
| 12 | GLM-4.7-FlashX | Zhipu | $0.070 | $0.400 | $0.235 | 128K | → |
| 13 | Gemini 2.5 Flash-Lite | Google | $0.100 | $0.400 | $0.250 | 1M | → |
| 14 | Qwen-Flash | Alibaba | $0.115 | $0.460 | $0.288 | 1M | → |
| 15 | Grok 4.1 Fast | xAI | $0.200 | $0.500 | $0.350 | 2M | → |
How models are selected
Models tagged as fast or speed-optimized are ranked by blended cost, lowest first, so the order reflects price rather than measured latency.
Prices are per million tokens (Mtok), taken from official provider pricing pages and checked by our automated scraper pipeline, which runs three times daily. "Blended cost" is the arithmetic mean of the input and output prices, a quick proxy for workloads that consume input and output tokens in roughly equal amounts.
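The blended figure and the sort order can be reproduced with a few lines of Python. This is a minimal sketch rather than the site's actual pipeline: the `Price` class and its `blended` and `cost` methods are hypothetical names introduced for illustration, the three rows are copied from the table above, and the 9:1 input-to-output token mix is an assumed workload chosen to show where the 50/50 proxy diverges from real spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical record for one table row (prices in $ per million tokens)."""
    model: str
    input_per_mtok: float
    output_per_mtok: float

    def blended(self) -> float:
        # Arithmetic mean of input and output price (the 50/50 proxy).
        return (self.input_per_mtok + self.output_per_mtok) / 2

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Actual dollar cost for a specific token mix.
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


# Three rows copied from the ranking table.
PRICES = [
    Price("Qwen-Turbo", 0.050, 0.200),
    Price("Llama 3.1 8B", 0.050, 0.080),
    Price("Nova Micro", 0.035, 0.140),
]

# Rank by blended cost, lowest first, as the table does.
ranked = sorted(PRICES, key=lambda p: p.blended())
for p in ranked:
    print(f"{p.model}: blended ${p.blended():.3f}/Mtok")

# Assumed input-heavy workload: 9M input tokens, 1M output tokens.
for p in ranked:
    print(f"{p.model}: ${p.cost(9_000_000, 1_000_000):.2f} for 9M in / 1M out")
```

For workloads that are heavily skewed toward input (long prompts, short answers) or toward output (generation-heavy tasks), computing `cost` for the expected token mix gives a more faithful comparison than the blended figure, and it can change the ranking between models whose input and output prices differ widely.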