What is the best LLM API for fast inference?

Based on our verified pricing data, the cheapest model that qualifies is Granite 4.0 Micro by IBM at $0.017/Mtok input and $0.112/Mtok output ($0.065/Mtok blended). See the full ranking below for more options.

How often are prices updated?

Prices are verified against official provider pricing pages 3 times daily (8am, 2pm, 8pm UTC) by our automated scraper pipeline.


Fastest LLM APIs for Low-Latency Apps

LLM APIs optimized for fast inference and low latency. Compare pricing for models available on fast inference providers like Groq, Together, and Fireworks.

29 models qualify. Showing the top 15, sorted by blended cost.

Granite 4.0 Micro (IBM): $0.017 in, $0.112 out, $0.065/Mtok blended, 128K ctx

LFM2 24B A2B (Together): $0.030 in, $0.120 out, $0.075/Mtok blended, 128K ctx

Cost calculator for this use case

Tokens per day

Input/output ratio: 70/30

Days per month

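The calculator's arithmetic can be approximated offline. The sketch below is a minimal illustration, assuming the 70/30 ratio means 70% input tokens and 30% output tokens; the function name `monthly_cost` and the 10M-tokens-per-day workload are hypothetical, and the prices are copied from the ranking table.

```python
def monthly_cost(tokens_per_day, days_per_month, input_price, output_price,
                 input_share=0.70):
    """Estimate monthly spend in USD.

    Prices are in USD per million tokens (Mtok). input_share is the
    fraction of all tokens that are input tokens (assumed 70% here).
    """
    monthly_tokens = tokens_per_day * days_per_month
    input_tokens = monthly_tokens * input_share
    output_tokens = monthly_tokens * (1 - input_share)
    # Divide by one million because prices are quoted per Mtok.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# (input $/Mtok, output $/Mtok), taken from the ranking table
PRICES = {
    "Granite 4.0 Micro": (0.017, 0.112),
    "Llama 3.1 8B": (0.050, 0.080),
    "LFM2 24B A2B": (0.030, 0.120),
}

# Hypothetical workload: 10 million tokens per day for 30 days
for model, (inp, out) in PRICES.items():
    print(f"{model}: ${monthly_cost(10_000_000, 30, inp, out):.2f}")
```

Because the ratio weights input tokens more heavily than output tokens, the result can differ from what the 50/50 blended figure alone would suggest.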

Full ranking — top 15 models

#	Model	Provider	Input $/Mtok	Output $/Mtok	Blended $/Mtok	Context
1	Granite 4.0 Micro	IBM	$0.017	$0.112	$0.065	128K
2	Llama 3.1 8B	Meta	$0.050	$0.080	$0.065	128K
3	LFM2 24B A2B	Together	$0.030	$0.120	$0.075	128K
4	Nova Micro	Amazon	$0.035	$0.140	$0.088	128K
5	Ministral 3 3B	Mistral	$0.100	$0.100	$0.100	128K
6	Reka Edge	Reka	$0.100	$0.100	$0.100	66K
7	Qwen-Turbo	Alibaba	$0.050	$0.200	$0.125	1M
8	Mistral Small 3.2 24B	Mistral	$0.080	$0.200	$0.140	128K
9	Gemini 2.5 Flash	Google	$0.075	$0.300	$0.188	1M
10	DeepSeek V4 Flash	DeepSeek	$0.140	$0.280	$0.210	1M
11	DeepSeek V4 Flash	Fireworks	$0.140	$0.280	$0.210	1M
12	GLM-4.7-FlashX	Zhipu	$0.070	$0.400	$0.235	128K
13	Gemini 2.5 Flash-Lite	Google	$0.100	$0.400	$0.250	1M
14	Qwen-Flash	Alibaba	$0.115	$0.460	$0.288	1M
15	Grok 4.1 Fast	xAI	$0.200	$0.500	$0.350	2M

How models are selected

A model qualifies if it is tagged as fast or speed-optimized. Qualifying models are sorted by blended cost, lowest first.

Prices are per million tokens (Mtok), sourced directly from official provider pricing pages and verified by our automated scraper pipeline, which runs 3 times daily. "Blended cost" is the unweighted average of the input and output prices, a quick proxy for workloads that use input and output tokens in roughly equal (50/50) proportion.
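As a rough illustration of the blended-cost definition above, the following sketch averages the input and output prices and sorts a few rows from the table by the result. The helper name `blended` and the tuple layout are assumptions made for this example, not part of the pipeline.

```python
def blended(input_price, output_price):
    """Unweighted mean of input and output price, in USD per Mtok."""
    return (input_price + output_price) / 2


# (model, provider, input $/Mtok, output $/Mtok), a subset of the ranking table
rows = [
    ("Gemini 2.5 Flash", "Google", 0.075, 0.300),
    ("Ministral 3 3B", "Mistral", 0.100, 0.100),
    ("Granite 4.0 Micro", "IBM", 0.017, 0.112),
    ("Nova Micro", "Amazon", 0.035, 0.140),
]

# Sort ascending so the lowest blended cost comes first.
ranked = sorted(rows, key=lambda r: blended(r[2], r[3]))

for name, provider, inp, out in ranked:
    print(f"{name} ({provider}): ${blended(inp, out):.4f}/Mtok blended")
```

The unrounded values here (for example $0.0645 for Granite 4.0 Micro) are consistent with the three-decimal figures shown in the table.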