Groq

Inference provider built on the LPU (Language Processing Unit), advertised as the fastest in the industry, with throughput of 1000+ tokens/second on its fastest listed model. Pricing is linear per token, with no hidden costs.


Plans & Pricing

Free Tier

✓ All supported open models
✓ Rate limits apply
✓ Prompt caching available
✓ Batch API: 50% discount

API Pay-Per-Use

From $0.05/1M tokens

✓ Llama 3.1 8B Instant: $0.05/1M in / $0.08/1M out (840 tokens/sec, TPS)
✓ Llama 4 Scout: $0.11/1M in / $0.34/1M out (594 TPS)
✓ GPT OSS 20B: $0.075/1M in / $0.30/1M out (1000 TPS)
✓ Qwen3 32B: $0.29/1M in / $0.59/1M out (662 TPS)
✓ 1000+ TPS on the fastest model
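
Because pricing is linear per token, the cost of a workload can be estimated directly from the rates listed above. The following is a minimal sketch, not an official calculator: the dictionary keys are the display names from this page rather than API model identifiers, `estimate_cost` is a hypothetical helper, and the `batch` flag assumes the 50% Batch API discount applies equally to input and output tokens.

```python
# Prices in USD per 1M tokens, copied from the list above: (input, output).
# Keys are display names from this page, not API model identifiers.
PRICES_PER_MILLION = {
    "Llama 3.1 8B Instant": (0.05, 0.08),
    "Llama 4 Scout": (0.11, 0.34),
    "GPT OSS 20B": (0.075, 0.30),
    "Qwen3 32B": (0.29, 0.59),
}


def estimate_cost(model, input_tokens, output_tokens, batch=False):
    """Return the estimated cost in USD for one workload."""
    in_price, out_price = PRICES_PER_MILLION[model]
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    # Assumption: the 50% batch discount applies to the whole bill.
    return cost * 0.5 if batch else cost


# 2M input tokens and 0.5M output tokens on Llama 4 Scout:
# 2 * 0.11 + 0.5 * 0.34 = 0.22 + 0.17 = 0.39 USD
print(f"${estimate_cost('Llama 4 Scout', 2_000_000, 500_000):.2f}")
```

Doubling both token counts doubles the estimate, which is what linear pricing means in practice.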

Free Tier

Free tier available with rate limits. No idle infrastructure costs.

Models (7)


Model	Notes	Context	In /1M	Out /1M	Capabilities
Llama 4 Scout	594 tokens/sec. Fast and affordable.	128K	$0.11	$0.34	Chat, Functions, Streaming
Qwen3 32B	662 tokens/sec.	131K	$0.29	$0.59	Chat, Functions, Streaming
Llama 3.3 70B Versatile	394 tokens/sec.	128K	$0.59	$0.79	Chat, Functions, Streaming
Llama 3.1 8B Instant	840 tokens/sec. Fastest Llama model listed.	128K	$0.05	$0.08	Chat, Functions, Streaming
GPT OSS 20B	1000 tokens/sec. Fastest model listed.	128K	$0.075	$0.30	Chat, Functions, Streaming
GPT OSS 120B	500 tokens/sec.	128K	$0.15	$0.60	Chat, Functions, Streaming
Kimi K2	Moonshot AI model. Vision support.	128K	$1.00	$3.00	Chat, Functions, Vision, +1 more
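
The throughput figures in the table translate into a rough generation time of output tokens divided by tokens per second. The sketch below uses only the values listed on this page. It ignores network latency, queueing, and time to first token, so real response times will be longer. `generation_seconds` is a hypothetical helper, and Kimi K2 is omitted because no throughput is listed for it.

```python
# Throughput in tokens/sec, copied from the table above.
# Kimi K2 is left out because the page lists no throughput for it.
THROUGHPUT_TPS = {
    "Llama 4 Scout": 594,
    "Qwen3 32B": 662,
    "Llama 3.3 70B Versatile": 394,
    "Llama 3.1 8B Instant": 840,
    "GPT OSS 20B": 1000,
    "GPT OSS 120B": 500,
}


def generation_seconds(model, output_tokens):
    """Rough decode time in seconds, ignoring latency and time to first token."""
    return output_tokens / THROUGHPUT_TPS[model]


# List models from fastest to slowest for a 1,000-token response.
for name in sorted(THROUGHPUT_TPS, key=THROUGHPUT_TPS.get, reverse=True):
    print(f"{name}: {generation_seconds(name, 1000):.2f} s per 1,000 output tokens")
```

At the listed rates, a 1,000-token response takes about 1 second on GPT OSS 20B and about 2 seconds on GPT OSS 120B.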