
Cerebras

Inference (free tier)

Wafer-scale chip inference — 1,000+ tokens/sec

cerebras.ai ↗ | Pricing → | /api/v1/latency

Operational

All systems responding normally

Last checked 21/05/2026, 9:54:34 pm

123ms response

Uptime History: 100.00% uptime

2026-05-16 to today

Uptime: 100.00%

Avg Latency: 262ms

P95 Latency: 351ms

Fastest: 73ms

Checks: 150

Response Time

Last 60 checks

73ms min, 262ms avg, 4077ms max

💰 Pricing

llama-3.3-70b (FREE)

Input: $0.60/1M tokens, Output: $0.60/1M tokens

1000+ tokens/sec. llama-3.1-8b: $0.10 input / $0.10 output per 1M tokens

Free tier available
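
As a rough illustration of how the per-million-token rates listed here translate into a bill, the sketch below multiplies token counts by the listed prices. The names `PRICES_PER_MILLION` and `estimate_cost` are hypothetical and exist only for this example; the rates are copied from this page, and free-tier allowances are ignored.

```python
# Listed prices in USD per 1M tokens, copied from the pricing section of this page.
# The table layout and names are illustrative, not part of any Cerebras API.
PRICES_PER_MILLION = {
    "llama-3.3-70b": {"input": 0.60, "output": 0.60},
    "llama-3.1-8b": {"input": 0.10, "output": 0.10},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated paid-tier cost in USD for one workload."""
    rates = PRICES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

For instance, `estimate_cost("llama-3.3-70b", 1_000_000, 500_000)` comes to about $0.90 at the listed rates.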

⚡ Rate Limits

free

RPM: 30, TPM: 60,000
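
One way a client might stay under these free-tier limits is to track its own requests and tokens over a sliding 60-second window. The sketch below is a minimal client-side guard, assuming the limits are counted per rolling minute; how the provider actually measures its window is not stated on this page. The class name `MinuteBudget` and its methods are hypothetical.

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side sliding 60-second window over request and token counts."""

    def __init__(self, rpm=30, tpm=60_000, clock=time.monotonic):
        self.rpm = rpm          # free-tier requests per minute from this page
        self.tpm = tpm          # free-tier tokens per minute from this page
        self.clock = clock      # injectable clock, which makes testing easier
        self.events = deque()   # (timestamp, tokens) for each admitted request

    def _prune(self, now):
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both limits, else False."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would check `budget.try_acquire(estimated_tokens)` before each request and wait or back off when it returns `False`. Note that 60,000 TPM spread over 30 RPM averages 2,000 tokens per request if every request slot is used.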

🤖 Models (1)

Model	Task	Context	Vision	Tools	JSON
Llama 3.3 70B (1000+ tokens/sec sustained)	llm	128k	—	✅	✅

Recent Checks

Showing last 15

Status	Latency	Checked
Operational	123ms	21 May, 09:54 pm
Operational	181ms	21 May, 08:45 pm
Operational	204ms	21 May, 07:23 pm
Operational	291ms	21 May, 06:01 pm
Operational	274ms	21 May, 04:37 pm
Operational	198ms	21 May, 03:17 pm
Operational	196ms	21 May, 01:47 pm
Operational	240ms	21 May, 12:22 pm
Operational	195ms	21 May, 11:23 am
Operational	135ms	21 May, 10:16 am
Operational	248ms	21 May, 09:48 am
Operational	298ms	21 May, 09:16 am
Operational	178ms	21 May, 08:42 am
Operational	167ms	21 May, 08:00 am
Operational	204ms	21 May, 07:13 am
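
To show how summary figures like the ones on this page can be derived from raw checks, the sketch below computes min, average, P95, and max over the 15 recent checks listed above. It uses the nearest-rank percentile method, which is an assumption: the page does not say how its P95 is calculated. The names `RECENT_MS` and `nearest_rank_percentile` are hypothetical.

```python
import math
import statistics

# Latencies in ms from the 15 recent checks listed above, newest first.
RECENT_MS = [123, 181, 204, 291, 274, 198, 196, 240, 195, 135, 248, 298, 178, 167, 204]


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


summary = {
    "min": min(RECENT_MS),
    "avg": statistics.mean(RECENT_MS),
    "p95": nearest_rank_percentile(RECENT_MS, 95),
    "max": max(RECENT_MS),
}
```

Over these 15 samples the result is 123ms min, 208.8ms avg, 298ms P95, and 298ms max. These differ from the headline figures (73ms min, 262ms avg, 351ms P95, 4077ms max) because those cover a larger set of checks than the 15 shown here.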

API Quick Access

Health Latency Freshness Pricing Models

Other Inference Providers

Groq

LPU inference — fastest tokens per second on the market

Together AI

Open-source model inference — Llama, Mixtral, FLUX

Fireworks AI

Fast open-model inference — FireFunction, Llama, Mixtral

OpenRouter

Unified API across 200+ models — route by price or speed

Hugging Face

Serverless inference API — 100k+ open models on demand

fal.ai

Ultra-fast image & video model inference for agents
