Bring your own Cerebras key and use Cerebras at its own API rates, with no markup. Keys and chats stay in your browser. Run Cerebras in parallel with other models, then compare and merge the answers.
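To illustrate the "run in parallel, then compare" idea, here is a minimal sketch that sends one prompt to several models concurrently and collects the answers by name. It is not Big-AGI's implementation: `fan_out`, `fake_cerebras`, and `fake_other` are hypothetical names, and the two stub functions stand in for real API clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model concurrently; return answers keyed by model name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # Submit all calls first so they run in parallel, then wait for each result.
        futures = {name: pool.submit(call, prompt) for name, call in models.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real model clients (assumptions, not real APIs).
def fake_cerebras(prompt: str) -> str:
    return f"cerebras: {prompt.upper()}"


def fake_other(prompt: str) -> str:
    return f"other: {prompt[::-1]}"


answers = fan_out("hello", {"cerebras": fake_cerebras, "other": fake_other})
for name, text in answers.items():
    print(f"[{name}] {text}")
```

With the answers side by side, comparing or merging them becomes a separate step that can be done by hand or by another model.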
Gemma 4 31B (Preview)
NEW: Google Gemma 4 31B on Cerebras, the first multimodal model on wafer-scale inference (~1,850 tok/s). Vision (base64 PNG/JPEG, max 5 images / 10MB), function calling, reasoning (off by default, enable via effort). 131K context (65K free tier), 40K max output.
131K
$0.99
$1.49
Jun 2026
Z.ai GLM 4.7 (Preview)
Z.ai GLM 4.7 (355B) on Cerebras (~1,000 tok/s). Strong agentic coding, advanced reasoning (on by default), superior tool use. 131K context, 40K max output.
131K
$2.25
$2.75
Jan 2026
GPT OSS 120B
OpenAI's flagship open-weight MoE (120B total, 5.1B active) on Cerebras (~3,000 tok/s). Reasoning (default medium effort) and function calling. 131K context, 40K max output.
131K
$0.35
$0.75
Aug 2025
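The Gemma 4 31B vision limits listed above (base64 PNG/JPEG, max 5 images / 10MB) can be checked client-side before a request is sent. The sketch below rests on assumptions: it treats 10MB as a combined budget of 10 × 1024 × 1024 bytes across all images, and it packs each image as a base64 data URL, a common convention that may not match the exact form Cerebras expects. The names `to_data_urls` and `sniff_mime` are hypothetical.

```python
import base64
from typing import List

MAX_IMAGES = 5
# Assumption: the 10MB limit applies to all images combined, with 1MB = 1024 * 1024 bytes.
MAX_TOTAL_BYTES = 10 * 1024 * 1024

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # standard PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # standard JPEG start-of-image marker


def sniff_mime(data: bytes) -> str:
    """Return the MIME type for PNG or JPEG bytes; reject anything else."""
    if data.startswith(PNG_MAGIC):
        return "image/png"
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    raise ValueError("only PNG and JPEG images are supported")


def to_data_urls(images: List[bytes]) -> List[str]:
    """Validate count, size, and format, then base64-encode each image as a data URL."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images allowed, got {len(images)}")
    total = sum(len(img) for img in images)
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"images total {total} bytes, limit is {MAX_TOTAL_BYTES}")
    return [
        f"data:{sniff_mime(img)};base64,{base64.b64encode(img).decode('ascii')}"
        for img in images
    ]


urls = to_data_urls([PNG_MAGIC + b"demo", JPEG_MAGIC + b"demo"])
print(urls[0][:30])
```

Failing fast on the client avoids spending a round trip on a request that would be rejected for having too many or too large images.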
Add your Cerebras API key and run open models on Cerebras wafer-scale hardware at Cerebras's own API rates. Big-AGI adds no markup and keeps your keys and chats in your browser, not on its servers.
Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Cerebras is called.
© 2026 Token Fabrics · Built with passion in San Diego