Use Cerebras Models in Big-AGI.

Bring your own Cerebras key and use Cerebras at its own API rates, with no markup. Keys and chats stay in your browser. Run Cerebras in parallel with other models, then compare and merge the answers.

Gemma 4 31B (Preview)
Z.ai GLM 4.7 (Preview)
GPT OSS 120B

All supported Cerebras models

ModelContextInputOutputReleased

Gemma 4 31B (Preview)

NEW
VisionReasoningTools / functions

Google Gemma 4 31B on Cerebras - first multimodal model on wafer-scale inference (~1,850 tok/s). Vision (base64 PNG/JPEG, max 5 images / 10MB), function calling, reasoning (off by default, enable via effort). 131K context (65K free tier), 40K max output.

131K

$0.99

$1.49

Jun 2026

Z.ai GLM 4.7 (Preview)

ReasoningTools / functions

Z.ai GLM 4.7 (355B) on Cerebras (~1,000 tok/s). Strong agentic coding, advanced reasoning (on by default), superior tool use. 131K context, 40K max output.

131K

$2.25

$2.75

Jan 2026

GPT OSS 120B

ReasoningTools / functions

OpenAI flagship open-weight MoE (120B total, 5.1B active) on Cerebras (~3,000 tok/s). Reasoning (default medium effort) and function calling. 131K context, 40K max output.

131K

$0.35

$0.75

Aug 2025

Gemma 4 31B (Preview)

NEW
Jun 2026

Google Gemma 4 31B on Cerebras - first multimodal model on wafer-scale inference (~1,850 tok/s). Vision (base64 PNG/JPEG, max 5 images / 10MB), function calling, reasoning (off by default, enable via effort). 131K context (65K free tier), 40K max output.

VisionReasoningTools / functions
131K · in $0.99 · out $1.49

Z.ai GLM 4.7 (Preview)

Jan 2026

Z.ai GLM 4.7 (355B) on Cerebras (~1,000 tok/s). Strong agentic coding, advanced reasoning (on by default), superior tool use. 131K context, 40K max output.

ReasoningTools / functions
131K · in $2.25 · out $2.75

GPT OSS 120B

Aug 2025

OpenAI flagship open-weight MoE (120B total, 5.1B active) on Cerebras (~3,000 tok/s). Reasoning (default medium effort) and function calling. 131K context, 40K max output.

ReasoningTools / functions
131K · in $0.35 · out $0.75
3 models · sorted by release date · prices in USD per 1M tokens · refreshed every 30 minutesCompare every model across vendors →

Running Cerebras in Big-AGI

Add your Cerebras API key and run open models on Cerebras wafer-scale hardware at their own API rates. Big-AGI adds no markup and keeps your keys and chats in your browser, not on its servers.

  • Your key, your billing. Usage is billed by Cerebras to your account. Big-AGI does not meter or charge for model usage.
  • Direct Connection. Turn it on and the browser calls Cerebras directly, bypassing the Big-AGI server, when your key is client-side and Cerebras allows it.
  • Built for speed. Cerebras serves open models at very high tokens per second, so long generations and agentic loops feel instant.
  • Beam. Run a Cerebras model in parallel with Claude, GPT, and Gemini, then compare or merge the answers. Parallel runs use more tokens than a single chat.

Bring your Cerebras key. Keep control.

Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Cerebras is called.

© 2026 Token Fabrics·Built with passion in San Diego