Use Z.ai Models in Big-AGI.

Bring your own key: Z.ai's API rates, no markup. Keys and chats stay in your browser. Run Z.ai in parallel with other models, then compare and merge the answers.

All supported Z.ai models

GLM-5.2 (1M)

NEW

Jun 2026

Z.ai 1M-context flagship (744B MoE, 40B activated). Agentic coding with reasoning_effort control (high/max). 1M context, 128K output.

1M · in $1.4 · out $4.4

GLM-5.1

Apr 2026

Z.ai flagship (744B MoE, 40B activated). Post-training upgrade over GLM-5 with stronger coding and long-horizon task autonomy. 200K context, thinking mode.

205K · in $1.4 · out $4.4

GLM-5V Turbo

Apr 2026

First multimodal GLM-5 model. Vision-based coding agent with image/video/file inputs. 200K context, 128K output, thinking mode.

205K · in $1.2 · out $4

GLM-5 Turbo

Mar 2026

Speed-optimized GLM-5 variant for agent workflows. Enhanced tool invocation and long-chain execution. 200K context, thinking mode.

205K · in $1.2 · out $4

GLM-5

Feb 2026

Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.

205K · in $1 · out $3.2

GLM-OCR (Vision, OCR)

Feb 2026

Specialized OCR model for text extraction from images and documents.

131K · in $0.03 · out $0.03

GLM-4.7 Flash (Free)

Jan 2026

Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.

131K · in - · out -

GLM-4.7 FlashX

Jan 2026

Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.

131K · in $0.07 · out $0.4

GLM-4.7

Dec 2025

Latest-gen GLM model with 128K context. Thinking mode activated by default.

131K · in $0.6 · out $2.2

AutoGLM Phone

Dec 2025

Mobile phone automation agent. Understands phone screens via multimodal perception and executes automated operations.

131K · in - · out -

GLM-4.6 V

Dec 2025

Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.

131K · in $0.3 · out $0.9

GLM-4.6 V Flash (Free)

Dec 2025

Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.

131K · in - · out -

GLM-4.6 V FlashX

Dec 2025

Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.

131K · in $0.04 · out $0.4

GLM-4.6

Sep 2025

GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.

131K · in $0.6 · out $2.2

GLM-4.5 V

Aug 2025

Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.

98K · in $0.6 · out $1.8

GLM-4.5 X

Jul 2025

Extended GLM-4.5 model. Interleaved thinking.

98K · in $2.2 · out $8.9

GLM-4.5 AirX

Jul 2025

Extended lightweight GLM-4.5 variant. Interleaved thinking.

98K · in $1.1 · out $4.5

GLM-4.5 Flash (Free)

Jul 2025

Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.

98K · in - · out -

GLM-4.5

Jul 2025

Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.

98K · in $0.6 · out $2.2

GLM-4.5 Air

Jul 2025

Lightweight GLM-4.5 variant. Interleaved thinking.

98K · in $0.2 · out $1.1

GLM-4 32B (0414) 128K

Apr 2025

GLM-4 32B model with 128K context, 16K output.

131K · in $0.1 · out $0.1

21 models · sorted by release date · prices in USD per 1M tokens · refreshed every 30 minutes
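Because prices are quoted per 1M tokens, the cost of a single call is easy to work out by hand. The sketch below is only an illustration: the `PRICES` table copies three rows from the list above, and the function name and token counts are made up for the example.

```python
# Prices in USD per 1M tokens as (input, output), copied from the list above.
PRICES = {
    "GLM-5.1": (1.4, 4.4),
    "GLM-4.7": (0.6, 2.2),
    "GLM-4.5 Air": (0.2, 1.1),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD (hypothetical helper)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token reply on GLM-5.1.
cost = estimate_cost_usd("GLM-5.1", input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.4f}")  # prints $0.0228
```

The same arithmetic shows why output-heavy workloads cost more: on GLM-5.1, each output token is priced at roughly three times an input token.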

Get started in 3 steps

1. Create an API key at the Z.ai console.

2. Paste it into Big-AGI's model settings.

3. Start chatting, or Beam it against other models and fuse the answers.

Running Z.ai in Big-AGI

Add your Z.ai API key and run the GLM models at Z.ai's own API rates. Big-AGI adds no markup and no intermediary: billing runs directly between you and Z.ai, and your keys stay in your browser.

Your key, your billing. Usage is billed by Z.ai to your account. Big-AGI does not meter or charge for model usage.
Strong at code and agents. GLM is tuned for coding and tool use, a capable and cost-effective pick for daily work.
A catalog that doesn't break. Z.ai's own model-list endpoint is unreliable, so Big-AGI runs from a hand-curated list of GLM models and layers a live check on top just to catch brand-new releases. A broken listing never takes the catalog down, and a model Z.ai advertises but doesn't actually serve gets hidden so you never pick something that would just fail.
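The catalog behaviour described above (a curated list, a live check layered on top, and hiding of models that are advertised but not served) can be sketched roughly as follows. This is not Big-AGI's actual code: the model IDs, the `fetch_live` callable, and the `is_served` predicate are all hypothetical stand-ins.

```python
from typing import Callable, Iterable

# Hypothetical hand-curated entries; the IDs are illustrative only.
CURATED = ["glm-5.1", "glm-4.7", "glm-4.5-air"]


def build_catalog(
    curated: list[str],
    fetch_live: Callable[[], Iterable[str]],
    is_served: Callable[[str], bool],
) -> list[str]:
    """Curated list first, then brand-new live IDs, minus unserved models."""
    try:
        live = list(fetch_live())
    except Exception:
        # A broken listing must not take the catalog down: use curated only.
        live = []
    # Keep only live IDs the curated list does not already cover (deduplicated).
    new_releases = [m for m in dict.fromkeys(live) if m not in curated]
    return [m for m in curated + new_releases if is_served(m)]


def broken_listing() -> Iterable[str]:
    raise ConnectionError("model-list endpoint failed")


# Live listing works: a new release is added, an unserved model is hidden.
catalog_ok = build_catalog(
    CURATED,
    lambda: ["glm-5.2", "glm-ghost", "glm-4.7"],
    lambda m: m != "glm-ghost",
)

# Live listing fails: the curated list is still returned.
catalog_fallback = build_catalog(CURATED, broken_listing, lambda m: True)
```

The curated list is the source of truth here and the live listing can only add to it, so a failed or partial listing never removes a known model.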

Why run GLM here?

Z.ai's own chat product is a reasonable way to try GLM, but it only ever shows you GLM's take on a prompt. Beam sends the same prompt to GLM and to GPT, Claude, or Gemini at once, so a code review or a debugging session gets more than one model's opinion before you act on it. You also get parameters a consumer chat interface does not surface (temperature, system prompt, per-turn model swaps), and your API key stays in your browser instead of sitting on an intermediary's server.

Your keys and your data

Turn on Direct Connection and the browser calls Z.ai directly, bypassing the Big-AGI server, whenever your key is client-side and Z.ai allows it. Your keys stay in your browser. Chats are stored locally first and sync only if you turn it on. The AI Inspector opens on any message to show the exact request sent to Z.ai, the token counts, and a cost estimate for that call.
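The Direct Connection rule above boils down to three conditions that must all hold. The sketch below is a minimal illustration of that decision, not Big-AGI's implementation: the field names and the route labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ConnectionContext:
    """Hypothetical inputs to the routing decision."""
    direct_connection_enabled: bool    # the user-facing toggle
    key_is_client_side: bool           # the key is stored in the browser
    vendor_allows_browser_calls: bool  # the vendor accepts calls from a browser


def choose_route(ctx: ConnectionContext) -> str:
    """Return "direct" only when every condition for a direct call holds."""
    if (
        ctx.direct_connection_enabled
        and ctx.key_is_client_side
        and ctx.vendor_allows_browser_calls
    ):
        return "direct"
    # Otherwise the request goes through the Big-AGI server.
    return "via-server"


route_direct = choose_route(ConnectionContext(True, True, True))
route_fallback = choose_route(ConnectionContext(True, False, True))
```

If any one condition fails, the request is routed through the server rather than erroring out.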

GLM in Beam

Run GLM in parallel with Claude, GPT, and Gemini on the same prompt, then reach for Fusions: several strategies that combine, cross-check, and synthesize the parallel answers instead of just picking the single best one. Each parallel run is billed separately, so a Beam uses more tokens than a single chat.
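The parallel-then-fuse idea can be sketched in a few lines. This is only an illustration under loud assumptions: the model calls are stubs that return canned answers, and majority voting is a deliberately naive stand-in, not one of Big-AGI's actual Fusion strategies.

```python
import asyncio
from collections import Counter


async def fake_model(name: str, answer: str, prompt: str) -> tuple[str, str]:
    """Stub for a model call; a real one would send `prompt` over the network."""
    await asyncio.sleep(0)  # stands in for network latency
    return name, answer


async def beam(prompt: str) -> dict[str, str]:
    """Send the same prompt to several (stubbed) models concurrently."""
    calls = [
        fake_model("glm", "O(n log n)", prompt),
        fake_model("claude", "O(n log n)", prompt),
        fake_model("gpt", "O(n^2)", prompt),
    ]
    return dict(await asyncio.gather(*calls))


def majority_fusion(answers: dict[str, str]) -> str:
    """Naive fusion: return the answer most models agree on."""
    return Counter(answers.values()).most_common(1)[0][0]


answers = asyncio.run(beam("What is the time complexity of merge sort?"))
fused = majority_fusion(answers)
print(fused)  # prints O(n log n)
```

Three models answering the same prompt means three billed calls, which is the extra token cost noted above.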

Bring your Z.ai key. Keep control.

Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Z.ai is called.
