Bring your own Z.ai key and use Z.ai at its own API rates, with no markup. Keys and chats stay in your browser. Run Z.ai in parallel with other models, then compare and merge the answers.
| Model | Description | Context | Input price | Output price | Released |
| --- | --- | --- | --- | --- | --- |
| GLM-5.2 (1M) [NEW] | Z.ai 1M-context flagship (744B MoE, 40B activated). Agentic coding with reasoning_effort control (high/max). 1M context, 128K output. | 1M | $1.4 | $4.4 | Jun 2026 |
| GLM-5.1 | Z.ai flagship (744B MoE, 40B activated). Post-training upgrade over GLM-5 with stronger coding and long-horizon task autonomy. 200K context, thinking mode. | 205K | $1.4 | $4.4 | Apr 2026 |
| GLM-5V Turbo | First multimodal GLM-5 model. Vision-based coding agent with image/video/file inputs. 200K context, 128K output, thinking mode. | 205K | $1.2 | $4 | Apr 2026 |
| GLM-5 Turbo | Speed-optimized GLM-5 variant for agent workflows. Enhanced tool invocation and long-chain execution. 200K context, thinking mode. | 205K | $1.2 | $4 | Mar 2026 |
| GLM-5 | Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode. | 205K | $1 | $3.2 | Feb 2026 |
| GLM-OCR (Vision, OCR) | Specialized OCR model for text extraction from images and documents. | 131K | $0.03 | $0.03 | Feb 2026 |
| GLM-4.7 Flash (Free) | Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority. | 131K | - | - | Jan 2026 |
| GLM-4.7 FlashX | Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure. | 131K | $0.07 | $0.4 | Jan 2026 |
| GLM-4.7 | Latest-gen GLM model with 128K context. Thinking mode activated by default. | 131K | $0.6 | $2.2 | Dec 2025 |
| AutoGLM Phone | Mobile phone automation agent. Understands phone screens via multimodal perception and executes automated operations. | 131K | - | - | Dec 2025 |
| GLM-4.6 V | Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking. | 131K | $0.3 | $0.9 | Dec 2025 |
| GLM-4.6 V Flash (Free) | Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output. | 131K | - | - | Dec 2025 |
| GLM-4.6 V FlashX | Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output. | 131K | $0.04 | $0.4 | Dec 2025 |
| GLM-4.6 | GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning. | 131K | $0.6 | $2.2 | Sep 2025 |
| GLM-4.5 V | Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking. | 98K | $0.6 | $1.8 | Aug 2025 |
| GLM-4.5 Air | Lightweight GLM-4.5 variant. Interleaved thinking. | 98K | $0.2 | $1.1 | Jul 2025 |
| GLM-4.5 | Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking. | 98K | $0.6 | $2.2 | Jul 2025 |
| GLM-4.5 X | Extended GLM-4.5 model. Interleaved thinking. | 98K | $2.2 | $8.9 | Jul 2025 |
| GLM-4.5 AirX | Extended lightweight GLM-4.5 variant. Interleaved thinking. | 98K | $1.1 | $4.5 | Jul 2025 |
| GLM-4.5 Flash (Free) | Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash. | 98K | - | - | Jul 2025 |
| GLM-4 32B (0414) 128K | GLM-4 32B model with 128K context, 16K output. | 131K | $0.1 | $0.1 | Apr 2025 |
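The context sizes listed per model (205K, 131K, 98K) do not match the figures quoted in the descriptions (200K, 128K, 96K). One likely explanation, which the page itself does not state, is that the descriptions count in units of 1,024 tokens while the listed sizes are rounded decimal thousands. A minimal Python sketch of that arithmetic, with both helper names made up for illustration:

```python
def binary_k_tokens(k: int) -> int:
    """Token count if 'K' means 1,024 tokens (an assumption, not stated on the page)."""
    return k * 1024


def as_decimal_k(tokens: int) -> str:
    """Format a token count in rounded decimal thousands, e.g. 131072 -> '131K'."""
    return f"{round(tokens / 1000)}K"


# Nominal sizes quoted in the model descriptions.
for nominal in (200, 128, 96):
    tokens = binary_k_tokens(nominal)
    print(f"{nominal}K context -> {tokens:,} tokens -> listed as {as_decimal_k(tokens)}")
```

Under this reading the two sets of figures describe the same limits, so either can be used when sizing a prompt.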
Add your Z.ai API key and run the GLM models at Z.ai's own API rates. Big-AGI adds no markup and keeps your keys and chats in your browser, not on its servers.
Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Z.ai is called.
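With no markup added, the cost of a request follows directly from the rates listed above. The sketch below is a rough estimate only: it assumes the two dollar figures per model are the input and output prices in USD per 1M tokens, which is the usual convention but is not stated on this page. The `Rates` class, the `RATES` table and `estimate_cost_usd` are hypothetical names, and the dictionary keys are display names from the listing, not API model identifiers.

```python
from dataclasses import dataclass

# Assumption: listed prices are USD per 1,000,000 tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Rates:
    input_usd: float   # first price listed for the model
    output_usd: float  # second price listed for the model


# A few rows copied from the listing; keys are display names, not API ids.
RATES = {
    "GLM-5.1": Rates(1.4, 4.4),
    "GLM-4.7": Rates(0.6, 2.2),
    "GLM-4.7 FlashX": Rates(0.07, 0.4),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts and the listed rates."""
    rates = RATES[model]
    total = input_tokens * rates.input_usd + output_tokens * rates.output_usd
    return total / TOKENS_PER_PRICE_UNIT


# Compare the same workload (100K tokens in, 10K tokens out) across models.
for name in RATES:
    print(f"{name}: ${estimate_cost_usd(name, 100_000, 10_000):.4f}")
```

Under these assumptions, 100,000 input tokens and 10,000 output tokens on GLM-5.1 come to roughly $0.18. Actual billing is set by Z.ai and may differ, for example if cached input or thinking tokens are priced separately.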
© 2026 Token Fabrics · Built with passion in San Diego