BEAM

Features

Rankings

Pro

Docs

GitHub

LAUNCH APP

Use Groq Models in Big-AGI.

Bring your own key: Groq's API rates, no markup. Keys and chats stay in your browser. Run Groq in parallel with other models, then compare and merge the answers.

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)

[OpenAI] GPT OSS Safeguard 20B (Preview)

[Groq] Compound (Agentic System)

Launch Big-AGI

All supported Groq models

ModelContextInputOutputReleased

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)

Qwen3.6 27B by Alibaba Cloud. Multimodal (vision + text), flagship-level agentic coding, thinking/non-thinking modes, tool use. 131K context, 32K max output. ~…

131K

$0.6

Apr 2026

[OpenAI] GPT OSS Safeguard 20B (Preview)

OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Gro…

131K

$0.08

$0.3

Oct 2025

[Groq] Compound (Agentic System)

Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.

131K

Sep 2025

[Groq] Compound Mini (Agentic System)

Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.

131K

Sep 2025

[OpenAI] GPT OSS 20B

OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.

131K

$0.08

$0.3

Aug 2025

[OpenAI] GPT OSS 120B

OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.

131K

$0.15

$0.6

Aug 2025

[Alibaba Cloud] Qwen 3 · 32B (Preview)

deprecated

Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.

131K

$0.29

$0.59

Apr 2025

[Meta] Llama 4 Scout · 17B × 16E (Preview)

deprecated

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.

131K

$0.11

$0.34

Apr 2025

[Meta] Llama 3.3 · 70B Versatile

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.

131K

$0.59

$0.79

Dec 2024

[Meta] Llama 3.1 · 8B Instant

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.

131K

$0.05

$0.08

Jul 2024

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)

Apr 2026

Qwen3.6 27B by Alibaba Cloud. Multimodal (vision + text), flagship-level agentic coding, thinking/non-thinking modes, tool use. 131K context, 32K max output. ~…

131K · in $0.6 · out $3

[OpenAI] GPT OSS Safeguard 20B (Preview)

Oct 2025

OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Gro…

131K · in $0.08 · out $0.3

[Groq] Compound (Agentic System)

Sep 2025

Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.

131K · in - · out -

[Groq] Compound Mini (Agentic System)

Sep 2025

Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.

131K · in - · out -

[OpenAI] GPT OSS 20B

Aug 2025

OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.

131K · in $0.08 · out $0.3

[OpenAI] GPT OSS 120B

Aug 2025

OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.

131K · in $0.15 · out $0.6

[Alibaba Cloud] Qwen 3 · 32B (Preview)

deprecated

Apr 2025

Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.

131K · in $0.29 · out $0.59

[Meta] Llama 4 Scout · 17B × 16E (Preview)

deprecated

Apr 2025

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.

131K · in $0.11 · out $0.34

[Meta] Llama 3.3 · 70B Versatile

Dec 2024

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.

131K · in $0.59 · out $0.79

[Meta] Llama 3.1 · 8B Instant

Jul 2024

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.

131K · in $0.05 · out $0.08

10 models · sorted by release date · prices in USD per 1M tokens · refreshed every 30 minutesCompare every model across vendors →

Get started in 3 steps

Create an API key at the Groq console.

Paste it into Big-AGI's model settings.

Start chatting, or Beam it against other models and fuse the answers.

Running Groq in Big-AGI

Add your Groq API key and reach its catalog at Groq's own rates. Big-AGI adds no markup and no intermediary: the billing relationship runs directly between you and Groq.

Your key, your billing. Usage is billed by Groq directly to your account.
Built for speed. Groq's LPU hardware pushes open models past 500 tokens per second, some past 1,000, so long generations and agentic loops finish before you'd expect them to start.
The real creator, labeled. Groq hosts Meta's Llama, OpenAI's GPT-OSS, Alibaba's Qwen, and Moonshot's Kimi; Big-AGI prefixes every model with who actually built it, not just who's serving it.
Compound, priced through. Groq's own Compound and Compound Mini agentic systems chain web search, code execution, and browser automation on top of those models, billed at the underlying model's rate.

Why not just the Groq playground?

The playground proves the speed; it doesn't give you a workspace. Big-AGI adds persistent chats, personas, and attachments on top of the raw API, then puts that speed to work in Beam: run a Groq-hosted model next to Claude, GPT, and Gemini, and it's usually the first one back, sometimes by a wide margin.

Parameters, the key, and the chats stay yours to control instead of locked into a single-prompt tool.

Your keys and your data

Turn on Direct Connection and the browser calls Groq directly, bypassing the Big-AGI server, when your key is client-side and Groq allows it. Your keys stay in your browser. Chats are stored locally first, and sync only if you turn it on. The AI Inspector shows the exact request, the token counts, and a cost estimate.

Groq in Beam

Put a Groq-hosted model into a Beam next to Claude, GPT, and Gemini, and let its speed set the pace. Fusions then combine, cross-check, and synthesize the parallel answers instead of just picking the best one. Parallel runs use more tokens than a single chat.

Bring your Groq key. Keep control.

Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Groq is called.

Launch Big-AGI

<- All Models

Alibaba

Anthropic

AWS Bedrock

Azure

Cerebras

DeepSeek

Fireworks AI

Google Gemini

Groq

MiniMax

Mistral

Moonshot

NVIDIA

OpenAI

OpenRouter

Perplexity

Sakana AI

SpaceXAI

Together AI

Z.ai

BIG-AGI

Product

Features Models Controls Changelog BEAM Technology

Resources

Documentation Discord GitHub

Company

Email Us Privacy Terms