Use Groq Models in Big-AGI.

Bring your own Groq key and use Groq at its own API rates, with no markup. Keys and chats stay in your browser. Run Groq in parallel with other models, then compare and merge the answers.

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)
[OpenAI] GPT OSS Safeguard 20B (Preview)
[Groq] Compound (Agentic System)

All supported Groq models

ModelContextInputOutputReleased

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)

Tools / functions

Qwen3.6 27B by Alibaba Cloud. Multimodal (vision + text), flagship-level agentic coding, thinking/non-thinking modes, tool use. 131K context, 32K max output. ~…

131K

$0.6

$3

May 2026

[OpenAI] GPT OSS Safeguard 20B (Preview)

Tools / functions

OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Gro…

131K

$0.08

$0.3

Oct 2025

[Groq] Compound (Agentic System)

Tools / functions

Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.

131K

-

-

Sep 2025

[Groq] Compound Mini (Agentic System)

Tools / functions

Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.

131K

-

-

Sep 2025

[OpenAI] GPT OSS 20B

Tools / functions

OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.

131K

$0.08

$0.3

Aug 2025

[OpenAI] GPT OSS 120B

Tools / functions

OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.

131K

$0.15

$0.6

Aug 2025

[Alibaba Cloud] Qwen 3 · 32B (Preview)

Tools / functions

Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.

131K

$0.29

$0.59

Apr 2025

[Meta] Llama 4 Scout · 17B × 16E (Preview)

Tools / functions

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.

131K

$0.11

$0.34

Apr 2025

[Meta] Llama 3.3 · 70B Versatile

Tools / functions

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.

131K

$0.59

$0.79

Dec 2024

[Meta] Llama 3.1 · 8B Instant

Tools / functions

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.

131K

$0.05

$0.08

Jul 2024

[Alibaba Cloud] Qwen 3.6 · 27B (Preview)

May 2026

Qwen3.6 27B by Alibaba Cloud. Multimodal (vision + text), flagship-level agentic coding, thinking/non-thinking modes, tool use. 131K context, 32K max output. ~…

Tools / functions
131K · in $0.6 · out $3

[OpenAI] GPT OSS Safeguard 20B (Preview)

Oct 2025

OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Gro…

Tools / functions
131K · in $0.08 · out $0.3

[Groq] Compound (Agentic System)

Sep 2025

Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.

Tools / functions
131K · in - · out -

[Groq] Compound Mini (Agentic System)

Sep 2025

Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.

Tools / functions
131K · in - · out -

[OpenAI] GPT OSS 20B

Aug 2025

OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.

Tools / functions
131K · in $0.08 · out $0.3

[OpenAI] GPT OSS 120B

Aug 2025

OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.

Tools / functions
131K · in $0.15 · out $0.6

[Alibaba Cloud] Qwen 3 · 32B (Preview)

Apr 2025

Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.

Tools / functions
131K · in $0.29 · out $0.59

[Meta] Llama 4 Scout · 17B × 16E (Preview)

Apr 2025

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.

Tools / functions
131K · in $0.11 · out $0.34

[Meta] Llama 3.3 · 70B Versatile

Dec 2024

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.

Tools / functions
131K · in $0.59 · out $0.79

[Meta] Llama 3.1 · 8B Instant

Jul 2024

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.

Tools / functions
131K · in $0.05 · out $0.08
10 models · sorted by release date · prices in USD per 1M tokens · refreshed every 30 minutesCompare every model across vendors →

Get started in 3 steps

1

Create an API key at the Groq console.

2

Paste it into Big-AGI's model settings.

3

Start chatting, or Beam it against other models and fuse the answers.

Running Groq in Big-AGI

Add your Groq API key and reach its catalog at Groq's own rates. Big-AGI adds no markup and no intermediary: the billing relationship runs directly between you and Groq.

  • Your key, your billing. Usage is billed by Groq directly to your account.
  • Built for speed. Groq's LPU hardware pushes open models past 500 tokens per second, some past 1,000, so long generations and agentic loops finish before you'd expect them to start.
  • The real creator, labeled. Groq hosts Meta's Llama, OpenAI's GPT-OSS, Alibaba's Qwen, and Moonshot's Kimi; Big-AGI prefixes every model with who actually built it, not just who's serving it.
  • Compound, priced through. Groq's own Compound and Compound Mini agentic systems chain web search, code execution, and browser automation on top of those models, billed at the underlying model's rate.

Why not just the Groq playground?

The playground proves the speed; it doesn't give you a workspace. Big-AGI adds persistent chats, personas, and attachments on top of the raw API, then puts that speed to work in Beam: run a Groq-hosted model next to Claude, GPT, and Gemini, and it's usually the first one back, sometimes by a wide margin.

Parameters, the key, and the chats stay yours to control instead of locked into a single-prompt tool.

Your keys and your data

Turn on Direct Connection and the browser calls Groq directly, bypassing the Big-AGI server, when your key is client-side and Groq allows it. Your keys stay in your browser. Chats are stored locally first, and sync only if you turn it on. The AI Inspector shows the exact request, the token counts, and a cost estimate.

Groq in Beam

Put a Groq-hosted model into a Beam next to Claude, GPT, and Gemini, and let its speed set the pace. Fusions then combine, cross-check, and synthesize the parallel answers instead of just picking the best one. Parallel runs use more tokens than a single chat.

Bring your Groq key. Keep control.

Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Groq is called.

© 2026 Token Fabrics·Built with passion in San Diego