Bring your own Groq key and use Groq at its own API rates, with no markup. Keys and chats stay in your browser. Run Groq in parallel with other models, then compare and merge the answers.
[Alibaba Cloud] Qwen 3.6 · 27B (Preview)
Qwen3.6 27B by Alibaba Cloud. Multimodal (vision + text), flagship-level agentic coding, thinking/non-thinking modes, tool use. 131K context, 32K max output. ~…
Context: 131K · Input: $0.6 / 1M tokens · Output: $3 / 1M tokens · Released: May 2026
[OpenAI] GPT OSS Safeguard 20B (Preview)
OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Gro…
Context: 131K · Input: $0.08 / 1M tokens · Output: $0.3 / 1M tokens · Released: Oct 2025
[Groq] Compound (Agentic System)
Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.
Context: 131K · Input: - · Output: - · Released: Sep 2025
[Groq] Compound Mini (Agentic System)
Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.
Context: 131K · Input: - · Output: - · Released: Sep 2025
[OpenAI] GPT OSS 20B
OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.
Context: 131K · Input: $0.08 / 1M tokens · Output: $0.3 / 1M tokens · Released: Aug 2025
[OpenAI] GPT OSS 120B
OpenAI flagship open-weight MoE (120B total, 5.1B active). Reasoning, browser search, code execution. 131K context, 65K max output. ~500 t/s on Groq.
Context: 131K · Input: $0.15 / 1M tokens · Output: $0.6 / 1M tokens · Released: Aug 2025
[Alibaba Cloud] Qwen 3 · 32B (Preview)
Qwen3 32B by Alibaba Cloud. Supports thinking/non-thinking modes, 100+ languages. 131K context, 40K max output. ~400 t/s on Groq.
Context: 131K · Input: $0.29 / 1M tokens · Output: $0.59 / 1M tokens · Released: Apr 2025
[Meta] Llama 4 Scout · 17B × 16E (Preview)
Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.
Context: 131K · Input: $0.11 / 1M tokens · Output: $0.34 / 1M tokens · Released: Apr 2025
[Meta] Llama 3.3 · 70B Versatile
Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.
Context: 131K · Input: $0.59 / 1M tokens · Output: $0.79 / 1M tokens · Released: Dec 2024
[Meta] Llama 3.1 · 8B Instant
Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.
Context: 131K · Input: $0.05 / 1M tokens · Output: $0.08 / 1M tokens · Released: Jul 2024
1. Create an API key at the Groq console.
2. Paste it into Big-AGI's model settings.
3. Start chatting, or Beam it against other models and fuse the answers.
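For readers curious what a key-authenticated call looks like underneath these steps, here is a minimal sketch in TypeScript. It assumes Groq's OpenAI-compatible chat completions endpoint and a Bearer-token header; the helper name `buildGroqRequest`, the placeholder key, and the example model ID are illustrative and are not taken from Big-AGI's source. The sketch only builds the request and does not send it.

```typescript
// Sketch only: prepares (but does not send) a chat request for Groq's
// OpenAI-compatible endpoint. This is not Big-AGI's actual implementation.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assumed endpoint: Groq's OpenAI-compatible chat completions URL.
const GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions";

// Hypothetical helper: turns a key, a model ID, and messages into fetch() arguments.
function buildGroqRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): PreparedRequest {
  if (!apiKey) {
    throw new Error("A Groq API key is required");
  }
  return {
    url: GROQ_CHAT_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Placeholder key and example model ID; substitute your own values.
const req = buildGroqRequest("gsk_example_key", "llama-3.1-8b-instant", [
  { role: "user", content: "Hello" },
]);
```

Passing `req.url` and `req.init` to `fetch` would send the request. In Big-AGI this plumbing is handled for you once the key is pasted into the model settings.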
Add your Groq API key and reach its catalog at Groq's own rates. Big-AGI adds no markup and no intermediary: the billing relationship runs directly between you and Groq.
The playground proves the speed; it doesn't give you a workspace. Big-AGI adds persistent chats, personas, and attachments on top of the raw API, then puts that speed to work in Beam: run a Groq-hosted model next to Claude, GPT, and Gemini, and it's usually the first one back, sometimes by a wide margin.
Parameters, the key, and the chats stay yours to control instead of being locked into a single-prompt tool.
Turn on Direct Connection and the browser calls Groq directly, bypassing the Big-AGI server, when your key is client-side and Groq allows it. Your keys stay in your browser. Chats are stored locally first, and sync only if you turn it on. The AI Inspector shows the exact request, the token counts, and a cost estimate.
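As a rough illustration of how a cost estimate can be derived from token counts, the sketch below multiplies input and output tokens by per-million-token prices. It assumes the two dollar figures listed for each model are input and output prices per 1M tokens; the type `ModelPricing` and the function `estimateCostUSD` are hypothetical names, and the AI Inspector's actual calculation may differ.

```typescript
// Sketch only: a token-based cost estimate. The AI Inspector's real
// calculation is not documented here and may differ.
interface ModelPricing {
  inputPerMillionUSD: number;  // assumed: price per 1M input tokens
  outputPerMillionUSD: number; // assumed: price per 1M output tokens
}

// Hypothetical helper: cost = tokens / 1,000,000 * price, summed over both directions.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  const inputCost = (inputTokens / 1e6) * pricing.inputPerMillionUSD;
  const outputCost = (outputTokens / 1e6) * pricing.outputPerMillionUSD;
  return inputCost + outputCost;
}

// Figures listed for Llama 3.1 8B Instant: $0.05 and $0.08.
const llama31Instant: ModelPricing = {
  inputPerMillionUSD: 0.05,
  outputPerMillionUSD: 0.08,
};

// A request with 10,000 input tokens and 2,000 output tokens.
const estimate = estimateCostUSD(10000, 2000, llama31Instant);
console.log(`Estimated cost: $${estimate.toFixed(5)}`);
```

Under these assumptions the example request costs about $0.00066: $0.0005 for input plus $0.00016 for output.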
Put a Groq-hosted model into a Beam next to Claude, GPT, and Gemini, and let its speed set the pace. Fusions then combine, cross-check, and synthesize the parallel answers instead of just picking the best one. Parallel runs use more tokens than a single chat.
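The parallel-run idea can be sketched in a few lines of TypeScript. This is not Beam's implementation: `runInParallel`, `ModelRunner`, and the stub runners are hypothetical stand-ins for real model calls, and the fusion step is left out. The sketch starts every runner at once and records the order in which answers arrive, so a fast model shows up first.

```typescript
// Sketch only: runs several model callers concurrently and records arrival order.
// Real model calls are replaced by stubs; fusion of the answers is not shown.
type ModelRunner = (prompt: string) => Promise<string>;

interface ParallelResult {
  name: string;
  ok: boolean;     // false if the runner rejected
  text: string;    // answer text, or the error message on failure
  arrival: number; // 1 = first answer back
}

async function runInParallel(
  runners: Record<string, ModelRunner>,
  prompt: string
): Promise<ParallelResult[]> {
  let arrivals = 0;
  // Start every runner immediately; each task settles independently.
  const tasks = Object.keys(runners).map(
    async (name): Promise<ParallelResult> => {
      try {
        const text = await runners[name](prompt);
        return { name, ok: true, text, arrival: ++arrivals };
      } catch (err) {
        return { name, ok: false, text: String(err), arrival: ++arrivals };
      }
    }
  );
  const results = await Promise.all(tasks);
  // Order by arrival so the first answer back comes first.
  return results.sort((a, b) => a.arrival - b.arrival);
}

// Stub runner: resolves with a fixed reply after a delay, simulating latency.
const stub = (reply: string, delayMs: number): ModelRunner =>
  () => new Promise<string>((resolve) => setTimeout(() => resolve(reply), delayMs));

const demo = runInParallel(
  { slow: stub("slow answer", 40), fast: stub("fast answer", 5) },
  "Hi"
);
demo.then((results) => console.log(results.map((r) => r.name).join(", ")));
```

Because each task catches its own error, one failing model does not discard the answers from the others, which matters when the results feed a later merge step.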
Your key, your data, your choice of model. Big-AGI is open source and self-hostable, so you can check exactly how Groq is called.
© 2026 Token Fabrics·Built with passion in San Diego