The Big-AGI Model Index
Context window, input and output pricing, and capabilities for every model across every provider. Filter, sort, compare. Bring your own key, pay provider rates, no markup.
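To make the filter, sort, and compare workflow concrete, here is a minimal Python sketch over a few rows copied from the index below. It assumes that prices are in USD per million tokens, which this page does not state explicitly, and that a "-" means no price is listed. The names `ModelEntry`, `parse_context`, `parse_price`, and `estimate_cost` are hypothetical helpers for illustration, not part of Big-AGI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    name: str
    context_tokens: int            # context window, from the rounded label
    input_price: Optional[float]   # ASSUMED: USD per 1M input tokens
    output_price: Optional[float]  # ASSUMED: USD per 1M output tokens


def parse_context(label: str) -> int:
    """Convert a rounded label such as '131K' or '1.1M' to a token count."""
    units = {"K": 1_000, "M": 1_000_000}
    return round(float(label[:-1]) * units[label[-1]])


def parse_price(label: str) -> Optional[float]:
    """Convert '$0.25' to 0.25; a bare '-' means no price is listed."""
    return None if label == "-" else float(label.lstrip("$"))


def estimate_cost(model: ModelEntry, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate one request's cost in USD, or None if the model has no listed price."""
    if model.input_price is None or model.output_price is None:
        return None
    return (input_tokens * model.input_price
            + output_tokens * model.output_price) / 1_000_000


# (name, context, input price, output price), copied from the index.
ROWS = [
    ("Claude Sonnet 5", "1M", "$2", "$10"),
    ("Qwen3.6 Flash", "1M", "$0.25", "$1.5"),
    ("Kimi K2.7 Code Highspeed", "262K", "$1.9", "$8"),
    ("Sakana Fugu", "1M", "-", "-"),
]
ENTRIES = [
    ModelEntry(name, parse_context(ctx), parse_price(inp), parse_price(out))
    for name, ctx, inp, out in ROWS
]

# Filter: priced models with at least a 1M context. Sort: cheapest output first.
long_context = sorted(
    (m for m in ENTRIES
     if m.context_tokens >= 1_000_000
     and m.input_price is not None
     and m.output_price is not None),
    key=lambda m: m.output_price,
)

# Compare: cost of a request with 100K input tokens and 10K output tokens.
for m in long_context:
    print(f"{m.name}: ${estimate_cost(m, 100_000, 10_000):.2f}")
# prints:
# Qwen3.6 Flash: $0.04
# Claude Sonnet 5: $0.30
```

Entries without a listed price are skipped rather than treated as free, since a "-" can also mean the price is billed elsewhere or not yet published.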
Nano Banana 2 Lite
NEW: Gemini 3.1 Flash Lite Image. (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
131K
$0.25
$1.5
Jul 2026
Gemini Omni Flash Preview
NEW: Gemini Omni Flash Preview (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
197K
-
-
Jul 2026
Gemma 4 31B (Preview)
NEW: Google Gemma 4 31B on Cerebras - first multimodal model on wafer-scale inference (~1,850 tok/s). Vision (base64 PNG/JPEG, max 5 images / 10MB), function calling, reasoning (off by default, enable via effort). 131K context (65K free tier), 40K max output.
131K
$0.99
$1.49
Jun 2026
Claude Sonnet 5
NEW: Best combination of speed and intelligence, with the largest gains in coding and agentic tasks
1M
$2
$10
Jun 2026
Kimi K2.7 Code (Alibaba)
NEW: Moonshot Kimi K2.7 Code served via Alibaba Model Studio. Multimodal, always-on thinking, 256K context. (Alibaba pricing not yet published.)
262K
$0.95
$4
Jun 2026
DeepSeek V4 Pro (Alibaba)
NEW: DeepSeek V4 Pro served via Alibaba Model Studio (Alibaba pricing, ~5x DeepSeek-direct). 1M context, thinking.
1M
$2.4
$4.8
Jun 2026
DeepSeek V3.2 (Alibaba)
NEW: DeepSeek V3.2 served via Alibaba Model Studio (superseded by V4). Thinking.
131K
$0.57
$1.71
Jun 2026
Sakana Fugu Ultra
NEW: Multi-agent conductor system routing 1-3 expert agents for complex, multi-step reasoning - maximum answer quality on hard tasks. 1M context.
1M
$5
$30
Jun 2026
Sakana Fugu
NEW: Fast orchestration model routing tasks across a swappable pool of frontier LLMs - low latency, high quality. 1M context. Billed at the routed underlying model's standard rate.
1M
-
-
Jun 2026
Qwen3.6 Flash
NEW: Fast, cost-effective multimodal model with 1M context, near-flagship quality, vision/video, and built-in tools.
1M
$0.25
$1.5
Jun 2026
DeepSeek V4 Flash (Alibaba)
NEW: DeepSeek V4 Flash served via Alibaba Model Studio. 1M context, thinking.
1M
$0.2
$0.4
Jun 2026
[?] Qwen3.7 Max [preview]
NEW: Flagship agent model with native extended thinking and 1M context. Text-only; strong at coding, productivity, and long-horizon autonomous tasks.
1M
$2.5
$7.5
Jun 2026
[?] Qwen3.7 Max [2026 05 17]
NEW: Flagship agent model with native extended thinking and 1M context. Text-only; strong at coding, productivity, and long-horizon autonomous tasks.
1M
$2.5
$7.5
Jun 2026
Cohere: North Mini Code (free)
NEW: North Mini Code is Cohere's first agentic coding model and the debut of its North family. A sparse mixture-of-experts model with 30B total parameters and 3B active, it is optimized...
256K
-
-
Jun 2026
OpenRouter: Fusion
NEW: Fusion turns your prompt into a small multi-model deliberation. A panel of expert models (see below) analyzes your prompt in parallel with web search and web fetch enabled, then a...
1M
-
-
Jun 2026
GLM-5.2 (1M)
NEW: Z.ai 1M-context flagship (744B MoE, 40B activated). Agentic coding with reasoning_effort control (high/max). 1M context, 128K output.
1M
$1.4
$4.4
Jun 2026
Claude Fable 5
NEW: Most capable widely released model for the most demanding reasoning and long-horizon agentic work
1M
$10
$50
Jun 2026
Anthropic: Claude Fable Latest
NEW: This model always redirects to the latest model in the Claude Fable family.
1M
$10
$50
Jun 2026
Nex AGI: Nex-N2-Pro
NEW: Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...
262K
$0.25
$1
Jun 2026
NVIDIA: Nemotron 3.5 Content Safety (free)
NEW: NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...
128K
-
-
Jun 2026
NVIDIA: Nemotron 3 Ultra
NEW: NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
1M
$0.5
$2.2
Jun 2026
Qwen3.7 Plus
NEW: Multimodal agent model with 1M context, native thinking, and vision/video understanding. Lower cost than Max.
1M
$0.4
$1.6
Jun 2026
Kimi K2.7 Code Highspeed
NEW: High-speed code variant with ~180 tok/s output (up to 260 in short contexts). Native multimodal with always-on thinking. 256K context.
262K
$1.9
$8
Jun 2026
MiniMax: MiniMax M3
NEW: MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...
1M
$0.3
$1.2
May 2026
StepFun: Step 3.7 Flash
NEW: Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...
256K
$0.2
$1.15
May 2026
Nano Banana Pro
NEW: Gemini 3 Pro Image (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
164K
$2
$12
May 2026
Nano Banana 2
NEW: Gemini 3.1 Flash Image. (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
131K
$0.5
$3
May 2026
Claude Opus 4.8
NEW: Most capable Opus-tier model for complex reasoning and agentic coding
1M
$5
$25
May 2026
Anthropic: Claude Opus 4.8 (Fast)
NEW: Fast-mode variant of [Opus 4.8](/anthropic/claude-opus-4.8) - identical capabilities with higher output speed at 2x pricing relative to regular Opus 4.8. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode
1M
$10
$50
May 2026
xAI: Grok Build 0.1
NEW: Grok Build 0.1 is xAI's fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...
256K
$1
$2
May 2026
Gemini 3.5 Flash
NEW: Gemini 3.5 Flash (Version: 3.5-flash-05-2026, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$1.5
$9
May 2026
Antigravity Agent Preview (2026-05)
NEW: Preview release of Antigravity Agent (05-2026) (Version: 0.1, Defaults: temperature=undefined, topP=undefined, topK=undefined, interfaces=[generateContent,countTokens])
197K
$1.5
$9
May 2026
Qwen3 Coder Plus
Agentic coding model with very long context. Tiered pricing by input length (up to 1M).
1M
$1
$5
May 2026
Perceptron: Perceptron Mk1
Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...
33K
$0.15
$1.5
May 2026
Anthropic: Claude Opus 4.7 (Fast)
Fast-mode variant of [Opus 4.7](/anthropic/claude-opus-4.7) - identical capabilities with higher output speed at premium 6x pricing. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode
1M
$30
$150
May 2026
Qwen3.6 27b
Alibaba model (not yet curated).
131K
$0.6
$3
May 2026
inclusionAI: Ring-2.6-1T
Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency. It is optimized for coding agents, tool...
262K
$0.08
$0.63
May 2026
Gemini 3.1 Flash-Lite
Gemini 3.1 Flash Lite (Version: 3.1-flash-lite-05-2026, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.25
$1.5
May 2026
OpenAI: GPT Chat Latest
GPT Chat Latest
400K
$5
$30
May 2026
xAI: Grok 4.3
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...
1M
$1.25
$2.5
Apr 2026
mistral-medium-3.5
Official mistral-medium-latest Mistral AI model
262K
$1.5
$7.5
Apr 2026
IBM: Granite 4.1 8B
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...
131K
$0.05
$0.1
Apr 2026
[?] Qwen3 VL Plus [2025 12 19]
Current vision-language model with strong visual reasoning and thinking. Tiered pricing by input length (up to 256K).
262K
$0.2
$1.6
Apr 2026
Poolside: Laguna XS.2 (free)
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai/), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...
262K
-
-
Apr 2026
Poolside: Laguna M.1
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 256K...
262K
$0.2
$0.4
Apr 2026
Owl Alpha
Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....
1M
-
-
Apr 2026
NVIDIA: Nemotron 3 Nano Omni (free)
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
256K
-
-
Apr 2026
mistral-medium-latest
Official mistral-medium-latest Mistral AI model
262K
$1.5
$7.5
Apr 2026
Mistral Medium (latest)
Official mistral-medium-latest Mistral AI model
262K
$1.5
$7.5
Apr 2026
Qwen3.6 Max Preview
Alibaba model (not yet curated).
131K
$1.04
$6.24
Apr 2026
Qwen3.6 35b A3b
Alibaba model (not yet curated).
131K
$0.14
$1
Apr 2026
Qwen3.5 Plus 2026 02 15
Alibaba model (not yet curated).
131K
$0.3
$1.8
Apr 2026
OpenAI GPT Mini Latest
This model always redirects to the latest model in the OpenAI GPT Mini family.
400K
$0.75
$4.5
Apr 2026
OpenAI GPT Latest
This model always redirects to the latest model in the OpenAI GPT family.
1.1M
$5
$30
Apr 2026
MoonshotAI Kimi Latest
This model always redirects to the latest model in the MoonshotAI Kimi family.
262K
$0.55
$3.2
Apr 2026
Google Gemini Pro Latest
This model always redirects to the latest model in the Google Gemini Pro family.
1M
$2
$12
Apr 2026
Google Gemini Flash Latest
This model always redirects to the latest model in the Google Gemini Flash family.
1M
$1.5
$9
Apr 2026
Anthropic Claude Sonnet Latest
This model always redirects to the latest model in the Anthropic Claude Sonnet family.
1M
$2
$10
Apr 2026
Anthropic Claude Haiku Latest
This model always redirects to the latest model in the Anthropic Claude Haiku family.
200K
$1
$5
Apr 2026
inclusionAI: Ling-2.6-1T
Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company's trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a "fast...
262K
$0.08
$0.63
Apr 2026
GPT-5.5 Pro
Most capable model for complex tasks. Uses more compute for smarter, more precise responses on the hardest problems.
1.1M
$30
$180
Apr 2026
GPT-5.5
New baseline for complex production workflows. Stronger task execution, more precise tool use, more efficient reasoning with fewer tokens. 1M token context.
1.1M
$5
$30
Apr 2026
Xiaomi: MiMo-V2.5-Pro
MiMo-V2.5-Pro is Xiaomi's flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....
1M
$0.44
$0.87
Apr 2026
Xiaomi: MiMo-V2.5
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...
1M
$0.11
$0.28
Apr 2026
Tencent: Hy3 preview
Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...
262K
$0.06
$0.21
Apr 2026
Pareto Code Router
The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) coding percentiles. Set min_coding_score between 0 and 1 on the [pareto-router plugin](https://openrouter.ai/docs/guides/routing/routers/pare...
2M
-
-
Apr 2026
OpenAI: GPT-5.4 Image 2
[GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...
272K
$8
$15
Apr 2026
inclusionAI: Ling-2.6-flash
Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....
262K
$0.01
$0.03
Apr 2026
Deep Research Preview (2026-04)
Preview release (April 21st, 2026) of Deep Research (Version: deepthink-exp-05-20, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
197K
$1.25
$10
Apr 2026
Deep Research Max Preview (2026-04)
Preview release (April 21st, 2026) of Deep Research Max (Version: deepthink-exp-05-20, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
197K
$1.25
$10
Apr 2026
Anthropic: Claude Opus Latest
This model always redirects to the latest model in the Claude Opus family.
1M
$5
$25
Apr 2026
Kimi K2.6
Native multimodal flagship (text, image, video inputs) with thinking and non-thinking modes. Stronger long-form coding, improved instruction compliance and self-correction. 256K context.
262K
$0.95
$4
Apr 2026
Claude Opus 4.7
Previous most capable model for complex reasoning and agentic coding
1M
$5
$25
Apr 2026
Gemini 3.1 Flash TTS Preview
Gemini 3.1 Flash TTS Preview (Version: 3.1-flash-tts-preview, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
25K
$1
-
Apr 2026
Gemini Robotics-ER 1.6 Preview
Gemini Robotics-ER 1.6 Preview (Version: 1.6-preview, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
197K
$1
$5
Apr 2026
GLM-5.1
Z.ai flagship (744B MoE, 40B activated). Post-training upgrade over GLM-5 with stronger coding and long-horizon task autonomy. 200K context, thinking mode.
205K
$1.4
$4.4
Apr 2026
Qwen3.6 Plus
Alibaba model (not yet curated).
131K
$0.33
$1.95
Apr 2026
Gemma 4 31B IT
Gemma 4 31B IT (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
295K
-
-
Apr 2026
Gemma 4 26B A4B IT
Gemma 4 26B A4B IT (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
295K
-
-
Apr 2026
GLM-5V Turbo
First multimodal GLM-5 model. Vision-based coding agent with image/video/file inputs. 200K context, 128K output, thinking mode.
205K
$1.2
$4
Apr 2026
Arcee AI: Trinity Large Thinking
Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, agentic workloads, and reasoning tasks. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7...
262K
$0.25
$0.8
Apr 2026
xAI: Grok 4.20 Multi-Agent
Grok 4.20 Multi-Agent is a variant of xAI's Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...
2M
$1.25
$2.5
Mar 2026
xAI: Grok 4.20
Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering...
2M
$1.25
$2.5
Mar 2026
Google: Lyria 3 Pro Preview
Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...
1M
-
-
Mar 2026
Google: Lyria 3 Clip Preview
30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...
1M
-
-
Mar 2026
Kwaipilot: KAT-Coder-Pro V2
KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT's KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions,...
256K
$0.3
$1.2
Mar 2026
Reka Edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...
16K
$0.1
$0.1
Mar 2026
MiniMax: MiniMax M2.7
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...
205K
$0.18
$0.72
Mar 2026
GPT-5.4 Nano
Cheapest GPT-5.4-class model for simple high-volume tasks like classification and data extraction.
400K
$0.2
$1.25
Mar 2026
GPT-5.4 Mini
Strongest mini model for coding, computer use, and subagents. GPT-5.4-class intelligence at lower cost and latency.
400K
$0.75
$4.5
Mar 2026
mistral-small-latest
Mistral Small 4.
262K
$0.15
$0.6
Mar 2026
Mistral Small (2603)
Mistral Small 4.
262K
$0.15
$0.6
Mar 2026
Leanstral (2603)
A mid- and post-trained version of Mistral Small 4 for Lean.
197K
-
-
Mar 2026
GLM-5 Turbo
Speed-optimized GLM-5 variant for agent workflows. Enhanced tool invocation and long-chain execution. 200K context, thinking mode.
205K
$1.2
$4
Mar 2026
NVIDIA: Nemotron 3 Super (free)
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
1M
-
-
Mar 2026
Qwen: Qwen3.5-9B
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...
262K
$0.1
$0.15
Mar 2026
ByteDance Seed: Seed-2.0-Lite
Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...
262K
$0.25
$2
Mar 2026
GPT-5.4 Pro
Most capable model for complex tasks. Uses more compute for smarter, more precise responses on difficult problems.
1.1M
$30
$180
Mar 2026
GPT-5.4
Most capable and efficient frontier model for professional work. Native computer use, improved reasoning, coding, and agentic workflows with 1M token context.
1.1M
$2.5
$15
Mar 2026
Inception: Mercury 2
Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...
128K
$0.25
$0.75
Mar 2026
OpenAI: GPT-5.3 Chat
GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...
128K
$1.75
$14
Mar 2026
GPT-5.3 Instant
Deprecated: GPT-5.3 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.
128K
$1.75
$14
Mar 2026
Gemini 3.1 Flash-Lite Preview
Gemini 3.1 Flash Lite Preview (Version: 3.1-flash-lite-preview-03-2026, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.25
$1.5
Mar 2026
Nano Banana 2 Preview
Gemini 3.1 Flash Image Preview. (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
131K
$0.5
$3
Feb 2026
ByteDance Seed: Seed-2.0-Mini
Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/mediu...
262K
$0.1
$0.4
Feb 2026
Qwen3.5 35b A3b
Alibaba model (not yet curated).
131K
$0.14
$1
Feb 2026
Qwen3.5 27b
Alibaba model (not yet curated).
131K
$0.2
$1.56
Feb 2026
Qwen3.5 122b A10b
Alibaba model (not yet curated).
131K
$0.26
$2.08
Feb 2026
Qwen: Qwen3.5-Flash
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...
1M
$0.07
$0.26
Feb 2026
LiquidAI: LFM2-24B-A2B
LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per...
128K
$0.03
$0.12
Feb 2026
GPT Audio 1.5
Best voice model for audio in, audio out with Chat Completions. Accepts audio inputs and outputs.
128K
$2.5
$10
Feb 2026
AionLabs: Aion-2.0
Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging....
131K
$0.8
$1.6
Feb 2026
Gemini 3.1 Pro Preview (Custom Tools)
Gemini 3.1 Pro Preview optimized for custom tool usage (Version: 3.1-pro-preview-01-2026, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$2
$12
Feb 2026
Gemini 3.1 Pro Preview
Gemini 3.1 Pro Preview (Version: 3.1-pro-preview-01-2026, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$2
$12
Feb 2026
Claude Sonnet 4.6
Best combination of speed and intelligence for everyday tasks
1M
$3
$15
Feb 2026
Qwen3.5 397b A17b
Alibaba model (not yet curated).
131K
$0.39
$2.45
Feb 2026
Qwen: Qwen3.5 Plus 2026-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
1M
$0.26
$1.56
Feb 2026
MiniMax: MiniMax M2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
205K
$0.12
$0.48
Feb 2026
GLM-5
Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.
205K
$1
$3.2
Feb 2026
Qwen: Qwen3 Max Thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
262K
$0.78
$3.9
Feb 2026
GPT-5.3 Codex
Most capable agentic coding model. Combines frontier coding performance of GPT-5.2-Codex with reasoning and professional knowledge of GPT-5.2. ~25% faster.
400K
$1.75
$14
Feb 2026
Claude Opus 4.6
Previous most intelligent model for complex agents and coding, with adaptive thinking
1M
$5
$25
Feb 2026
Qwen3 Coder Next
Alibaba model (not yet curated).
131K
$0.11
$0.8
Feb 2026
GLM-OCR (Vision, OCR)
Specialized OCR model for text extraction from images and documents.
131K
$0.03
$0.03
Feb 2026
Free Models Router
The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...
200K
-
-
Feb 2026
StepFun: Step 3.5 Flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....
262K
$0.1
$0.3
Jan 2026
Upstage: Solar Pro 3
Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized...
128K
$0.15
$0.6
Jan 2026
Kimi K2.5
Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.
262K
$0.6
$3
Jan 2026
MiniMax: MiniMax M2-her
MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message...
66K
$0.3
$1.2
Jan 2026
Writer: Palmyra X5
Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million...
1M
$0.6
$6
Jan 2026
LiquidAI: LFM2.5-1.2B-Thinking (free)
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG, while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...
33K
-
-
Jan 2026
LiquidAI: LFM2.5-1.2B-Instruct (free)
LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.
33K
-
-
Jan 2026
GLM-4.7 FlashX
Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.
131K
$0.07
$0.4
Jan 2026
GLM-4.7 Flash (Free)
Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.
131K
-
-
Jan 2026
Z.ai GLM 4.7 (Preview)
Z.ai GLM 4.7 (355B) on Cerebras (~1,000 tok/s). Strong agentic coding, advanced reasoning (on by default), superior tool use. 131K context, 40K max output.
131K
$2.25
$2.75
Jan 2026
MiniMax: MiniMax M2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
205K
$0.29
$0.95
Dec 2025
ByteDance Seed: Seed 1.6 Flash
Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...
262K
$0.08
$0.3
Dec 2025
ByteDance Seed: Seed 1.6
Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.
262K
$0.25
$2
Dec 2025
GLM-4.7
Latest-gen GLM model with 128K context. Thinking mode activated by default.
131K
$0.6
$2.2
Dec 2025
Gemini 3 Flash Preview
Gemini 3 Flash Preview (Version: 3-flash-preview-12-2025, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.5
$3
Dec 2025
GPT Audio Mini
Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.
128K
$0.6
$2.4
Dec 2025
NVIDIA: Nemotron 3 Nano 30B A3B
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...
262K
$0.05
$0.2
Dec 2025
OpenAI: GPT-5.2 Chat
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively "think" on...
128K
$1.75
$14
Dec 2025
GPT-5.2 Pro
Smartest and most trustworthy option for difficult questions. Uses more compute for harder thinking on complex domains like programming.
400K
$21
$168
Dec 2025
GPT-5.2 Instant
Deprecated: GPT-5.2 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.
128K
$1.75
$14
Dec 2025
GPT-5.2 Codex
Deprecated: GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar environments. Supports low, medium, high, and xhigh reasoning effort settings.
400K
$1.75
$14
Dec 2025
GPT-5.2
Most capable model for professional work and long-running agents. Improvements in general intelligence, long-context, agentic tool-calling, and vision.
400K
$1.75
$14
Dec 2025
Deep Research Pro Preview
Preview release (December 12th, 2025) of Deep Research Pro (Version: deepthink-exp-05-20, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
197K
$1.25
$10
Dec 2025
AutoGLM Phone
Mobile phone automation agent. Understands phone screens via multimodal perception and executes automated operations.
131K
-
-
Dec 2025
Devstral 2 (latest)
Official devstral-2512 Mistral AI model
262K
$0.4
$2
Dec 2025
Devstral 2 (latest)
Official mistral-medium-latest Mistral AI model
262K
$0.4
$2
Dec 2025
Devstral 2 (2512)
Official devstral-2512 Mistral AI model
262K
$0.4
$2
Dec 2025
Relace: Relace Search
The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic...
256K
$1
$3
Dec 2025
GLM-4.6 V FlashX
Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.
131K
$0.04
$0.4
Dec 2025
GLM-4.6 V Flash (Free)
Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.
131K
-
-
Dec 2025
GLM-4.6 V
Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.
131K
$0.3
$0.9
Dec 2025
Body Builder (beta)
Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example:...
128K
-
-
Dec 2025
mistral-large-latest
Official mistral-large-2512 Mistral AI model
262K
$0.5
$1.5
Dec 2025
Mistral Large (2512)
Official mistral-large-2512 Mistral AI model
262K
$0.5
$1.5
Dec 2025
ministral-8b-latest
Ministral 3 (a.k.a. Tinystral) 8B Instruct.
262K
$0.15
$0.15
Dec 2025
ministral-3b-latest
Ministral 3 (a.k.a. Tinystral) 3B Instruct.
131K
$0.1
$0.1
Dec 2025
ministral-14b-latest
Ministral 3 (a.k.a. Tinystral) 14B Instruct.
262K
$0.2
$0.2
Dec 2025
Ministral 8b (2512)
Ministral 3 (a.k.a. Tinystral) 8B Instruct.
262K
$0.15
$0.15
Dec 2025
Ministral 3b (2512)
Ministral 3 (a.k.a. Tinystral) 3B Instruct.
131K
$0.1
$0.1
Dec 2025
Ministral 14b (2512)
Ministral 3 (a.k.a. Tinystral) 14B Instruct.
262K
$0.2
$0.2
Dec 2025
Amazon: Nova 2 Lite
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing...
1M
$0.3
$2.5
Dec 2025
Arcee AI: Trinity Mini
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function...
131K
$0.05
$0.15
Dec 2025
Claude Opus 4.5
Previous most intelligent model with advanced reasoning for complex agentic workflows
200K
$5
$25
Nov 2025
AllenAI: Olmo 3 32B Think
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and...
66K
$0.15
$0.5
Nov 2025
Nano Banana Pro Preview
Gemini 3 Pro Image Preview (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
164K
$2
$12
Nov 2025
Nano Banana Pro
Gemini 3 Pro Image Preview (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
164K
$2
$12
Nov 2025
GPT-5.1 Codex Max
Deprecated. Our most intelligent coding model optimized for long-horizon, agentic coding tasks.
400K
$1.25
$10
Nov 2025
GPT-5.1 Codex Mini
Deprecated. Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.
400K
$0.25
$2
Nov 2025
GPT-5.1 Codex
Deprecated. A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
400K
$1.25
$10
Nov 2025
GPT-5.1
The best model for coding and agentic tasks with configurable reasoning effort.
400K
$1.25
$10
Nov 2025
Deep Cogito: Cogito v2.1 671B
Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning...
128K
$1.25
$1.25
Nov 2025
OpenAI: GPT-5.1 Chat
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively "think" on...
128K
$1.25
$10
Nov 2025
GPT-5.1 Instant
Deprecated. GPT-5.1 Instant with adaptive reasoning. More conversational with improved instruction following.
128K
$1.25
$10
Nov 2025
MoonshotAI: Kimi K2 Thinking
Kimi K2 Thinking is Moonshot AI's most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...
262K
$0.6
$2.5
Nov 2025
Amazon: Nova Premier 1.0
Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
1M
$2.5
$12.5
Oct 2025
Perplexity: Sonar Pro Search
Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based...
200K
$3
$15
Oct 2025
Mistral: Voxtral Small 24B 2507
Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...
32K
$0.1
$0.3
Oct 2025
[OpenAI] GPT OSS Safeguard 20B (Preview)
OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Groq.
131K
$0.08
$0.3
Oct 2025
NVIDIA: Nemotron Nano 12B 2 VL (free)
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba's...
128K
-
-
Oct 2025
Qwen: Qwen3 VL 32B Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
262K
$0.1
$0.42
Oct 2025
MiniMax: MiniMax M2
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...
205K
$0.26
$1
Oct 2025
IBM: Granite 4.0 Micro
Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family of models. These models are the latest in a series of models released by IBM. They are fine-tuned for long...
131K
$0.02
$0.11
Oct 2025
Microsoft: Phi 4 Mini Instruct
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4...
131K
$0.08
$0.35
Oct 2025
Claude Haiku 4.5
Fastest model with exceptional speed and performance
200K
$1
$5
Oct 2025
Qwen: Qwen3 VL 8B Thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...
256K
$0.12
$1.37
Oct 2025
Qwen: Qwen3 VL 8B Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
256K
$0.08
$0.5
Oct 2025
GPT-5 Search API
Updated web search model in Chat Completions API. 60% cheaper with domain filtering support.
400K
$1.25
$10
Oct 2025
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta's Llama-3.3-70B-Instruct with a 128K context. It's post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
131K
$0.4
$0.4
Oct 2025
Gemini 2.5 Computer Use Preview 10-2025
Gemini 2.5 Computer Use Preview 10-2025 (Version: Gemini 2.5 Computer Use Preview 10-2025, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
197K
$1.25
$10
Oct 2025
Qwen: Qwen3 VL 30B A3B Thinking
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...
131K
$0.13
$1.56
Oct 2025
Qwen: Qwen3 VL 30B A3B Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
262K
$0.13
$0.52
Oct 2025
GPT-5 Pro
Version of GPT-5 that uses more compute to produce smarter and more precise responses. Designed for tough problems.
400K
$15
$120
Oct 2025
Nano Banana
Gemini 2.5 Flash Preview Image (Version: 2.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
66K
$0.3
$2.5
Oct 2025
GLM-4.6
GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.
131K
$0.6
$2.2
Sep 2025
DeepSeek: DeepSeek V3.2 Exp
DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism...
164K
$0.27
$0.41
Sep 2025
Claude Sonnet 4.5
Previous best combination of speed and intelligence for complex agents and coding
200K
$3
$15
Sep 2025
TheDrummer: Cydonia 24B V4.1
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
131K
$0.3
$0.5
Sep 2025
Relace: Relace Apply 3
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at...
256K
$0.85
$1.25
Sep 2025
Google: Gemini 2.5 Flash Lite Preview 09-2025
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...
1M
$0.1
$0.4
Sep 2025
Qwen3 Vl 235b A22b Thinking
Alibaba model (not yet curated).
131K
$0.26
$2.6
Sep 2025
Qwen3 Vl 235b A22b Instruct
Alibaba model (not yet curated).
131K
$0.2
$0.88
Sep 2025
Qwen3 Max
Alibaba model (not yet curated).
131K
$0.78
$3.9
Sep 2025
DeepSeek: DeepSeek V3.1 Terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...
164K
$0.27
$0.95
Sep 2025
Qwen3 Coder Flash
Alibaba model (not yet curated).
131K
$0.2
$0.98
Sep 2025
magistral-small-latest
Mistral Small 4.
262K
$0.5
$1.5
Sep 2025
magistral-medium-latest
Our frontier-class reasoning model release candidate September 2025.
131K
$2
$5
Sep 2025
Magistral Small (2509)
Our efficient reasoning model released September 2025.
131K
$0.5
$1.5
Sep 2025
Magistral Medium (2509)
Our frontier-class reasoning model release candidate September 2025.
131K
$2
$5
Sep 2025
GPT-5 Codex
Deprecated. A version of GPT-5 optimized for agentic coding in Codex.
400K
$1.25
$10
Sep 2025
Qwen3 Next 80b A3b Thinking
Alibaba model (not yet curated).
131K
$0.1
$0.78
Sep 2025
Qwen3 Next 80b A3b Instruct
Alibaba model (not yet curated).
131K
$0.09
$1.1
Sep 2025
Qwen Plus
Balanced quality, speed, and cost with hybrid thinking. 1M context.
1M
$0.4
$1.2
Sep 2025
NVIDIA: Nemotron Nano 9B V2 (free)
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
128K
-
-
Sep 2025
MoonshotAI: Kimi K2 0905
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...
262K
$0.6
$2.5
Sep 2025
[Groq] Compound Mini (Agentic System)
Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.
131K
-
-
Sep 2025
[Groq] Compound (Agentic System)
Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.
131K
-
-
Sep 2025
Qwen3 30b A3b Thinking 2507
Alibaba model (not yet curated).
131K
$0.08
$0.4
Aug 2025
GPT Audio
First generally available audio model. Accepts audio inputs and outputs, and can be used in the Chat Completions REST API.
128K
$2.5
$10
Aug 2025
Nous: Hermes 4 70B
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...
131K
$0.13
$0.4
Aug 2025
Nous: Hermes 4 405B
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...
131K
$1
$3
Aug 2025
DeepSeek: DeepSeek V3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...
164K
$0.21
$0.79
Aug 2025
Mistral: Mistral Medium 3.1
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...
131K
$0.4
$2
Aug 2025
Mistral Medium (2508)
Update on Mistral Medium 3 with improved capabilities.
131K
$0.4
$2
Aug 2025
GLM-4.5 V
Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.
98K
$0.6
$1.8
Aug 2025
AI21: Jamba Large 1.7
Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context...
256K
$2
$8
Aug 2025
OpenAI: GPT-5 Chat
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
128K
$1.25
$10
Aug 2025
GPT-5 Nano
Fastest, most cost-efficient version of GPT-5 for summarization and classification tasks.
400K
$0.05
$0.4
Aug 2025
GPT-5 Mini
A faster, more cost-efficient version of GPT-5 for well-defined tasks.
400K
$0.25
$2
Aug 2025
GPT-5 ChatGPT
Deprecated. GPT-5 model used in ChatGPT.
128K
$1.25
$10
Aug 2025
GPT-5
The best model for coding and agentic tasks across domains.
400K
$1.25
$10
Aug 2025
GPT OSS 120B
OpenAI flagship open-weight MoE (120B total, 5.1B active) on Cerebras (~3,000 tok/s). Reasoning (default medium effort) and function calling. 131K context, 40K max output.
131K
$0.35
$0.75
Aug 2025
Claude Opus 4.1
Deprecated. Previous Opus model, deprecated June 5, 2026 and retiring August 5, 2026.
200K
$15
$75
Aug 2025
[OpenAI] GPT OSS 20B
OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.
131K
$0.08
$0.3
Aug 2025
Qwen: Qwen3 Coder 30B A3B Instruct
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...
160K
$0.07
$0.27
Jul 2025
codestral-latest
Our cutting-edge language model for coding released August 2025.
256K
$0.3
$0.9
Jul 2025
Codestral (2508)
Our cutting-edge language model for coding released August 2025.
256K
$0.3
$0.9
Jul 2025
Qwen3 30b A3b Instruct 2507
Alibaba model (not yet curated).
131K
$0.05
$0.19
Jul 2025
GLM-4.5 X
Extended GLM-4.5 model. Interleaved thinking.
98K
$2.2
$8.9
Jul 2025
GLM-4.5 Flash (Free)
Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.
98K
-
-
Jul 2025
GLM-4.5 AirX
Extended lightweight GLM-4.5 variant. Interleaved thinking.
98K
$1.1
$4.5
Jul 2025
GLM-4.5 Air
Lightweight GLM-4.5 variant. Interleaved thinking.
98K
$0.2
$1.1
Jul 2025
GLM-4.5
Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.
98K
$0.6
$2.2
Jul 2025
Qwen3 235b A22b Thinking 2507
Alibaba model (not yet curated).
131K
$0.15
$1.5
Jul 2025
Qwen: Qwen3 Coder 480B A35B
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
1M
$0.22
$1.8
Jul 2025
Gemini 2.5 Flash-Lite
Stable version of Gemini 2.5 Flash-Lite, released in July of 2025 (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.1
$0.4
Jul 2025
ByteDance: UI-TARS 7B
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...
128K
$0.1
$0.2
Jul 2025
Qwen: Qwen3 235B A22B Instruct 2507
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...
262K
$0.09
$0.1
Jul 2025
voxtral-small-latest
A small audio understanding model released in July 2025
33K
$0.1
$0.3
Jul 2025
voxtral-mini-latest
A mini audio understanding model released in July 2025
33K
$0.04
$0.04
Jul 2025
Voxtral Small (2507)
A small audio understanding model released in July 2025
33K
$0.1
$0.3
Jul 2025
Voxtral Mini (2507)
A mini audio understanding model released in July 2025
33K
$0.04
$0.04
Jul 2025
Switchpoint Router
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...
131K
$0.85
$3.4
Jul 2025
MoonshotAI: Kimi K2 0711
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...
131K
$0.57
$2.3
Jul 2025
Venice: Uncensored (free)
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an "uncensored" instruct-tuned LLM, preserving...
33K
-
-
Jul 2025
Tencent: Hunyuan A13B Instruct
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...
131K
$0.14
$0.57
Jul 2025
Morph: Morph V3 Large
Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>...
262K
$0.9
$1.9
Jul 2025
Morph: Morph V3 Fast
Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>...
82K
$0.8
$1.2
Jul 2025
Baidu: ERNIE 4.5 VL 424B A47B
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...
131K
$0.42
$1.25
Jun 2025
o4 Mini Deep Research
Deprecated. Faster, more affordable deep research model for complex, multi-step research tasks.
200K
$2
$8
Jun 2025
o3 Deep Research
Deprecated. Our most powerful deep research model for complex, multi-step research tasks.
200K
$10
$40
Jun 2025
Mistral: Mistral Small 3.2 24B
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...
128K
$0.08
$0.2
Jun 2025
Mistral Small (2506)
Our latest enterprise-grade small model with the latest version released June 2025.
131K
$0.1
$0.3
Jun 2025
MiniMax: MiniMax M1
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...
1M
$0.4
$2.2
Jun 2025
Gemini 2.5 Pro
Stable release (June 17th, 2025) of Gemini 2.5 Pro (Version: 2.5, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$1.25
$10
Jun 2025
Gemini 2.5 Flash
Stable version of Gemini 2.5 Flash, our mid-size multimodal model that supports up to 1 million tokens, released in June of 2025. (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.3
$2.5
Jun 2025
o3 Pro
Version of o3 with more compute for better responses. Provides consistently better answers for complex tasks.
200K
$20
$80
Jun 2025
DeepSeek: R1 0528
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...
164K
$0.5
$2.15
May 2025
Anthropic: Claude Sonnet 4
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...
1M
$3
$15
May 2025
Anthropic: Claude Opus 4
Claude Opus 4 is benchmarked as the world's best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...
200K
$15
$75
May 2025
Google: Gemma 3n 4B
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs, including text, visual data, and audio, enabling diverse tasks...
33K
$0.06
$0.12
May 2025
Google: Gemini 2.5 Pro Preview 06-05
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs "thinking" capabilities, enabling it to reason through responses with enhanced accuracy...
1M
$1.25
$10
May 2025
Gemini 2.5 Pro Preview TTS
Gemini 2.5 Pro Preview TTS (Version: gemini-2.5-pro-preview-tts-2025-05-19, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[countTokens,generateContent,batchGenerateContent])
25K
$1
-
May 2025
Gemini 2.5 Flash Preview TTS
Gemini 2.5 Flash Preview TTS (Version: gemini-2.5-flash-exp-tts-2025-05-19, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[countTokens,generateContent])
25K
$0.5
-
May 2025
mistral-medium-3
Official mistral-medium-latest Mistral AI model
262K
$0.4
$2
May 2025
Mistral Medium (2505)
Our frontier-class multimodal model released May 2025.
131K
$0.4
$2
May 2025
Google: Gemini 2.5 Pro Preview 05-06
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs "thinking" capabilities, enabling it to reason through responses with enhanced accuracy...
1M
$1.25
$10
May 2025
Arcee AI: Virtuoso Large
Virtuoso-Large is Arcee's top-tier general-purpose LLM at 72B parameters, tuned to tackle cross-domain reasoning, creative writing and enterprise QA. Unlike many 70B peers, it retains the 128k...
131K
$0.75
$1.2
May 2025
Arcee AI: Coder Large
Coder-Large is a 32B-parameter offspring of Qwen 2.5-Instruct that has been further trained on permissively-licensed GitHub, CodeSearchNet and synthetic bug-fix corpora. It supports a 32k context window, enabling multi-file...
33K
$0.5
$0.8
May 2025
Meta: Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
164K
$0.18
$0.18
Apr 2025
Qwen3 8b
Alibaba model (not yet curated).
131K
$0.05
$0.4
Apr 2025
Qwen3 32b
Alibaba model (not yet curated).
131K
$0.29
$0.59
Apr 2025
Qwen3 30b A3b
Alibaba model (not yet curated).
131K
$0.12
$0.5
Apr 2025
Qwen3 235b A22b
Alibaba model (not yet curated).
131K
$0.46
$1.82
Apr 2025
Qwen3 14b
Alibaba model (not yet curated).
131K
$0.1
$0.24
Apr 2025
o4 Mini
Deprecated. Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks.
200K
$1.1
$4.4
Apr 2025
o3
A well-rounded and powerful model across domains. Sets a new standard for math, science, coding, and visual reasoning tasks.
200K
$2
$8
Apr 2025
GPT-4.1 Nano
Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion.
1M
$0.1
$0.4
Apr 2025
GPT-4.1 Mini
Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency by nearly half and cost by 83%.
1M
$0.4
$1.6
Apr 2025
GPT-4.1
Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.
1M
$2
$8
Apr 2025
GLM-4 32B (0414) 128K
GLM-4 32B model with 128K context, 16K output.
131K
$0.1
$0.1
Apr 2025
Meta: Llama 4 Scout
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...
10M
$0.1
$0.3
Apr 2025
Meta: Llama 4 Maverick
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...
1M
$0.15
$0.6
Apr 2025
[Meta] Llama 4 Scout · 17B × 16E (Preview)
Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.
131K
$0.11
$0.34
Apr 2025
DeepSeek: DeepSeek V3 0324
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well...
164K
$0.2
$0.77
Mar 2025
o1 Pro
A version of o1 with more compute for better responses. Provides consistently better answers for complex tasks.
200K
$150
$600
Mar 2025
Mistral: Mistral Small 3.1 24B
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...
128K
$0.35
$0.56
Mar 2025
Google: Gemma 3 4B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
131K
$0.05
$0.1
Mar 2025
Google: Gemma 3 12B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
131K
$0.05
$0.15
Mar 2025
Cohere: Command A
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...
256K
$2.5
$10
Mar 2025
Reka Flash 3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a...
66K
$0.1
$0.2
Mar 2025
Google: Gemma 3 27B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
131K
$0.08
$0.16
Mar 2025
GPT-4o Search Preview
Latest snapshot of the GPT-4o model optimized for web search capabilities.
128K
$2.5
$10
Mar 2025
GPT-4o Mini Search Preview
Deprecated. Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.
128K
$0.15
$0.6
Mar 2025
TheDrummer: Skyfall 36B V2
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
33K
$0.55
$0.8
Mar 2025
Perplexity: Sonar Reasoning Pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thou...
128K
$2
$8
Mar 2025
Perplexity: Sonar Pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro) For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth,...
200K
$3
$15
Mar 2025
Perplexity: Sonar Deep Research
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...
128K
$2
$8
Mar 2025
Mistral: Saba
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional...
33K
$0.2
$0.6
Feb 2025
Gemini 2.0 Flash 001
Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in January of 2025. (Version: 2.0, Defaults: temperature=1, topP=0.95, topK=40, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
1.1M
$0.1
$0.4
Feb 2025
AionLabs: Aion-RP 1.0 (8B)
Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other's responses. It is a fine-tuned base model...
33K
$0.8
$1.6
Feb 2025
AionLabs: Aion-1.0-Mini
Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant...
131K
$0.7
$1.4
Feb 2025
AionLabs: Aion-1.0
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...
131K
$4
$8
Feb 2025
Qwen: Qwen2.5 VL 72B Instruct
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
131K
$0.8
$1
Feb 2025
o3 Mini
Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks.
200K
$1.1
$4.4
Jan 2025
Mistral: Mistral Small 3
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
33K
$0.05
$0.08
Jan 2025
Perplexity: Sonar
Sonar is lightweight, affordable, fast, and simple to use - now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...
127K
$1
$1
Jan 2025
DeepSeek: R1 Distill Llama 70B
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance acr...
128K
$0.8
$0.8
Jan 2025
DeepSeek: R1
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....
164K
$0.7
$2.5
Jan 2025
V1 8K Vision (Preview)
Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision for production.
8K
$0.2
$2
Jan 2025
V1 32K Vision (Preview)
Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision for production.
33K
$1
$3
Jan 2025
V1 128K Vision (Preview)
Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision for production.
131K
$2
$5
Jan 2025
MiniMax: MiniMax-01
MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...
1M
$0.2
$1.1
Jan 2025
Microsoft: Phi 4
Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...
16K
$0.07
$0.14
Jan 2025
Sao10K: Llama 3.1 70B Hanami x1
This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).
16K
$3
$3
Jan 2025
DeepSeek: DeepSeek V3
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...
131K
$0.2
$0.8
Dec 2024
Sao10K: Llama 3.3 Euryale 70B
Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).
131K
$0.65
$0.75
Dec 2024
o1
Previous full o-series reasoning model.
200K
$15
$60
Dec 2024
Cohere: Command R7B (12-2024)
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning...
128K
$0.04
$0.15
Dec 2024
Meta: Llama 3.3 70B Instruct (free)
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
131K
-
-
Dec 2024
[Meta] Llama 3.3 · 70B Versatile
Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.
131K
$0.59
$0.79
Dec 2024
Amazon: Nova Pro 1.0
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December...
300K
$0.8
$3.2
Dec 2024
Amazon: Nova Micro 1.0
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...
128K
$0.04
$0.14
Dec 2024
Amazon: Nova Lite 1.0
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...
300K
$0.06
$0.24
Dec 2024
Mistral Large 2407
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....
131K
$2
$6
Nov 2024
Qwen2.5 Coder 32B Instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning**...
128K
$0.66
$1
Nov 2024
TheDrummer: UnslopNemo 12B
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
33K
$0.4
$0.4
Nov 2024
Magnum v4 72B
This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://o...
33K
$3
$5
Oct 2024
Qwen: Qwen2.5 7B Instruct
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...
131K
$0.04
$0.1
Oct 2024
Inflection: Inflection 3 Productivity
Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional...
8K
$2.5
$10
Oct 2024
Inflection: Inflection 3 Pi
Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...
8K
$2.5
$10
Oct 2024
TheDrummer: Rocinante 12B
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -...
33K
$0.25
$0.5
Sep 2024
Meta: Llama 3.2 3B Instruct (free)
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
131K
-
-
Sep 2024
Meta: Llama 3.2 1B Instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...
131K
$0.03
$0.2
Sep 2024
Meta: Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
131K
$0.35
$0.35
Sep 2024
Qwen2.5 72B Instruct
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...
131K
$0.36
$0.4
Sep 2024
Cohere: Command R+ (08-2024)
command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint...
128K
$2.5
$10
Aug 2024
Cohere: Command R (08-2024)
command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and...
128K
$0.15
$0.6
Aug 2024
Sao10K: Llama 3.1 Euryale 70B v2.2
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).
131K
$0.85
$0.85
Aug 2024
Nous: Hermes 3 70B Instruct
Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements acr...
131K
$0.7
$0.7
Aug 2024
Nous: Hermes 3 405B Instruct (free)
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
131K
-
-
Aug 2024
Sao10K: Llama 3 8B Lunaris
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....
8K
$0.04
$0.05
Aug 2024
Meta: Llama 3.1 8B Instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...
131K
$0.02
$0.03
Jul 2024
Meta: Llama 3.1 70B Instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...
131K
$0.4
$0.4
Jul 2024
[Meta] Llama 3.1 · 8B Instant
Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.
131K
$0.05
$0.08
Jul 2024
Mistral: Mistral Nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...
131K
$0.02
$0.03
Jul 2024
open-mistral-nemo-2407
Our best multilingual open source model released July 2024.
131K
$0.15
$0.15
Jul 2024
open-mistral-nemo
Our best multilingual open source model released July 2024.
131K
$0.15
$0.15
Jul 2024
GPT-4o Mini
Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more capable than GPT-3.5 Turbo.
128K
$0.15
$0.6
Jul 2024
Google: Gemma 2 27B
Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...
8K
$0.65
$0.65
Jul 2024
GPT-4o
[deprecated] Original gpt-4o snapshot from May 13, 2024.
128K
$5
$15
May 2024
Meta: Llama 3 8B Instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...
8K
$0.14
$0.14
Apr 2024
Mistral: Mixtral 8x22B Instruct
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...
66K
$2
$6
Apr 2024
WizardLM-2 8x22B
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is...
66K
$0.62
$0.62
Apr 2024
GPT-4 Turbo
[deprecated] GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently
128K
$10
$30
Apr 2024
Anthropic: Claude 3 Haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal
200K
$0.25
$1.25
Mar 2024
Mistral Large
This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....
128K
$2
$6
Feb 2024
V1 8K
Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.
8K
$0.2
$2
Feb 2024
V1 32K
Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.
33K
$1
$3
Feb 2024
V1 128K
Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.
131K
$2
$5
Feb 2024
OpenAI: GPT-4 Turbo Preview
The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023. **Note:** heavily rate limited by OpenAI while...
128K
$10
$30
Jan 2024
OpenAI: GPT-3.5 Turbo (older v0613)
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
4K
$1
$2
Jan 2024
3.5-Turbo
The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats.
16K
$0.5
$1.5
Jan 2024
3.5-Turbo
[deprecated] The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats.
16K
$0.5
$1.5
Jan 2024
mistral-medium
Official mistral-medium-latest Mistral AI model
262K
$0.4
$2
Dec 2023
Auto Router
Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...
2M
-
-
Nov 2023
3.5-Turbo
[deprecated] GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.
16K
$1
$2
Nov 2023
OpenAI: GPT-3.5 Turbo Instruct
This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.
4K
$1.5
$2
Sep 2023
OpenAI: GPT-3.5 Turbo 16k
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up...
16K
$3
$4
Aug 2023
Mancer: Weaver (alpha)
An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
8K
$0.75
$1
Aug 2023
ReMM SLERP 13B
A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
6K
$0.45
$0.65
Jul 2023
MythoMax 13B
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
4K
$0.06
$0.06
Jul 2023
GPT-4
[deprecated] Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.
8K
$30
$60
Jun 2023
GPT-4
Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.
8K
$30
$60
Jun 2023
[?] Qwen Plus [latest]
Balanced quality, speed, and cost with hybrid thinking. 1M context.
1M
$0.4
$1.2
-
Labs Leanstral 1 5 1
A mid & post-trained version of Mistral Small 4 for Lean (260618 SFT)
262K
-
-
-
labs-leanstral-1-5
A mid & post-trained version of Mistral Small 4 for Lean (260618 SFT)
262K
-
-
-
mistral-code-agent-latest
Official devstral-2512 Mistral AI model
262K
-
-
-
mistral-code-fim-latest
Our cutting-edge language model for coding released August 2025.
256K
-
-
-
mistral-code-latest
Our cutting-edge language model for coding released August 2025.
256K
-
-
-
mistral-tiny-latest
Our best multilingual open source model released July 2024.
131K
-
-
-
mistral-vibe-cli-fast
Mistral Small 4.
262K
-
-
-
mistral-vibe-cli-with-tools
Official mistral-medium-latest Mistral AI model
262K
-
-
-
Open Mistral Nemo
Our best multilingual open source model released July 2024.
131K
-
-
-
Qvq Max
Alibaba model (not yet curated).
131K
-
-
-
Qwen Coder Plus
Alibaba model (not yet curated).
131K
-
-
-
Qwen Flash
Fast and very low cost with hybrid thinking. 1M context.
1M
$0.05
$0.4
-
Qwen Max
Best quality of the stable commercial line. 32K context.
33K
$1.6
$6.4
-
Qwen Turbo
Fastest and cheapest for simple tasks. 1M context.
1M
$0.05
$0.2
-
Qwen Vl Max
Alibaba model (not yet curated).
131K
-
-
-
Qwen Vl Plus
Alibaba model (not yet curated).
131K
-
-
-
Qwen3 235b A22b Instruct 2507
Alibaba model (not yet curated).
131K
-
-
-
Qwen3 Coder 480b A35b Instruct
Alibaba model (not yet curated).
131K
-
-
-
Qwen3 Max Preview
Alibaba model (not yet curated).
131K
-
-
-
Qwen3 Vl Flash 2025 10 15
Alibaba model (not yet curated).
131K
-
-
-
Qwen3.5 Flash 2026 02 23
Alibaba model (not yet curated).
131K
-
-
-
Qwq Plus 2025 03 05
Alibaba model (not yet curated).
131K
-
-
-
Qwen3.5 397b A17b
Alibaba model (not yet curated).
Qwen: Qwen3.5 Plus 2026-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
MiniMax: MiniMax M2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
GLM-5
Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic Engineering with SOTA coding and agent capabilities. 200K context, thinking mode.
Qwen: Qwen3 Max Thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
GPT-5.3 Codex
Most capable agentic coding model. Combines frontier coding performance of GPT-5.2-Codex with reasoning and professional knowledge of GPT-5.2. ~25% faster.
Claude Opus 4.6
Previous most intelligent model for complex agents and coding, with adaptive thinking
Qwen3 Coder Next
Alibaba model (not yet curated).
GLM-OCR (Vision, OCR)
Specialized OCR model for text extraction from images and documents.
Free Models Router
The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...
StepFun: Step 3.5 Flash
Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....
Upstage: Solar Pro 3
Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized...
Kimi K2.5
Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.
MiniMax: MiniMax M2-her
MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message...
Writer: Palmyra X5
Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million...
LiquidAI: LFM2.5-1.2B-Thinking (free)
LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG, while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...
LiquidAI: LFM2.5-1.2B-Instruct (free)
LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.
GLM-4.7 FlashX
Fast GLM-4.7 variant with priority routing and higher concurrency. Same model as Flash, better infrastructure.
GLM-4.7 Flash (Free)
Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 concurrent request) and lower priority.
Z.ai GLM 4.7 (Preview)
Z.ai GLM 4.7 (355B) on Cerebras (~1,000 tok/s). Strong agentic coding, advanced reasoning (on by default), superior tool use. 131K context, 40K max output.
MiniMax: MiniMax M2.1
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
ByteDance Seed: Seed 1.6 Flash
Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...
ByteDance Seed: Seed 1.6
Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.
GLM-4.7
Latest-gen GLM model with 128K context. Thinking mode activated by default.
Gemini 3 Flash Preview
Gemini 3 Flash Preview (Version: 3-flash-preview-12-2025, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
GPT Audio Mini
Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completions REST API.
NVIDIA: Nemotron 3 Nano 30B A3B
NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully...
OpenAI: GPT-5.2 Chat
GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively "think" on...
GPT-5.2 Pro
Smartest and most trustworthy option for difficult questions. Uses more compute for harder thinking on complex domains like programming.
GPT-5.2 Instant
Deprecated. GPT-5.2 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.
GPT-5.2 Codex
Deprecated. GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar environments. Supports low, medium, high, and xhigh reasoning effort settings.
GPT-5.2
Most capable model for professional work and long-running agents. Improvements in general intelligence, long-context, agentic tool-calling, and vision.
Deep Research Pro Preview
Preview release (December 12th, 2025) of Deep Research Pro (Version: deepthink-exp-05-20, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
AutoGLM Phone
Mobile phone automation agent. Understands phone screens via multimodal perception and executes automated operations.
Devstral 2 (latest)
Official devstral-2512 Mistral AI model
Devstral 2 (latest)
Official devstral-2512 Mistral AI model
Devstral 2 (latest)
Official mistral-medium-latest Mistral AI model
Devstral 2 (2512)
Official devstral-2512 Mistral AI model
Relace: Relace Search
The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to explore a codebase and return relevant files to the user request. In contrast to RAG, relace-search performs agentic...
GLM-4.6 V FlashX
Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/file inputs, 32K output.
GLM-4.6 V Flash (Free)
Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concurrent request). Image/video/file inputs, 32K output.
GLM-4.6 V
Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hybrid thinking.
Body Builder (beta)
Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example:...
mistral-large-latest
Official mistral-large-2512 Mistral AI model
Mistral Large (2512)
Official mistral-large-2512 Mistral AI model
ministral-8b-latest
Ministral 3 (a.k.a. Tinystral) 8B Instruct.
ministral-3b-latest
Ministral 3 (a.k.a. Tinystral) 3B Instruct.
ministral-14b-latest
Ministral 3 (a.k.a. Tinystral) 14B Instruct.
Ministral 8b (2512)
Ministral 3 (a.k.a. Tinystral) 8B Instruct.
Ministral 3b (2512)
Ministral 3 (a.k.a. Tinystral) 3B Instruct.
Ministral 14b (2512)
Ministral 3 (a.k.a. Tinystral) 14B Instruct.
Amazon: Nova 2 Lite
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing...
Arcee AI: Trinity Mini
Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. Engineered for efficient reasoning over long contexts (131k) with robust function...
Claude Opus 4.5
Previous most intelligent model with advanced reasoning for complex agentic workflows
AllenAI: Olmo 3 32B Think
Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for deep reasoning, complex logic chains and advanced instruction-following scenarios. Its capacity enables strong performance on demanding evaluation tasks and...
Nano Banana Pro Preview
Gemini 3 Pro Image Preview (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
Nano Banana Pro
Gemini 3 Pro Image Preview (Version: 3.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
GPT-5.1 Codex Max
Deprecated. Our most intelligent coding model optimized for long-horizon, agentic coding tasks.
GPT-5.1 Codex Mini
Deprecated. Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.
GPT-5.1 Codex
Deprecated. A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar environments.
GPT-5.1
The best model for coding and agentic tasks with configurable reasoning effort.
Deep Cogito: Cogito v2.1 671B
Cogito v2.1 671B MoE represents one of the strongest open models globally, matching performance of frontier closed and open models. This model is trained using self play with reinforcement learning...
OpenAI: GPT-5.1 Chat
GPT-5.1 Chat (AKA Instant) is the fast, lightweight member of the 5.1 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively "think" on...
GPT-5.1 Instant
Deprecated. GPT-5.1 Instant with adaptive reasoning. More conversational with improved instruction following.
MoonshotAI: Kimi K2 Thinking
Kimi K2 Thinking is Moonshot AI's most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...
Amazon: Nova Premier 1.0
Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Perplexity: Sonar Pro Search
Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based...
Mistral: Voxtral Small 24B 2507
Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...
[OpenAI] GPT OSS Safeguard 20B (Preview)
OpenAI safety classification model (20B MoE). Purpose-built for content moderation with Harmony response format. 131K context, 65K max output. ~1000 t/s on Groq.
NVIDIA: Nemotron Nano 12B 2 VL (free)
NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba's...
Qwen: Qwen3 VL 32B Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
MiniMax: MiniMax M2
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...
IBM: Granite 4.0 Micro
Granite-4.0-H-Micro is a 3B parameter model from the Granite 4 family of models. These models are the latest in a series of models released by IBM. They are fine-tuned for long...
Microsoft: Phi 4 Mini Instruct
Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning dense data. The model belongs to the Phi-4...
Claude Haiku 4.5
Fastest model with exceptional speed and performance
Qwen: Qwen3 VL 8B Thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...
Qwen: Qwen3 VL 8B Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
GPT-5 Search API
Updated web search model in Chat Completions API. 60% cheaper with domain filtering support.
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta's Llama-3.3-70B-Instruct with a 128K context. It's post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
Gemini 2.5 Computer Use Preview 10-2025
Gemini 2.5 Computer Use Preview 10-2025 (Version: Gemini 2.5 Computer Use Preview 10-2025, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens])
Qwen: Qwen3 VL 30B A3B Thinking
Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...
Qwen: Qwen3 VL 30B A3B Instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
GPT-5 Pro
Version of GPT-5 that uses more compute to produce smarter and more precise responses. Designed for tough problems.
Nano Banana
Gemini 2.5 Flash Preview Image (Version: 2.0, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,batchGenerateContent])
GLM-4.6
GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whether to engage deep reasoning.
DeepSeek: DeepSeek V3.2 Exp
DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism...
Claude Sonnet 4.5
Previous best combination of speed and intelligence for complex agents and coding
TheDrummer: Cydonia 24B V4.1
Uncensored and creative writing model based on Mistral Small 3.2 24B with good recall, prompt adherence, and intelligence.
Relace: Relace Apply 3
Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at...
Google: Gemini 2.5 Flash Lite Preview 09-2025
Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...
Qwen3 Vl 235b A22b Thinking
Alibaba model (not yet curated).
Qwen3 Vl 235b A22b Instruct
Alibaba model (not yet curated).
Qwen3 Max
Alibaba model (not yet curated).
DeepSeek: DeepSeek V3.1 Terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...
Qwen3 Coder Flash
Alibaba model (not yet curated).
magistral-small-latest
Mistral Small 4.
magistral-medium-latest
Our frontier-class reasoning model release candidate September 2025.
Magistral Small (2509)
Our efficient reasoning model released September 2025.
Magistral Medium (2509)
Our frontier-class reasoning model release candidate September 2025.
GPT-5 Codex
Deprecated. A version of GPT-5 optimized for agentic coding in Codex.
Qwen3 Next 80b A3b Thinking
Alibaba model (not yet curated).
Qwen3 Next 80b A3b Instruct
Alibaba model (not yet curated).
Qwen Plus
Balanced quality, speed, and cost with hybrid thinking. 1M context.
NVIDIA: Nemotron Nano 9B V2 (free)
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...
MoonshotAI: Kimi K2 0905
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...
[Groq] Compound Mini (Agentic System)
Lighter Groq agentic AI with web search, code execution. Pricing based on underlying model usage.
[Groq] Compound (Agentic System)
Groq agentic AI with web search, code execution, browser automation. Uses GPT-OSS 120B, Llama 4 Scout, Llama 3.3 70B. Pricing based on underlying model usage.
Qwen3 30b A3b Thinking 2507
Alibaba model (not yet curated).
GPT Audio
First generally available audio model. Accepts audio inputs and outputs, and can be used in the Chat Completions REST API.
Nous: Hermes 4 70B
Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...
Nous: Hermes 4 405B
Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...
DeepSeek: DeepSeek V3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...
Mistral: Mistral Medium 3.1
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances...
Mistral Medium (2508)
Update on Mistral Medium 3 with improved capabilities.
GLM-4.5 V
Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.
AI21: Jamba Large 1.7
Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context...
OpenAI: GPT-5 Chat
GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.
GPT-5 Nano
Fastest, most cost-efficient version of GPT-5 for summarization and classification tasks.
GPT-5 Mini
A faster, more cost-efficient version of GPT-5 for well-defined tasks.
GPT-5 ChatGPT
Deprecated. GPT-5 model used in ChatGPT.
GPT-5
The best model for coding and agentic tasks across domains.
GPT OSS 120B
OpenAI flagship open-weight MoE (120B total, 5.1B active) on Cerebras (~3,000 tok/s). Reasoning (default medium effort) and function calling. 131K context, 40K max output.
Claude Opus 4.1
Previous Opus model. Deprecated June 5, 2026, retiring August 5, 2026.
[OpenAI] GPT OSS 20B
OpenAI efficient open-weight MoE (20B total, 3.6B active). Tool use, browser search, code execution. 131K context, 65K max output. ~1000 t/s on Groq.
Qwen: Qwen3 Coder 30B A3B Instruct
Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...
codestral-latest
Our cutting-edge language model for coding released August 2025.
Codestral (2508)
Our cutting-edge language model for coding released August 2025.
Qwen3 30b A3b Instruct 2507
Alibaba model (not yet curated).
GLM-4.5 X
Extended GLM-4.5 model. Interleaved thinking.
GLM-4.5 Flash (Free)
Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7 Flash.
GLM-4.5 AirX
Extended lightweight GLM-4.5 variant. Interleaved thinking.
GLM-4.5 Air
Lightweight GLM-4.5 variant. Interleaved thinking.
GLM-4.5
Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.
Qwen3 235b A22b Thinking 2507
Alibaba model (not yet curated).
Qwen: Qwen3 Coder 480B A35B
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
Gemini 2.5 Flash-Lite
Stable version of Gemini 2.5 Flash-Lite, released in July of 2025 (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
ByteDance: UI-TARS 7B
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...
Qwen: Qwen3 235B A22B Instruct 2507
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...
voxtral-small-latest
A small audio understanding model released in July 2025
voxtral-mini-latest
A mini audio understanding model released in July 2025
Voxtral Small (2507)
A small audio understanding model released in July 2025
Voxtral Mini (2507)
A mini audio understanding model released in July 2025
Switchpoint Router
Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...
MoonshotAI: Kimi K2 0711
Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...
Venice: Uncensored (free)
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an "uncensored" instruct-tuned LLM, preserving...
Tencent: Hunyuan A13B Instruct
Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...
Morph: Morph V3 Large
Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code>...
Morph: Morph V3 Fast
Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: <instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>...
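The tagged prompt format quoted in the two Morph entries above can be assembled as in the following minimal sketch. The helper name `build_apply_prompt` and the newline separator between the tagged sections are assumptions; the listing is truncated, so the full format may include details not shown here.

```python
def build_apply_prompt(instruction: str, initial_code: str, edit_snippet: str) -> str:
    """Assemble the <instruction>/<code>/<update> prompt (hypothetical helper).

    Assumption: sections are joined with newlines; the listing only shows
    the three tags in this order, not the exact whitespace between them.
    """
    return "\n".join(
        [
            f"<instruction>{instruction}</instruction>",
            f"<code>{initial_code}</code>",
            f"<update>{edit_snippet}</update>",
        ]
    )


prompt = build_apply_prompt(
    instruction="Rename the variable x to total",
    initial_code="x = 1\nprint(x)",
    edit_snippet="total = 1\nprint(total)",
)
```

The resulting string would be sent as the user message to the apply model. The request itself is omitted because it depends on the provider client in use.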
Baidu: ERNIE 4.5 VL 424B A47B
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...
o4 Mini Deep Research
Deprecated. Faster, more affordable deep research model for complex, multi-step research tasks.
o3 Deep Research
Deprecated. Our most powerful deep research model for complex, multi-step research tasks.
Mistral: Mistral Small 3.2 24B
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...
Mistral Small (2506)
Our latest enterprise-grade small model; this version was released June 2025.
MiniMax: MiniMax M1
MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...
Gemini 2.5 Pro
Stable release (June 17th, 2025) of Gemini 2.5 Pro (Version: 2.5, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
Gemini 2.5 Flash
Stable version of Gemini 2.5 Flash, our mid-size multimodal model that supports up to 1 million tokens, released in June of 2025. (Version: 001, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
o3 Pro
Version of o3 with more compute for better responses. Provides consistently better answers for complex tasks.
DeepSeek: R1 0528
May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...
Anthropic: Claude Sonnet 4
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...
Anthropic: Claude Opus 4
Claude Opus 4 is benchmarked as the world's best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...
Google: Gemma 3n 4B
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs, including text, visual data, and audio, enabling diverse tasks...
Google: Gemini 2.5 Pro Preview 06-05
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs "thinking" capabilities, enabling it to reason through responses with enhanced accuracy...
Gemini 2.5 Pro Preview TTS
Gemini 2.5 Pro Preview TTS (Version: gemini-2.5-pro-preview-tts-2025-05-19, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[countTokens,generateContent,batchGenerateContent])
Gemini 2.5 Flash Preview TTS
Gemini 2.5 Flash Preview TTS (Version: gemini-2.5-flash-exp-tts-2025-05-19, Defaults: temperature=1, topP=0.95, topK=64, interfaces=[countTokens,generateContent])
mistral-medium-3
Official mistral-medium-latest Mistral AI model
Mistral Medium (2505)
Our frontier-class multimodal model released May 2025.
Google: Gemini 2.5 Pro Preview 05-06
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs "thinking" capabilities, enabling it to reason through responses with enhanced accuracy...
Arcee AI: Virtuoso Large
Virtuoso-Large is Arcee's top-tier general-purpose LLM at 72B parameters, tuned to tackle cross-domain reasoning, creative writing and enterprise QA. Unlike many 70B peers, it retains the 128k...
Arcee AI: Coder Large
Coder-Large is a 32B-parameter offspring of Qwen 2.5-Instruct that has been further trained on permissively-licensed GitHub, CodeSearchNet and synthetic bug-fix corpora. It supports a 32k context window, enabling multi-file...
Meta: Llama Guard 4 12B
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...
Qwen3 8b
Alibaba model (not yet curated).
Qwen3 32b
Alibaba model (not yet curated).
Qwen3 30b A3b
Alibaba model (not yet curated).
Qwen3 235b A22b
Alibaba model (not yet curated).
Qwen3 14b
Alibaba model (not yet curated).
o4 Mini
Deprecated. Latest o4-mini model. Optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks.
o3
A well-rounded and powerful model across domains. Sets a new standard for math, science, coding, and visual reasoning tasks.
GPT-4.1 Nano
Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance with low latency, ideal for tasks like classification or autocompletion.
GPT-4.1 Mini
Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intelligence while reducing latency by nearly half and cost by 83%.
GPT-4.1
Flagship GPT model for complex tasks. Major improvements on coding, instruction following, and long context with 1M token context window.
GLM-4 32B (0414) 128K
GLM-4 32B model with 128K context, 16K output.
Meta: Llama 4 Scout
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input...
Meta: Llama 4 Maverick
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...
[Meta] Llama 4 Scout · 17B × 16E (Preview)
Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal with vision support. 131K context, 8K max output. ~750 t/s on Groq.
DeepSeek: DeepSeek V3 0324
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well...
o1 Pro
A version of o1 with more compute for better responses. Provides consistently better answers for complex tasks.
Mistral: Mistral Small 3.1 24B
Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...
Google: Gemma 3 4B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Google: Gemma 3 12B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
Cohere: Command A
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...
Reka Flash 3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction-following, and function calling. Featuring a...
Google: Gemma 3 27B
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...
GPT-4o Search Preview
Latest snapshot of the GPT-4o model optimized for web search capabilities.
GPT-4o Mini Search Preview
Deprecated. Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.
TheDrummer: Skyfall 36B V2
Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fine-tuned for improved creativity, nuanced writing, role-playing, and coherent storytelling.
Perplexity: Sonar Reasoning Pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). Sonar Reasoning Pro is a premier reasoning model powered by DeepSeek R1 with Chain of Thou...
Perplexity: Sonar Pro
Note: Sonar Pro pricing includes Perplexity search pricing. See [details here](https://docs.perplexity.ai/guides/pricing#detailed-pricing-breakdown-for-sonar-reasoning-pro-and-sonar-pro). For enterprises seeking more advanced capabilities, the Sonar Pro API can handle in-depth,...
Perplexity: Sonar Deep Research
Sonar Deep Research is a research-focused model designed for multi-step retrieval, synthesis, and reasoning across complex topics. It autonomously searches, reads, and evaluates sources, refining its approach as it gathers...
Mistral: Saba
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional...
Gemini 2.0 Flash 001
Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in January of 2025. (Version: 2.0, Defaults: temperature=1, topP=0.95, topK=40, interfaces=[generateContent,countTokens,createCachedContent,batchGenerateContent])
AionLabs: Aion-RP 1.0 (8B)
Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other's responses. It is a fine-tuned base model...
AionLabs: Aion-1.0-Mini
Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant...
AionLabs: Aion-1.0
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...
Qwen: Qwen2.5 VL 72B Instruct
Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
o3 Mini
Latest o3-mini model snapshot. High intelligence at the same cost and latency targets of o1-mini. Excels at science, math, and coding tasks.
Mistral: Mistral Small 3
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
Perplexity: Sonar
Sonar is lightweight, affordable, fast, and simple to use - now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features...
DeepSeek: R1 Distill Llama 70B
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance acr...
DeepSeek: R1
DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....
V1 8K Vision (Preview)
Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision for production.
V1 32K Vision (Preview)
Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision for production.
V1 128K Vision (Preview)
Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision for production.
MiniMax: MiniMax-01
MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...
Microsoft: Phi 4
Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...
Sao10K: Llama 3.1 70B Hanami x1
This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).
DeepSeek: DeepSeek V3
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...
Sao10K: Llama 3.3 Euryale 70B
Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).
o1
Previous full o-series reasoning model.
Cohere: Command R7B (12-2024)
Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning...
Meta: Llama 3.3 70B Instruct (free)
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
[Meta] Llama 3.3 · 70B Versatile
Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 131K context, 32K max output. ~280 t/s on Groq.
Amazon: Nova Pro 1.0
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December...
Amazon: Nova Micro 1.0
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...
Amazon: Nova Lite 1.0
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...
Mistral Large 2407
This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....
Qwen2.5 Coder 32B Instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning**...
TheDrummer: UnslopNemo 12B
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Magnum v4 72B
This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically [Sonnet](https://openrouter.ai/anthropic/claude-3.5-sonnet) and [Opus](https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://o...
Qwen: Qwen2.5 7B Instruct
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...
Inflection: Inflection 3 Productivity
Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional...
Inflection: Inflection 3 Pi
Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...
TheDrummer: Rocinante 12B
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -...
Meta: Llama 3.2 3B Instruct (free)
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
Meta: Llama 3.2 1B Instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...
Meta: Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Qwen2.5 72B Instruct
Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...
Cohere: Command R+ (08-2024)
command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint...
Cohere: Command R (08-2024)
command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and...
Sao10K: Llama 3.1 Euryale 70B v2.2
Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).
Nous: Hermes 3 70B Instruct
Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements acr...
Nous: Hermes 3 405B Instruct (free)
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Sao10K: Llama 3 8B Lunaris
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....
Meta: Llama 3.1 8B Instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...
Meta: Llama 3.1 70B Instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue use cases. It has demonstrated strong...
[Meta] Llama 3.1 · 8B Instant
Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K context and max output. ~560 t/s on Groq.
Mistral: Mistral Nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...
open-mistral-nemo-2407
Our best multilingual open source model released July 2024.
open-mistral-nemo
Our best multilingual open source model released July 2024.
GPT-4o Mini
Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more capable than GPT-3.5 Turbo.
Google: Gemma 2 27B
Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...
GPT-4o
Deprecated. Original gpt-4o snapshot from May 13, 2024.
Meta: Llama 3 8B Instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue use cases. It has demonstrated strong...
Mistral: Mixtral 8x22B Instruct
Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...
WizardLM-2 8x22B
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is...
GPT-4 Turbo
Deprecated. GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently
Anthropic: Claude 3 Haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal
Mistral Large
This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/)....
V1 8K
Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.
V1 32K
Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.
V1 128K
Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.
OpenAI: GPT-4 Turbo Preview
The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023. **Note:** heavily rate limited by OpenAI while...
OpenAI: GPT-3.5 Turbo (older v0613)
GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.
3.5-Turbo
The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats.
3.5-Turbo
Deprecated. The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats.
mistral-medium
Official mistral-medium-latest Mistral AI model
Auto Router
Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output. To see which model was used,...
3.5-Turbo
Deprecated. GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more.
OpenAI: GPT-3.5 Turbo Instruct
This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.
OpenAI: GPT-3.5 Turbo 16k
This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up...
Mancer: Weaver (alpha)
An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
ReMM SLERP 13B
A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
MythoMax 13B
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
GPT-4
Deprecated. Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.
GPT-4
Snapshot of gpt-4 from June 13th 2023 with improved function calling support. Data up to Sep 2021.
[?] Qwen Plus [latest]
Balanced quality, speed, and cost with hybrid thinking. 1M context.
Labs Leanstral 1 5 1
A mid & post-trained version of mistral small 4 for Lean (260618 SFT)
labs-leanstral-1-5
A mid & post-trained version of mistral small 4 for Lean (260618 SFT)
mistral-code-agent-latest
Official devstral-2512 Mistral AI model
mistral-code-fim-latest
Our cutting-edge language model for coding released August 2025.
mistral-code-latest
Our cutting-edge language model for coding released August 2025.
mistral-tiny-latest
Our best multilingual open source model released July 2024.
mistral-vibe-cli-fast
Mistral Small 4.
mistral-vibe-cli-with-tools
Official mistral-medium-latest Mistral AI model
Open Mistral Nemo
Our best multilingual open source model released July 2024.
Qvq Max
Alibaba model (not yet curated).
Qwen Coder Plus
Alibaba model (not yet curated).
Qwen Flash
Fast and very low cost with hybrid thinking. 1M context.
Qwen Max
Best quality of the stable commercial line. 32K context.
Qwen Turbo
Fastest and cheapest for simple tasks. 1M context.
Qwen Vl Max
Alibaba model (not yet curated).
Qwen Vl Plus
Alibaba model (not yet curated).
Qwen3 235b A22b Instruct 2507
Alibaba model (not yet curated).
Qwen3 Coder 480b A35b Instruct
Alibaba model (not yet curated).
Qwen3 Max Preview
Alibaba model (not yet curated).
Qwen3 Vl Flash 2025 10 15
Alibaba model (not yet curated).
Qwen3.5 Flash 2026 02 23
Alibaba model (not yet curated).
Qwq Plus 2025 03 05
Alibaba model (not yet curated).
Connect your own keys, run models side by side, then compare and merge the answers. Keys and chats stay in your browser.
© 2026 Token Fabrics · Built with passion in San Diego