BEAM

Features

Rankings

Pro

Docs

GitHub

LAUNCH APP

The Big-AGI Model Index

Supported AI models.

If it's on this list, it runs in Big-AGI. Capabilities, context windows, and pricing for every model. Your key, provider rates, no markup.

Alibaba

Anthropic

AWS Bedrock

Azure

Cerebras

DeepSeek

Fireworks AI

Google Gemini

Groq

MiniMax

Mistral

Moonshot

OpenAI

OpenRouter

Perplexity

Sakana AI

SpaceXAI

Together AI

Z.ai

663 models indexed·800+ supported·refreshed every 30 min

What this list is: the models Big-AGI indexes, with full specs for each. Big-AGI also connects to any OpenAI-compatible endpoint, every model on OpenRouter, and local runtimes like Ollama, so the models you can run go beyond this list.

Available on

Highlights

Capabilities

VendorsCapabilities

Meituan: LongCat 2.0

NEW

LongCat 2.0 is a sparse mixture-of-experts language model from Meituan, with 48…

$0.3

$1.2

Jul 2026

Thinking Machines: Inkling

NEW

Inkling is an open-weight multimodal mixture-of-experts model from Thinking Mac…

$4.05

Jul 2026

Auto Router (Beta)

NEW

Auto Router (Beta) is a task-aware router from OpenRouter. It classifies each r…

Jul 2026

Kimi K3

NEWHOT

Native multimodal flagship (text, image, video inputs) with thinking on by defa…

$15

Jul 2026

Meta: Muse Spark 1.1

NEW

Muse Spark 1.1 is a multimodal reasoning model from Meta, built for agentic tas…

$1.25

$4.25

Jul 2026

Kwaipilot: KAT-Coder-Pro V2.5

NEW

KAT-Coder-Pro V2.5 is a flagship-level Agentic Coding model that can directly h…

256K

$0.74

$2.96

Jul 2026

Kwaipilot: KAT-Coder-Air V2.5 (free) ·

NEW

KAT-Coder-Air V2.5 is a flagship-level Agentic Coding model that can directly h…

256K

Jul 2026

OpenAI: GPT-5.6 Terra Pro

NEW

GPT-5.6 Terra Pro is the same underlying model as GPT-5.6 Terra, served with `r…

1.1M

$2.5

$15

Jul 2026

OpenAI: GPT-5.6 Sol Pro

NEW

GPT-5.6 Sol Pro is the same underlying model as GPT-5.6 Sol, served with `reaso…

1.1M

$30

Jul 2026

OpenAI: GPT-5.6 Luna Pro

NEW

GPT-5.6 Luna Pro is the same underlying model as GPT-5.6 Luna, served with `rea…

1.1M

Jul 2026

xAI: Grok Latest

NEW

This model always redirects to the latest Grok model from xAI.

500K

Jul 2026

Grok 4.5

NEWHOT

xAI's smartest and fastest model with frontier performance on coding, knowledge…

500K

Jul 2026

AionLabs: Aion-3.0-Mini

NEW

Aion-3.0 Mini is a multi-model roleplaying and storytelling system from AionLab…

131K

$0.7

$1.4

Jul 2026

AionLabs: Aion-3.0

NEW

Aion-3.0 is a multi-model roleplaying and storytelling system from AionLabs, bu…

131K

Jul 2026

Tencent: Hy3

NEWHOT

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192…

262K

$0.14

$0.58

Jul 2026

Poolside: Laguna XS 2.1 (free) ·

NEW

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from Poo…

262K

Jul 2026

Nano Banana 2 Lite

NEW

Gemini 3.1 Flash Lite Image.

131K

$0.25

$1.5

Jun 2026

Gemma 4 31B (Preview)

NEW

Google Gemma 4 31B on Cerebras - first multimodal model on wafer-scale inferenc…

131K

$0.99

$1.49

Jun 2026

Gemini Omni Flash Preview (video)

NEW

Gemini Omni Flash Preview

197K

$1.5

$17.5

Jun 2026

Qwen3.6 35B A3B Lora

NEW

Qwen chat model.

262K

Jun 2026

Qwen3.5 2B Lora

NEW

Qwen chat model.

262K

Jun 2026

Claude Sonnet 5 · US

NEW

Best combination of speed and intelligence, with the largest gains in coding an…

$10

Jun 2026

Claude Sonnet 5

NEWHOT

Best combination of speed and intelligence, with the largest gains in coding an…

$10

Jun 2026

GPT-5.6 Terra

NEWHOT

Balanced model for efficient, high-volume everyday work. Competitive with GPT-5…

1.1M

$2.5

$15

Jun 2026

GPT-5.6 Sol

NEWHOT

Flagship next-generation model. Strongest yet for agentic coding, science, and…

1.1M

$30

Jun 2026

GPT-5.6 Luna

NEWHOT

Fastest, most affordable GPT-5.6 model for high-volume work. Strong capability…

1.1M

Jun 2026

GLM-5.2 (Alibaba)

NEW

Zhipu GLM-5.2 served via Alibaba Model Studio. 1M context, thinking.

$1.1

$3.85

Jun 2026

Sakana: Fugu Ultra

NEW

Fugu Ultra is the higher-performance model in Sakana AI's Fugu family. Rather t…

$30

Jun 2026

Nex AGI: Nex-N2-Mini

NEW

Nex-N2-Mini is an open-source agentic mixture-of-experts model from Nex AGI, th…

262K

$0.03

$0.1

Jun 2026

Sakana Fugu Cyber

NEW

Orchestrator specialized for cybersecurity reasoning: security analysis, vulner…

$36

Jun 2026

Sakana Fugu

NEW

Fast orchestration model routing tasks across a swappable pool of frontier LLMs…

Jun 2026

Qwen3.7 Max

NEW

Flagship agent model with native extended thinking and 1M context. Text-only; s…

$2.5

$7.5

Jun 2026

Cohere: North Mini Code (free) ·

NEW

North Mini Code is Cohere's first agentic coding model and the debut of its Nor…

256K

Jun 2026

OpenRouter: Fusion

NEW

Fusion turns your prompt into a small multi-model deliberation. A panel of expe…

Jun 2026

GLM-5.2 (1M)

NEWHOT

Z.ai 1M-context flagship (744B MoE, 40B activated). Agentic coding with reasoni…

$1.4

$4.4

Jun 2026

Claude Fable 5 · Global

NEW

Most capable widely released model for the most demanding reasoning and long-ho…

$10

$50

Jun 2026

Claude Fable 5

NEWHOT

Most capable widely released model for the most demanding reasoning and long-ho…

$10

$50

Jun 2026

Anthropic: Claude Fable Latest

NEW

This model always redirects to the latest model in the Claude Fable family.

$10

$50

Jun 2026

Nex AGI: Nex-N2-Pro

NEW

Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active…

262K

$0.25

Jun 2026

NVIDIA: Nemotron 3.5 Content Safety (free) ·

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardra…

128K

Jun 2026

NVIDIA: Nemotron 3 Ultra

HOT

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model f…

$0.6

$3.6

Jun 2026

Llama 4 Maverick 17B 128E Instruct Nvfp4

Meta chat model. https://huggingface.co/api/models/RedHatAI/Llama-4-Maverick-17…

Jun 2026

GLM 4.7 FP4

Zai Org chat model.

203K

Jun 2026

Qwen3.7 Plus

Multimodal agent model with 1M context, native thinking, and vision/video under…

$0.4

$1.6

Jun 2026

Kimi K2.7 Code Highspeed

High-speed code variant with ~180 tok/s output (up to 260 in short contexts). N…

262K

$1.9

Jun 2026

Kimi K2.7 Code

Code-focused multimodal model (text, image, video inputs) with always-on thinki…

262K

$0.95

Jun 2026

MiniMax: MiniMax M3

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, ima…

$0.3

$1.2

May 2026

StepFun: Step 3.7 Flash

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Expert…

256K

$0.2

$1.15

May 2026

Nano Banana Pro

Gemini 3 Pro Image

164K

$12

May 2026

Nano Banana 2

Gemini 3.1 Flash Image.

131K

$0.5

May 2026

Claude Opus 4.8 · US

Most capable Opus-tier model for complex reasoning and agentic coding (Bedrock…

$25

May 2026

Claude Opus 4.8

HOT

Most capable Opus-tier model for complex reasoning and agentic coding

$25

May 2026

Anthropic: Claude Opus 4.8 (Fast)

Fast-mode variant of Opus 4.8 - identical capabilities with higher output speed…

$10

$50

May 2026

Qwen3.7 Max

Flagship agent model with native extended thinking and 1M context. Text-only; s…

$2.5

$7.5

May 2026

Grok Build 0.1

xAI fast coding model with reasoning, function calling, and structured outputs.…

256K

May 2026

Llama 4 Scout 17B 16E Instruct Fp8 Lora

Meta chat model.

10M

May 2026

Gemini 3.5 Flash

HOT

Gemini 3.5 Flash

1.1M

$1.5

May 2026

Antigravity Agent Preview (2026-05)

Preview release of Antigravity Agent (05-2026)

197K

$1.5

May 2026

Gemma 4 31B It Lora

Google chat model. https://huggingface.co/api/models/google/gemma-4-31B-it

262K

May 2026

Gemma 3 27B It Lora

Google chat model.

May 2026

Perceptron: Perceptron Mk1

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model…

33K

$0.15

$1.5

May 2026

Anthropic: Claude Opus 4.7 (Fast)

Fast-mode variant of Opus 4.7 - identical capabilities with higher output speed…

$30

$150

May 2026

inclusionAI: Ring-2.6-1T

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters,…

262K

$0.08

$0.63

May 2026

Mixtral 8x7B Instruct V0.1 FP8 Lora

Mistral AI chat model.

33K

May 2026

Gemma 3 270M It Lora

Google chat model.

33K

May 2026

Gemini 3.1 Flash-Lite

Gemini 3.1 Flash Lite

1.1M

$0.25

$1.5

May 2026

Llama 3.3 70B Instruct FP8 Lora

Meta chat model.

131K

May 2026

OpenAI: GPT Chat Latest

GPT Chat Latest

400K

$30

May 2026

Command A Plus

Cohere flagship. Agentic reasoning with vision, tool use, and long-context RAG.…

436K

May 2026

Qwen3 VL Plus

Current vision-language model with strong visual reasoning and thinking. Tiered…

262K

$0.2

$1.6

Apr 2026

mistral-medium-3.5

Official mistral-medium-latest Mistral AI model

262K

$1.5

$7.5

Apr 2026

IBM: Granite 4.1 8B

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from…

131K

$0.05

$0.1

Apr 2026

Poolside: Laguna XS.2

Laguna XS.2 is the second-generation model in the XS size class from Poolside,…

262K

$0.1

$0.2

Apr 2026

Poolside: Laguna M.1

Laguna M.1 is the flagship coding agent model from Poolside, optimized for comp…

262K

$0.2

$0.4

Apr 2026

Owl Alpha ·

Owl Alpha is a high-performance foundation model designed for agentic workloads…

Apr 2026

NVIDIA: Nemotron 3 Nano Omni (free) ·

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to fun…

256K

Apr 2026

Nemotron 3 Nano Omni 30B A3b Reasoning Fp8

Nvidia chat model.

131K

Apr 2026

mistral-medium-latest

Official mistral-medium-latest Mistral AI model

262K

$1.5

$7.5

Apr 2026

Mistral Medium (latest)

Official mistral-medium-latest Mistral AI model

262K

$1.5

$7.5

Apr 2026

Qwen3.6 Max Preview

Alibaba model (not yet curated).

131K

$1.04

$6.24

Apr 2026

Qwen3.6 Flash

HOT

Fast, cost-effective multimodal model with 1M context, near-flagship quality, v…

$0.25

$1.5

Apr 2026

Qwen3.6 35b A3b

Alibaba model (not yet curated).

131K

$0.14

Apr 2026

Qwen3.6 27b

Alibaba model (not yet curated).

131K

$0.45

$2.7

Apr 2026

Qwen3.5 Plus 2026 02 15

Alibaba model (not yet curated).

131K

$0.3

$1.8

Apr 2026

OpenAI GPT Mini Latest

This model always redirects to the latest model in the OpenAI GPT Mini family.

400K

$0.75

$4.5

Apr 2026

OpenAI GPT Latest

This model always redirects to the latest model in the OpenAI GPT family.

1.1M

$30

Apr 2026

MoonshotAI Kimi Latest

This model always redirects to the latest model in the MoonshotAI Kimi family.

$15

Apr 2026

Google Gemini Pro Latest

This model always redirects to the latest model in the Google Gemini Pro family.

$12

Apr 2026

Google Gemini Flash Latest

HOT

This model always redirects to the latest model in the Google Gemini Flash fami…

$1.5

Apr 2026

Anthropic Claude Sonnet Latest

This model always redirects to the latest model in the Anthropic Claude Sonnet…

$10

Apr 2026

Anthropic Claude Haiku Latest

This model always redirects to the latest model in the Anthropic Claude Haiku f…

200K

Apr 2026

DeepSeek V4 Pro

HOT

Premium reasoning model with 1M context. Supports extended thinking modes, JSON…

$0.44

$0.87

Apr 2026

DeepSeek V4 Flash

HOT

Fast general-purpose model with 1M context. Supports extended thinking modes, J…

$0.14

$0.28

Apr 2026

inclusionAI: Ling-2.6-1T

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s t…

262K

$0.08

$0.63

Apr 2026

GPT-5.5 Pro

Most capable model for complex tasks. Uses more compute for smarter, more preci…

1.1M

$30

$180

Apr 2026

GPT-5.5

HOT

New baseline for complex production workflows. Stronger task execution, more pr…

1.1M

$30

Apr 2026

Xiaomi: MiMo-V2.5-Pro

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in gene…

$0.44

$0.87

Apr 2026

Xiaomi: MiMo-V2.5

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic…

$0.14

$0.28

Apr 2026

Tencent: Hy3 preview

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed…

262K

$0.06

$0.21

Apr 2026

Qwen3.6 35B A3b Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3.6-35B-A3B-FP8

262K

Apr 2026

Pareto Code Router

The Pareto Router maintains a tiered shortlist of strong coding models, ranked…

Apr 2026

OpenAI: GPT-5.4 Image 2

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image gen…

272K

$15

Apr 2026

inclusionAI: Ling-2.6-flash

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total…

262K

$0.01

$0.03

Apr 2026

Gemma 4 E2B-it

Google chat model. https://huggingface.co/api/models/google/gemma-4-E2B-it

131K

Apr 2026

Deep Research Preview (2026-04)

Preview release (April 21th, 2026) of Deep Research

197K

$1.25

$10

Apr 2026

Deep Research Max Preview (2026-04)

HOT

Preview release (April 21st, 2026) of Deep Research Max

197K

$1.25

$10

Apr 2026

Anthropic: Claude Opus Latest

This model always redirects to the latest model in the Claude Opus family.

$25

Apr 2026

Kimi K2.6

Native multimodal flagship (text, image, video inputs) with thinking and non-th…

262K

$0.95

Apr 2026

Grok 4.3

xAI's latest flagship model with reasoning and a 1M token context window. Suppo…

$1.25

$2.5

Apr 2026

Gemma 4 E4B-it

Google chat model. https://huggingface.co/google/gemma-4-E4B-it

131K

Apr 2026

Claude Opus 4.7 · Global

Previous most capable model for complex reasoning and agentic coding (Bedrock I…

$25

Apr 2026

Claude Opus 4.7

Previous most capable model for complex reasoning and agentic coding

$25

Apr 2026

Gemini 3.1 Flash TTS Preview

25K

Apr 2026

Gemini Robotics-ER 1.6 Preview

197K

Apr 2026

Nvidia Nemotron 3 Super 120B A12b Bf16

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-S…

262K

Apr 2026

GLM-5.1

Z.ai flagship (744B MoE, 40B activated). Post-training upgrade over GLM-5 with…

205K

$1.4

$4.4

Apr 2026

Qwen3.6 Plus

Alibaba model (not yet curated).

131K

$0.33

$1.95

Apr 2026

Gemma 4 31B IT

HOT

Gemma 4 31B IT

295K

Apr 2026

Gemma 4 26B A4B IT

HOT

Gemma 4 26B A4B IT

295K

Apr 2026

GLM-5V Turbo

First multimodal GLM-5 model. Vision-based coding agent with image/video/file i…

205K

$1.2

Apr 2026

Arcee AI: Trinity Large Thinking

Trinity Large Thinking is a powerful open source reasoning model from the team…

262K

$0.25

$0.8

Apr 2026

xAI: Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborativ…

$1.25

$2.5

Mar 2026

xAI: Grok 4.20

HOT

Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic…

$1.25

$2.5

Mar 2026

Holo3 35B A3b

Hcompany chat model. https://huggingface.co/api/models/Hcompany/Holo3-35B-A3B

262K

Mar 2026

Google: Lyria 3 Pro Preview ·

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of m…

Mar 2026

Google: Lyria 3 Clip Preview ·

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's fami…

Mar 2026

Kwaipilot: KAT-Coder-Pro V2

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder se…

256K

$0.3

$1.2

Mar 2026

Qwen3 30B A3B Instruct 2507 Lora

Qwen chat model.

262K

Mar 2026

Deepseek V3.1 NVFP4

DeepSeek chat model. https://huggingface.co/deepseek-ai/DeepSeek-V3.1

131K

$0.6

$1.7

Mar 2026

Reka Edge

Reka Edge is an extremely efficient 7B multimodal vision-language model that ac…

16K

$0.1

Mar 2026

Qwen3 8B Lora

Qwen chat model.

41K

Mar 2026

MiniMax: MiniMax M2.7

MiniMax-M2.7 is a next-generation large language model designed for autonomous,…

205K

$0.25

Mar 2026

GPT-5.4 Nano

Cheapest GPT-5.4-class model for simple high-volume tasks like classification a…

400K

$0.2

$1.25

Mar 2026

GPT-5.4 Mini

Strongest mini model for coding, computer use, and subagents. GPT-5.4-class int…

400K

$0.75

$4.5

Mar 2026

Qwen3.5 122B A10b Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3.5-122B-A10B-FP8

262K

Mar 2026

mistral-small-latest

Mistral Small 4.

262K

$0.15

$0.6

Mar 2026

Mistral Small (2603)

Mistral Small 4.

262K

$0.15

$0.6

Mar 2026

Leanstral (2603)

A mid & post-trained version of mistral small 4 for Lean

197K

Mar 2026

GLM-5 Turbo

Speed-optimized GLM-5 variant for agent workflows. Enhanced tool invocation and…

205K

$1.2

Mar 2026

Deepseek OCR 2

Deepseek chat model. https://huggingface.co/api/models/deepseek-ai/DeepSeek-OCR…

Mar 2026

NVIDIA: Nemotron 3 Super

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating j…

$0.08

$0.45

Mar 2026

Nvidia Nemotron 3 Super 120B A12b Fp8

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-S…

262K

Mar 2026

Qwen: Qwen3.5-9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed t…

262K

$0.1

$0.15

Mar 2026

ByteDance Seed: Seed-2.0-Lite

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers…

262K

$0.25

Mar 2026

Grok 4.20 Reasoning

xAI flagship reasoning model with a 1M token context window. Deep reasoning and…

$1.25

$2.5

Mar 2026

Grok 4.20 Multi-Agent

Multi-agent model that runs specialized agents in parallel for collaborative ve…

$1.25

$2.5

Mar 2026

Grok 4.20

HOT

xAI flagship model with a 1M token context window. Non-reasoning variant for fa…

$1.25

$2.5

Mar 2026

Qwen3.5 9B Fp8

Qwen chat model. https://huggingface.co/api/models/togethercomputer/Qwen3.5-9B-…

262K

Mar 2026

GPT-5.4 Pro

Most capable model for complex tasks. Uses more compute for smarter, more preci…

1.1M

$30

$180

Mar 2026

GPT-5.4

Most capable and efficient frontier model for professional work. Native compute…

1.1M

$2.5

$15

Mar 2026

Inception: Mercury 2

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion…

128K

$0.25

$0.75

Mar 2026

OpenAI: GPT-5.3 Chat

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conv…

128K

$1.75

$14

Mar 2026

GPT-5.3 Instant

deprecated

GPT-5.3 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.

128K

$1.75

$14

Mar 2026

Glm 4.7 Fp8

Zai Org chat model. https://huggingface.co/api/models/zai-org/GLM-4.7-FP8

203K

Mar 2026

Gemini 3.1 Flash-Lite Preview

Gemini 3.1 Flash Lite Preview

1.1M

$0.25

$1.5

Mar 2026

Nano Banana 2 Preview

Gemini 3.1 Flash Image Preview.

131K

$0.5

Feb 2026

ByteDance Seed: Seed-2.0-Mini

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive s…

262K

$0.1

$0.4

Feb 2026

Qwen3.5 35b A3b

Alibaba model (not yet curated).

131K

$0.14

Feb 2026

Qwen3.5 27b

Alibaba model (not yet curated).

131K

$0.26

$2.6

Feb 2026

Qwen3.5 122b A10b

Alibaba model (not yet curated).

131K

$0.26

$2.08

Feb 2026

Qwen: Qwen3.5-Flash

The Qwen3.5 native vision-language Flash models are built on a hybrid architect…

$0.07

$0.26

Feb 2026

LiquidAI: LFM2-24B-A2B

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures de…

128K

$0.03

$0.12

Feb 2026

GPT Audio 1.5

Best voice model for audio in, audio out with Chat Completions. Accepts audio i…

128K

$2.5

$10

Feb 2026

AionLabs: Aion-2.0

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and…

131K

$0.8

$1.6

Feb 2026

Gemini 3.1 Pro Preview (Custom Tools)

Gemini 3.1 Pro Preview optimized for custom tool usage

1.1M

$12

Feb 2026

Gemini 3.1 Pro Preview

HOT

Gemini 3.1 Pro Preview

1.1M

$12

Feb 2026

Claude Sonnet 4.6 · US

Best combination of speed and intelligence for everyday tasks (Bedrock Inferenc…

$15

Feb 2026

Claude Sonnet 4.6

Best combination of speed and intelligence for everyday tasks

$15

Feb 2026

Qwen3.5 397b A17b

Alibaba model (not yet curated).

131K

$0.39

$2.34

Feb 2026

Qwen: Qwen3.5 Plus 2026-02-15

The Qwen3.5 native vision-language series Plus models are built on a hybrid arc…

$0.26

$1.56

Feb 2026

MiniMax M2.5 FP4

MiniMaxAI chat model.

Feb 2026

MiniMax: MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivit…

205K

$0.15

$0.9

Feb 2026

GLM 5 Fp4

Zai Org chat model. https://huggingface.co/api/models/togethercomputer/GLM-5-FP4

203K

Feb 2026

GLM-5

Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic…

205K

$3.2

Feb 2026

Qwen: Qwen3 Max Thinking

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designe…

262K

$0.78

$3.9

Feb 2026

GPT-5.3 Codex

Most capable agentic coding model. Combines frontier coding performance of GPT-…

400K

$1.75

$14

Feb 2026

Claude Opus 4.6 · Global

Previous most intelligent model for complex agents and coding, with adaptive th…

$25

Feb 2026

Claude Opus 4.6

Previous most intelligent model for complex agents and coding, with adaptive th…

$25

Feb 2026

Qwen3 Coder Next

Alibaba model (not yet curated).

131K

$0.11

$0.8

Feb 2026

GLM-OCR (Vision, OCR)

Specialized OCR model for text extraction from images and documents.

131K

$0.03

Feb 2026

Free Models Router ·

The simplest way to get free inference. openrouter/free is a router that select…

200K

Feb 2026

StepFun: Step 3.5 Flash

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on…

262K

$0.1

$0.3

Jan 2026

Upstage: Solar Pro 3

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With…

128K

$0.15

$0.6

Jan 2026

Kimi K2.5

Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.

262K

$0.6

Jan 2026

MiniMax: MiniMax M2-her

MiniMax M2-her is a dialogue-first large language model built for immersive rol…

66K

$0.3

$1.2

Jan 2026

Writer: Palmyra X5

Palmyra X5 is Writer's most advanced model, purpose-built for building and scal…

$0.6

Jan 2026

LiquidAI: LFM2.5-1.2B-Thinking (free) ·

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for age…

33K

Jan 2026

LiquidAI: LFM2.5-1.2B-Instruct (free) ·

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model bui…

33K

Jan 2026

GLM-4.7 FlashX

Fast GLM-4.7 variant with priority routing and higher concurrency. Same model a…

131K

$0.07

$0.4

Jan 2026

GLM-4.7 Flash (Free)

Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 conc…

131K

Jan 2026

Z.AI GLM 4.7

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K

$2.25

$2.75

Jan 2026

MiniMax: MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized…

205K

$0.3

$1.2

Dec 2025

ByteDance Seed: Seed 1.6 Flash

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance See…

262K

$0.08

$0.3

Dec 2025

ByteDance Seed: Seed 1.6

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It inc…

262K

$0.25

Dec 2025

GLM-4.7

Latest-gen GLM model with 128K context. Thinking mode activated by default.

131K

$0.6

$2.2

Dec 2025

Gemini 3 Flash Preview

1.1M

$0.5

Dec 2025

Nvidia Nemotron 3 Nano 30B A3b Bf16

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-N…

262K

Dec 2025

NVIDIA: Nemotron 3 Nano 30B A3B

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compu…

262K

$0.05

$0.2

Dec 2025

OpenAI: GPT-5.2 Chat

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, o…

128K

$1.75

$14

Dec 2025

GPT-5.2 Pro

Smartest and most trustworthy option for difficult questions. Uses more compute…

400K

$21

$168

Dec 2025

GPT-5.2 Instant

deprecated

GPT-5.2 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.

128K

$1.75

$14

Dec 2025

GPT-5.2 Codex

deprecated

GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar en…

400K

$1.75

$14

Dec 2025

GPT-5.2

Most capable model for professional work and long-running agents. Improvements…

400K

$1.75

$14

Dec 2025

Deep Research Pro Preview

Preview release (December 12th, 2025) of Deep Research Pro

197K

$1.25

$10

Dec 2025

AutoGLM Phone

Mobile phone automation agent. Understands phone screens via multimodal percept…

131K

Dec 2025

Mistral: Devstral 2 2512

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing i…

262K

$0.4

Dec 2025

Devstral 2 (latest)

Official mistral-medium-latest Mistral AI model

262K

$0.4

Dec 2025

Devstral 2 (latest)

Official devstral-2512 Mistral AI model

262K

$0.4

Dec 2025

Devstral 2 (latest)

Official devstral-2512 Mistral AI model

262K

$0.4

Dec 2025

Relace: Relace Search

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to e…

256K

Dec 2025

GLM-4.6 V FlashX

Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/f…

131K

$0.04

$0.4

Dec 2025

GLM-4.6 V Flash (Free)

Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concu…

131K

Dec 2025

GLM-4.6 V

Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hyb…

131K

$0.3

$0.9

Dec 2025

EssentialAI Rnj-1 Instruct

Essential AI chat model. https://huggingface.co/api/models/togethercomputer/Ess…

33K

Dec 2025

Body Builder (beta)

Transform your natural language requests into structured OpenRouter API request…

128K

Dec 2025

mistral-large-latest

Official mistral-large-2512 Mistral AI model

262K

$0.5

$1.5

Dec 2025

ministral-8b-latest

Ministral 3 (a.k.a. Tinystral) 8B Instruct.

262K

$0.15

Dec 2025

ministral-3b-latest

Ministral 3 (a.k.a. Tinystral) 3B Instruct.

131K

$0.1

Dec 2025

ministral-14b-latest

Ministral 3 (a.k.a. Tinystral) 14B Instruct.

262K

$0.2

Dec 2025

Ministral 8b (2512)

Ministral 3 (a.k.a. Tinystral) 8B Instruct.

262K

$0.15

Dec 2025

Ministral 3b (2512)

Ministral 3 (a.k.a. Tinystral) 3B Instruct.

131K

$0.1

Dec 2025

Ministral 3 14B Instruct 2512

Mistralai chat model. https://huggingface.co/api/models/mistralai/Ministral-3-1…

262K

$0.2

Dec 2025

Ministral 14b (2512)

Ministral 3 (a.k.a. Tinystral) 14B Instruct.

262K

$0.2

Dec 2025

Amazon: Nova 2 Lite

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads th…

$0.3

$2.5

Dec 2025

Mistral Large (2512)

Official mistral-large-2512 Mistral AI model

262K

$0.5

$1.5

Dec 2025

DeepSeek: DeepSeek V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computationa…

164K

$0.27

$0.4

Dec 2025

Arcee AI: Trinity Mini

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language…

131K

$0.05

$0.15

Dec 2025

Claude Opus 4.5 · US

Previous most intelligent model with advanced reasoning for complex agentic wor…

200K

$25

Nov 2025

Claude Opus 4.5

Previous most intelligent model with advanced reasoning for complex agentic wor…

200K

$25

Nov 2025

AllenAI: Olmo 3 32B Think

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for…

66K

$0.15

$0.5

Nov 2025

Nano Banana Pro Preview

Gemini 3 Pro Image Preview

164K

$12

Nov 2025

Nano Banana Pro

Gemini 3 Pro Image Preview

164K

$12

Nov 2025

GPT-5.1 Codex Max

deprecated

Our most intelligent coding model optimized for long-horizon, agentic coding ta…

400K

$1.25

$10

Nov 2025

GPT-5.1 Codex Mini

deprecated

Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.

400K

$0.25

Nov 2025

GPT-5.1 Codex

deprecated

A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar env…

400K

$1.25

$10

Nov 2025

GPT-5.1

HOT

The best model for coding and agentic tasks with configurable reasoning effort.

400K

$1.25

$10

Nov 2025

Deep Cogito: Cogito v2.1 671B

Cogito v2.1 671B MoE represents one of the strongest open models globally, matc…

128K

$1.25

Nov 2025

OpenAI: GPT-5.1 Chat

GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, op…

128K

$1.25

$10

Nov 2025

GPT-5.1 Instant

deprecated

GPT-5.1 Instant with adaptive reasoning. More conversational with improved inst…

128K

$1.25

$10

Nov 2025

Qwen3-VL-235B-A22B-Instruct-FP8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-VL-235B-A22B-Inst…

262K

Nov 2025

MoonshotAI: Kimi K2 Thinking

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, e…

262K

$0.6

$2.5

Nov 2025

Amazon: Nova Premier 1.0

Amazon Nova Premier is the most capable of Amazon’s multimodal models for compl…

$2.5

$12.5

Oct 2025

Perplexity: Sonar Pro Search

Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is…

200K

$15

Oct 2025

Mistral: Voxtral Small 24B 2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-…

32K

$0.1

$0.3

Oct 2025

OpenAI: gpt-oss-safeguard-20b

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-os…

131K

$0.08

$0.3

Oct 2025

NVIDIA: Nemotron Nano 12B 2 VL (free) ·

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning m…

128K

Oct 2025

Medgemma 27B Text It

Google chat model. https://huggingface.co/api/models/google/medgemma-27b-text-it

131K

Oct 2025

Qwen: Qwen3 VL 32B Instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designe…

262K

$0.1

$0.42

Oct 2025

MiniMax: MiniMax M2

MiniMax-M2 is a compact, high-efficiency large language model optimized for end…

205K

$0.3

$1.2

Oct 2025

IBM: Granite 4.0 Micro

Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. Thes…

131K

$0.02

$0.11

Oct 2025

Microsoft: Phi 4 Mini Instruct

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and f…

131K

$0.08

$0.35

Oct 2025

Claude Haiku 4.5 · Global

Fastest model with exceptional speed and performance (Bedrock Inference Profile)

200K

Oct 2025

Claude Haiku 4.5

Fastest model with exceptional speed and performance

200K

Oct 2025

Qwen: Qwen3 VL 8B Thinking

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B mult…

256K

$0.12

$1.37

Oct 2025

Qwen: Qwen3 VL 8B Instruct

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL se…

256K

$0.12

$0.46

Oct 2025

GPT-5 Search API

Updated web search model in Chat Completions API. 60% cheaper with domain filte…

400K

$1.25

$10

Oct 2025

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning…

131K

$0.4

Oct 2025

Gemini 2.5 Computer Use Preview 10-2025

197K

$1.25

$10

Oct 2025

Qwen: Qwen3 VL 30B A3B Thinking

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text genera…

131K

$0.13

$1.56

Oct 2025

Qwen: Qwen3 VL 30B A3B Instruct

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text genera…

262K

$0.13

$0.52

Oct 2025

GPT-5 Pro

Version of GPT-5 that uses more compute to produce smarter and more precise res…

400K

$15

$120

Oct 2025

GPT Audio Mini

Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completio…

128K

$0.6

$2.4

Oct 2025

Nano Banana

Gemini 2.5 Flash Preview Image

66K

$0.3

$2.5

Oct 2025

GLM-4.6

GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whethe…

131K

$0.6

$2.2

Sep 2025

Gemma 3 270M It

Google chat model. https://huggingface.co/api/models/google/gemma-3-270m-it

33K

Sep 2025

DeepSeek: DeepSeek V3.2 Exp

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek…

164K

$0.27

$0.41

Sep 2025

Claude Sonnet 4.5 · US

Previous best combination of speed and intelligence for complex agents and codi…

200K

$15

Sep 2025

Claude Sonnet 4.5

HOT

Previous best combination of speed and intelligence for complex agents and codi…

200K

$15

Sep 2025

TheDrummer: Cydonia 24B V4.1

Uncensored and creative writing model based on Mistral Small 3.2 24B with good…

131K

$0.3

$0.5

Sep 2025

Relace: Relace Apply 3

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edit…

256K

$0.85

$1.25

Sep 2025

Google: Gemini 2.5 Flash Lite Preview 09-2025

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family…

$0.1

$0.4

Sep 2025

Qwen3 Next 80B A3b Instruct Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Next-80B-A3B-Inst…

Sep 2025

Qwen3 Vl 235b A22b Thinking

Alibaba model (not yet curated).

131K

$0.26

$2.6

Sep 2025

Qwen3 Vl 235b A22b Instruct

Alibaba model (not yet curated).

131K

$0.21

$1.9

Sep 2025

Qwen3 Max

Alibaba model (not yet curated).

131K

$0.78

$3.9

Sep 2025

Qwen3 Coder Plus

Agentic coding model with very long context. Tiered pricing by input length (up…

Sep 2025

DeepSeek: DeepSeek V3.1 Terminus

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's…

131K

$0.27

Sep 2025

Qwen3 Coder Flash

Alibaba model (not yet curated).

131K

$0.2

$0.98

Sep 2025

Nvidia Nemotron Nano 9B V2

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-Nan…

131K

$0.06

$0.25

Sep 2025

magistral-small-latest

Mistral Small 4.

262K

$0.5

$1.5

Sep 2025

magistral-medium-latest

Our frontier-class reasoning model release candidate September 2025.

131K

Sep 2025

Magistral Small (2509)

Our efficient reasoning model released September 2025.

131K

$0.5

$1.5

Sep 2025

Magistral Medium (2509)

Our frontier-class reasoning model release candidate September 2025.

131K

Sep 2025

GPT-5 Codex

deprecated

A version of GPT-5 optimized for agentic coding in Codex.

400K

$1.25

$10

Sep 2025

Qwen3 Next 80b A3b Thinking

Alibaba model (not yet curated).

131K

$0.1

$0.78

Sep 2025

Qwen3 Next 80b A3b Instruct

Alibaba model (not yet curated).

131K

$0.1

$0.78

Sep 2025

NVIDIA: Nemotron Nano 9B V2 (free) ·

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch…

128K

Sep 2025

MoonshotAI: Kimi K2 0905

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixtu…

262K

$0.6

$2.5

Sep 2025

[Groq] Compound Mini (Agentic System)

Lighter Groq agentic AI with web search, code execution. Pricing based on under…

131K

Sep 2025

[Groq] Compound (Agentic System)

Groq agentic AI with web search, code execution, browser automation. Uses GPT-O…

131K

Sep 2025

Qwen3 30b A3b Thinking 2507

Alibaba model (not yet curated).

131K

$0.13

$1.56

Aug 2025

GPT Audio

First generally available audio model. Accepts audio inputs and outputs, and ca…

128K

$2.5

$10

Aug 2025

Nous: Hermes 4 70B

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llam…

131K

$0.13

$0.4

Aug 2025

Nous: Hermes 4 405B

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and rele…

131K

Aug 2025

DeepSeek: DeepSeek V3.1

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) t…

164K

$0.25

$0.95

Aug 2025

Mistral: Mistral Medium 3.1

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-p…

131K

$0.4

Aug 2025

Mistral Medium (2508)

Update on Mistral Medium 3 with improved capabilities.

131K

$0.4

Aug 2025

GLM-4.5 V

Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.

98K

$0.6

$1.8

Aug 2025

Qwen3 4B Instruct 2507

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-4B-Instruct-2507

262K

Aug 2025

AI21: Jamba Large 1.7

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvem…

256K

Aug 2025

OpenAI: GPT-5 Chat

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware con…

128K

$1.25

$10

Aug 2025

GPT-5 Nano

Fastest, most cost-efficient version of GPT-5 for summarization and classificat…

400K

$0.05

$0.4

Aug 2025

GPT-5 Mini

A faster, more cost-efficient version of GPT-5 for well-defined tasks.

400K

$0.25

Aug 2025

GPT-5 ChatGPT

deprecated

GPT-5 model used in ChatGPT.

128K

$1.25

$10

Aug 2025

GPT-5

The best model for coding and agentic tasks across domains.

400K

$1.25

$10

Aug 2025

OpenAI: gpt-oss-20b

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the…

131K

$0.03

$0.13

Aug 2025

OpenAI: gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) languag…

131K

$0.04

$0.17

Aug 2025

Claude Opus 4.1· US

deprecated

Previous Opus model. Deprecated June 5, 2026, retiring August 5, 2026. (Bedrock…

200K

$15

$75

Aug 2025

Claude Opus 4.1

deprecated

Previous Opus model. Deprecated June 5, 2026, retiring August 5, 2026.

200K

$15

$75

Aug 2025

Command A Translate

Specialized machine translation across 23 languages, with tool use and JSON out…

Aug 2025

Command A Reasoning

Reasoning-tuned Command A for multi-step agents and hard problem solving across…

289K

$2.5

$10

Aug 2025

Qwen: Qwen3 Coder 30B A3B Instruct

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) mode…

160K

$0.07

$0.27

Jul 2025

Glm 4.5 Air Fp8

Zai Org chat model. https://huggingface.co/api/models/zai-org/GLM-4.5-Air-FP8

131K

$0.2

$1.1

Jul 2025

codestral-latest

Our cutting-edge language model for coding released August 2025.

256K

$0.3

$0.9

Jul 2025

Codestral (2508)

Our cutting-edge language model for coding released August 2025.

256K

$0.3

$0.9

Jul 2025

Qwen3 30b A3b Instruct 2507

Alibaba model (not yet curated).

131K

$0.1

$0.3

Jul 2025

Qwen3 235B A22b Instruct 2507 Fp8

Together AI chat model. https://huggingface.co/api/models/Qwen/Qwen3-235B-A22B-…

262K

Jul 2025

GLM-4.5 X

Extended GLM-4.5 model. Interleaved thinking.

98K

$2.2

$8.9

Jul 2025

GLM-4.5 Flash (Free)

Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7…

98K

Jul 2025

GLM-4.5 AirX

Extended lightweight GLM-4.5 variant. Interleaved thinking.

98K

$1.1

$4.5

Jul 2025

Qwen3 235b A22b Thinking 2507

Alibaba model (not yet curated).

131K

$0.3

Jul 2025

GLM-4.5 Air

Lightweight GLM-4.5 variant. Interleaved thinking.

98K

$0.2

$1.1

Jul 2025

GLM-4.5

Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.

98K

$0.6

$2.2

Jul 2025

Qwen3 Coder 480B A35B Instruct Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Coder-480B-A35B-I…

262K

Jul 2025

Qwen: Qwen3 Coder 480B A35B (free) ·

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation mo…

Jul 2025

Qwen3 235B A22B Instruct 2507 FP8 Throughput

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-235B-A22B-Instruc…

262K

$0.2

$0.6

Jul 2025

Gemini 2.5 Flash-Lite

Stable version of Gemini 2.5 Flash-Lite, released in July of 2025

1.1M

$0.1

$0.4

Jul 2025

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based envir…

128K

$0.1

$0.2

Jul 2025

Qwen: Qwen3 235B A22B Instruct 2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-e…

262K

$0.09

$0.55

Jul 2025

voxtral-small-latest

A small audio understanding model released in July 2025

33K

$0.1

$0.3

Jul 2025

voxtral-mini-latest

A mini audio understanding model released in July 2025

33K

$0.04

Jul 2025

Voxtral Small (2507)

A small audio understanding model released in July 2025

33K

$0.1

$0.3

Jul 2025

Voxtral Mini (2507)

A mini audio understanding model released in July 2025

33K

$0.04

Jul 2025

Switchpoint Router

Switchpoint AI's router instantly analyzes your request and directs it to the o…

131K

$0.85

$3.4

Jul 2025

MoonshotAI: Kimi K2 0711

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model devel…

131K

$0.57

$2.3

Jul 2025

Sarvam M

Sarvamai chat model. https://huggingface.co/api/models/sarvamai/sarvam-m

33K

Jul 2025

Meta Llama 3.1 8B Instruct Awq Int4

Meta chat model. https://huggingface.co/api/models/togethercomputer/meta-llama-…

131K

Jul 2025

Venice: Uncensored (free) ·

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of…

33K

Jul 2025

Tencent: Hunyuan A13B Instruct

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model…

131K

$0.14

$0.57

Jul 2025

Morph: Morph V3 Large

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec wit…

262K

$0.9

$1.9

Jul 2025

Morph: Morph V3 Fast

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accurac…

82K

$0.8

$1.2

Jul 2025

Command A Vision

Multimodal Command A for charts, graphs, diagrams, OCR, and document understand…

128K

Jul 2025

Baidu: ERNIE 4.5 VL 424B A47B

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baid…

131K

$0.42

$1.25

Jun 2025

Minimax M1 80K

MiniMaxAI chat model. https://huggingface.co/api/models/togethercomputer/MiniMa…

Jun 2025

o4 Mini Deep Research

deprecated

Faster, more affordable deep research model for complex, multi-step research ta…

200K

Jun 2025

o3 Deep Research

deprecated

Our most powerful deep research model for complex, multi-step research tasks.

200K

$10

$40

Jun 2025

Minimax M1 40K

MiniMaxAI chat model. https://huggingface.co/api/models/togethercomputer/MiniMa…

Jun 2025

Mistral: Mistral Small 3.2 24B

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mist…

131K

$0.1

$0.3

Jun 2025

Mistral Small (2506)

Our latest enterprise-grade small model with the latest version released June 2…

131K

$0.1

$0.3

Jun 2025

MiniMax: MiniMax M1

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended…

$0.55

$2.2

Jun 2025

Gemini 2.5 Pro

Stable release (June 17th, 2025) of Gemini 2.5 Pro

1.1M

$1.25

$10

Jun 2025

Gemini 2.5 Flash

Stable version of Gemini 2.5 Flash, our mid-size multimodal model that supports…

1.1M

$0.3

$2.5

Jun 2025

Magistral Small 2506

Mistralai chat model. https://huggingface.co/api/models/mistralai/Magistral-Sma…

41K

Jun 2025

Llama 4 Scout (17Bx16E)

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-4-Scout-17B…

262K

Jun 2025

o3 Pro

Version of o3 with more compute for better responses. Provides consistently bet…

200K

$20

$80

Jun 2025

Gemma 2B It

Google chat model. https://huggingface.co/api/models/google/gemma-2b-it

Jun 2025

Gemma 2 9B It

Google chat model. https://huggingface.co/api/models/google/gemma-2-9b-it

Jun 2025

Qwen3 1.7B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-1.7B

41K

Jun 2025

Qwen3 0.6B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-0.6B

41K

Jun 2025

Molmo 7B D 0924

Allenai chat model. https://huggingface.co/api/models/allenai/Molmo-7B-D-0924

May 2025

DeepSeek: R1 0528

May 28th update to the original DeepSeek R1 Performance on par with OpenAI o1,…

164K

$0.5

$2.15

May 2025

Mixtral 8X22b Instruct V0.1

Mistralai chat model. https://huggingface.co/api/models/mistralai/Mixtral-8x22B…

66K

May 2025

Claude Sonnet 4 [Retired] · US

High-performance model. Retired June 15, 2026 (except on Bedrock and Vertex AI)…

200K

$15

May 2025

Anthropic: Claude Sonnet 4

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Son…

$15

May 2025

Anthropic: Claude Opus 4

Claude Opus 4 is benchmarked as the world’s best coding model, at time of relea…

200K

$15

$75

May 2025

Devstral Small 2505

Mistralai chat model. https://huggingface.co/api/models/togethercomputer/Devstr…

131K

May 2025

Mistral 7B v0.1

Mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-v0…

33K

May 2025

Google: Gemma 3n 4B

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource…

33K

$0.06

$0.12

May 2025

Google: Gemini 2.5 Pro Preview 06-05

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reas…

$1.25

$10

May 2025

Gemini 2.5 Pro Preview TTS

25K

May 2025

Gemini 2.5 Flash Preview TTS

25K

$0.5

May 2025

Deepcoder 14B Preview

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

131K

May 2025

mistral-medium-3

Official mistral-medium-latest Mistral AI model

262K

$0.4

May 2025

Mistral Medium (2505)

Our frontier-class multimodal model released May 2025.

131K

$0.4

May 2025

Google: Gemini 2.5 Pro Preview 05-06

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reas…

$1.25

$10

May 2025

Arcee AI: Virtuoso Large

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tune…

131K

$0.75

$1.2

May 2025

Arcee AI: Coder Large

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been fu…

33K

$0.5

$0.8

May 2025

Meta: Llama Guard 4 12B

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tune…

164K

$0.18

Apr 2025

Qwen3 8b

Alibaba model (not yet curated).

131K

$0.12

$0.46

Apr 2025

Qwen3 32b

Alibaba model (not yet curated).

131K

$0.08

$0.28

Apr 2025

Qwen3 30b A3b

Alibaba model (not yet curated).

131K

$0.13

$0.52

Apr 2025

Qwen3 235b A22b

Alibaba model (not yet curated).

131K

$0.46

$1.82

Apr 2025

Qwen3 14b

Alibaba model (not yet curated).

131K

$0.12

$0.24

Apr 2025

Arize AI Qwen 2 1.5B Instruct

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

33K

$0.1

Apr 2025

o4 Mini

deprecated

Latest o4-mini model. Optimized for fast, effective reasoning with exceptionall…

200K

$1.1

$4.4

Apr 2025

A well-rounded and powerful model across domains. Sets a new standard for math,…

200K

Apr 2025

Llama 3.1 405B

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.1-405B

131K

Apr 2025

Qwen2.5 7B Instruct

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct

33K

Apr 2025

Qwen2.5 7B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-7B

131K

Apr 2025

Qwen2.5 72B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-72B

131K

Apr 2025

Qwen2.5 3B Instruct

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-3B-Instruct

33K

Apr 2025

Qwen2.5 32B Instruct

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-32B-Instruct

33K

Apr 2025

Qwen2.5 32B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-32B

131K

Apr 2025

Qwen2.5 14B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-14B

131K

Apr 2025

Qwen2.5 1.5B Instruct

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-1.5B-Instruct

33K

Apr 2025

Qwen2.5 1.5B

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-1.5B

131K

Apr 2025

Llama 3.2 1B

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.2-1B

131K

Apr 2025

Llama 3.1 70B

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.1-70B

131K

Apr 2025

GPT-4.1 Nano

Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance wi…

$0.1

$0.4

Apr 2025

GPT-4.1 Mini

Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intell…

$0.4

$1.6

Apr 2025

GPT-4.1

Flagship GPT model for complex tasks. Major improvements on coding, instruction…

Apr 2025

GLM-4 32B (0414) 128K

GLM-4 32B model with 128K context, 16K output.

131K

$0.1

Apr 2025

Qwen2 72B Instruct

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

33K

$0.9

Apr 2025

Cogito V1 Preview Qwen 32B

deepcogito chat model.

131K

Apr 2025

Cogito V1 Preview Qwen 14B

deepcogito chat model.

131K

Apr 2025

Cogito V1 Preview Llama 8B

deepcogito chat model.

131K

Apr 2025

Cogito V1 Preview Llama 70B Turbo

deepcogito chat model.

131K

Apr 2025

Cogito V1 Preview Llama 70B

deepcogito chat model.

131K

Apr 2025

Meta: Llama 4 Scout

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model d…

10M

$0.1

$0.3

Apr 2025

Meta: Llama 4 Maverick

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language mod…

$0.2

$0.8

Apr 2025

[Meta] Llama 4 Scout · 17B × 16E (Preview)

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal wi…

131K

$0.11

$0.34

Apr 2025

Gemma 3 1b it

Google chat model.

33K

Apr 2025

DeepSeek R1 Distill Qwen 7B

Deepseek chat model. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwe…

131K

Apr 2025

meta-llama/Llama-2-7b-chat-hf

Meta chat model. https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

Apr 2025

DeepSeek: DeepSeek V3 0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteratio…

164K

$0.27

$1.12

Mar 2025

o1 Pro

A version of o1 with more compute for better responses. Provides consistently b…

200K

$150

$600

Mar 2025

nim/nvidia/llama-3.3-nemotron-super-49b-v1

Nvidia chat model.

16K

Mar 2025

Mistral: Mistral Small 3.1 24B

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501)…

128K

$0.35

$0.56

Mar 2025

nim/nv-mistralai/mistral-nemo-12b-instruct

NVIDIA chat model.

16K

Mar 2025

nim/mistralai/mixtral-8x7b-instruct-v01

mistralai chat model.

16K

Mar 2025

nim/meta/llama-3.1-70b-instruct

Llama chat model.

16K

Mar 2025

nim/meta/llama-3.1-8b-instruct

Meta chat model.

16K

Mar 2025

Google: Gemma 3 4B

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K

$0.05

$0.1

Mar 2025

Google: Gemma 3 12B

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K

$0.05

$0.15

Mar 2025

Command A

Cohere's efficient 111B enterprise model for agents, tool use, and multilingual…

288K

$2.5

$10

Mar 2025

Cohere: Command A

Command A is an open-weights 111B parameter model with a 256k context window fo…

256K

$2.5

$10

Mar 2025

Reka Flash 3

Reka Flash 3 is a general-purpose, instruction-tuned large language model with…

66K

$0.1

$0.2

Mar 2025

nim/nvidia/llama-3.1-nemotron-70b-instruct

NVIDIA chat model.

16K

Mar 2025

nim/meta/llama-3.3-70b-instruct

Meta chat model.

16K

Mar 2025

Google: Gemma 3 27B

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K

$0.1

$0.3

Mar 2025

GPT-4o Search Preview

deprecated

Latest snapshot of the GPT-4o model optimized for web search capabilities.

128K

$2.5

$10

Mar 2025

GPT-4o Mini Search Preview

deprecated

Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.

128K

$0.15

$0.6

Mar 2025

TheDrummer: Skyfall 36B V2

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fin…

33K

$0.55

$0.8

Mar 2025

nim/mistralai/mixtral-8x22b-instruct-v01

Mistral chat model.

16K

Mar 2025

nim/meta/llama-3.2-90b-vision-instruct

Meta chat model.

16K

Mar 2025

nim/meta/llama-3.2-11b-vision-instruct

Nvidia chat model.

16K

Mar 2025

Meta Llama 3.1 8B Instruct Turbo

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

131K

$0.18

Mar 2025

Qwen QwQ-32B

Qwen chat model. https://huggingface.co/Qwen/QwQ-32B

131K

$1.2

Mar 2025

Sonar Reasoning Pro

Premier reasoning model (DeepSeek R1) with Chain of Thought. 128k context.

128K

Feb 2025

Mistral: Saba

Mistral Saba is a 24B-parameter language model specifically designed for the Mi…

33K

$0.2

$0.6

Feb 2025

Sonar Deep Research

Expert-level research model for exhaustive searches and comprehensive reports.…

128K

Feb 2025

Gemini 2.0 Flash 001

Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for…

1.1M

$0.1

$0.4

Feb 2025

AionLabs: Aion-RP 1.0 (8B)

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of t…

33K

$0.8

$1.6

Feb 2025

AionLabs: Aion-1.0-Mini

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 mod…

131K

$0.7

$1.4

Feb 2025

AionLabs: Aion-1.0

Aion-1.0 is a multi-model system designed for high performance across various t…

131K

Feb 2025

Qwen: Qwen2.5 VL 72B Instruct

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds,…

131K

$0.8

Feb 2025

Qwen Plus

Balanced quality, speed, and cost with hybrid thinking. 1M context.

$0.4

$1.2

Feb 2025

Command R7B Arabic

Command R7B tuned for Modern Standard Arabic and English enterprise use cases.…

128K

$0.04

$0.15

Feb 2025

o3 Mini

Latest o3-mini model snapshot. High intelligence at the same cost and latency t…

200K

$1.1

$4.4

Jan 2025

Mistral: Mistral Small 3

Mistral Small 3 is a 24B-parameter language model optimized for low-latency per…

33K

$0.05

$0.08

Jan 2025

DeepSeek R1 Distill Qwen 14B

DeepSeek chat model. https://huggingface.co/api/models/deepseek-ai/DeepSeek-R1-…

131K

$1.6

Jan 2025

DeepSeek R1 Distill Qwen 1.5B

DeepSeek chat model. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwe…

131K

$0.18

Jan 2025

DeepSeek: R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llam…

128K

$0.8

Jan 2025

Sonar Pro

Advanced search model for complex queries and deep content understanding. 200k…

200K

$15

Jan 2025

Sonar

Lightweight, cost-effective search model for quick, grounded answers. 128k cont…

128K

Jan 2025

DeepSeek: R1

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and wi…

164K

$0.7

$2.5

Jan 2025

V1 8K Vision (Preview)

Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision f…

$0.2

Jan 2025

V1 32K Vision (Preview)

Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision…

33K

Jan 2025

V1 128K Vision (Preview)

Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision…

131K

Jan 2025

MiniMax: MiniMax-01

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01…

$0.2

$1.1

Jan 2025

Microsoft: Phi 4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks…

16K

$0.07

$0.14

Jan 2025

Qwen2-VL (72B) Instruct

Qwen chat model. https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct

33K

$1.2

Jan 2025

Sao10K: Llama 3.1 70B Hanami x1

This is Sao10K's experiment over Euryale v2.2.

16K

Jan 2025

DeepSeek: DeepSeek V3

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instr…

131K

$0.2

$0.8

Dec 2024

Sao10K: Llama 3.3 Euryale 70B

Euryale L3.3 70B is a model focused on creative roleplay from Sao10k. It is the…

131K

$0.65

$0.75

Dec 2024

deprecated

Previous full o-series reasoning model.

200K

$15

$60

Dec 2024

Cohere: Command R7B (12-2024)

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivere…

128K

$0.04

$0.15

Dec 2024

Qwen 2.5 14B Instruct

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-14B-Instruct

33K

$0.8

Dec 2024

Qwen2.5 72B Instruct

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

33K

$1.2

Dec 2024

Meta: Llama 3.3 70B Instruct (free) ·

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and…

131K

Dec 2024

Meta Llama 3.3 70B Instruct Turbo

Meta chat model. https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

131K

$1.04

Dec 2024

Meta Llama 3.1 405B Instruct

Meta chat model. https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct

$3.5

Dec 2024

[Meta] Llama 3.3 · 70B Versatile

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 1…

131K

$0.59

$0.79

Dec 2024

Amazon: Nova Pro 1.0

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on provid…

300K

$0.8

$3.2

Dec 2024

Amazon: Nova Micro 1.0

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency res…

128K

$0.04

$0.14

Dec 2024

Amazon: Nova Lite 1.0

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focus…

300K

$0.06

$0.24

Dec 2024

Mistral Large 2407

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-240…

131K

Nov 2024

Qwen 2.5 Coder 32B Instruct

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

16K

$0.8

Nov 2024

Qwen2.5 Coder 32B Instruct

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models…

128K

$0.66

Nov 2024

Llama 3.1 Nemotron 70B Instruct HF

nvidia chat model. https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruc…

33K

$0.88

Nov 2024

TheDrummer: UnslopNemo 12B

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed…

33K

$0.4

Nov 2024

Magnum v4 72B

This is a series of models designed to replicate the prose quality of the Claud…

33K

Oct 2024

Qwen: Qwen2.5 7B Instruct

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings t…

131K

$0.04

$0.1

Oct 2024

Qwen2.5 7B Instruct Turbo

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

33K

$0.3

Oct 2024

Qwen2.5 72B Instruct Turbo

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

131K

$1.2

Oct 2024

Inflection: Inflection 3 Productivity

Inflection 3 Productivity is optimized for following instructions. It is better…

$2.5

$10

Oct 2024

Inflection: Inflection 3 Pi

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional…

$2.5

$10

Oct 2024

Aya Expanse 32B

Open-weights multilingual research model covering 23 languages. 128K context. T…

128K

$0.5

$1.5

Oct 2024

TheDrummer: Rocinante 12B

Rocinante 12B is designed for engaging storytelling and rich prose. Early teste…

66K

$0.25

$0.5

Sep 2024

Meta: Llama 3.2 3B Instruct (free) ·

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimi…

131K

Sep 2024

Meta: Llama 3.2 1B Instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently per…

131K

$0.03

$0.2

Sep 2024

Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed…

131K

$0.35

Sep 2024

Qwen2.5 72B Instruct

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings…

131K

$0.36

$0.4

Sep 2024

Cohere: Command R+ (08-2024)

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher t…

128K

$2.5

$10

Aug 2024

Cohere: Command R (08-2024)

command-r-08-2024 is an update of the Command R with improved performance for m…

128K

$0.15

$0.6

Aug 2024

Sao10K: Llama 3.1 Euryale 70B v2.2

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It i…

131K

$0.85

Aug 2024

Nous: Hermes 3 70B Instruct

Hermes 3 is a generalist language model with many improvements over Hermes 2, i…

131K

$0.7

Aug 2024

Nous: Hermes 3 405B Instruct (free) ·

Hermes 3 is a generalist language model with many improvements over Hermes 2, i…

131K

Aug 2024

Sao10K: Llama 3 8B Lunaris

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It…

$0.04

$0.05

Aug 2024

Meta: Llama 3.1 8B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & fla…

131K

$0.05

$0.08

Jul 2024

Meta: Llama 3.1 70B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & fla…

131K

$0.4

Jul 2024

[Meta] Llama 3.1 · 8B Instant

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K co…

131K

$0.05

$0.08

Jul 2024

Meta Llama 3.1 70B Instruct Turbo

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

131K

$0.88

Jul 2024

Mistral: Mistral Nemo

A 12B parameter model with a 128k token context length built by Mistral in coll…

131K

$0.02

$0.03

Jul 2024

open-mistral-nemo-2407

Our best multilingual open source model released July 2024.

131K

$0.15

Jul 2024

open-mistral-nemo

Our best multilingual open source model released July 2024.

131K

$0.15

Jul 2024

GPT-4o mini

Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more c…

128K

$0.15

$0.6

Jul 2024

Google: Gemma 2 27B

Gemma 2 27B by Google is an open model built from the same research and technol…

$0.65

Jul 2024

Qwen 2 Instruct (1.5B)

Qwen chat model. https://huggingface.co/Qwen/Qwen2-72B-Instruct

33K

$0.02

Jun 2024

Mistral (7B) Instruct v0.3

mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-In…

33K

$0.2

May 2024

GPT-4o

Snapshot of gpt-4o from November 20th, 2024.

128K

$2.5

$10

May 2024

Meta: Llama 3 8B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavo…

$0.14

Apr 2024

Meta Llama 3 8B Instruct Reference

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

$0.2

Apr 2024

Meta Llama 3 8B Instruct

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

$0.2

Apr 2024

Mistral: Mixtral 8x22B Instruct

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B ac…

66K

Apr 2024

WizardLM-2 8x22B

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates…

66K

$0.62

Apr 2024

GPT-4 Turbo

GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and functi…

128K

$10

$30

Apr 2024

Claude Haiku 3 [Retired]

Fast and compact model for near-instant responsiveness. Retired April 20, 2026.…

200K

$0.25

$1.25

Mar 2024

Anthropic: Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant r…

200K

$0.25

$1.25

Mar 2024

Mistral Large

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-24…

128K

Feb 2024

Deepseek Coder 33B Instruct

Deepseek chat model. https://huggingface.co/api/models/deepseek-ai/deepseek-cod…

16K

$0.8

Feb 2024

V1 8K

Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.

$0.2

Feb 2024

V1 32K

Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.

33K

Feb 2024

V1 128K

Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.

131K

Feb 2024

OpenAI: GPT-4 Turbo Preview

The preview GPT-4 model with improved instruction following, JSON mode, reprodu…

128K

$10

$30

Jan 2024

OpenAI: GPT-3.5 Turbo (older v0613)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural…

Jan 2024

3.5-Turbo

The latest GPT-3.5 Turbo model with higher accuracy at responding in requested…

16K

$0.5

$1.5

Jan 2024

3.5-Turbo

deprecated

The latest GPT-3.5 Turbo model with higher accuracy at responding in requested…

16K

$0.5

$1.5

Jan 2024

Nous Hermes 2 Mixtral 8X7B Dpo

Nousresearch chat model. https://huggingface.co/api/models/NousResearch/Nous-He…

33K

$0.6

Jan 2024

Mixtral-8x7B Instruct v0.1

mistralai chat model. https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0…

33K

$0.6

Dec 2023

mistral-medium

Official mistral-medium-latest Mistral AI model

262K

$0.4

Dec 2023

Auto Router

Your prompt will be processed by a meta-model and routed to one of dozens of mo…

Nov 2023

3.5-Turbo

deprecated

GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducibl…

16K

Nov 2023

OpenAI: GPT-3.5 Turbo Instruct

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and om…

$1.5

Sep 2023

Mistral (7B) Instruct v0.1

mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-In…

33K

$0.2

Sep 2023

OpenAI: GPT-3.5 Turbo 16k

This model offers four times the context length of gpt-3.5-turbo, allowing it t…

16K

Aug 2023

Mancer: Weaver (alpha)

An attempt to recreate Claude-style verbosity, but don't expect the same level…

$0.5

$0.75

Aug 2023

ReMM SLERP 13B

A recreation trial of the original MythoMax-L2-B13 but with updated models. #me…

$0.45

$0.65

Jul 2023

MythoMax 13B

One of the highest performing and most popular fine-tunes of Llama 2 13B, with…

$0.06

Jul 2023

GPT-4

deprecated

Snapshot of gpt-4 from June 13th 2023 with improved function calling support. D…

$30

$60

Jun 2023

GPT-4

Snapshot of gpt-4 from June 13th 2023 with improved function calling support. D…

$30

$60

Jun 2023

AI21 Labs Jamba 1.5 Large

AI21 Labs model via Unsupported API (Bedrock Foundation Model)

AI21 Labs Jamba 1.5 Mini

AI21 Labs model via Unsupported API (Bedrock Foundation Model)

Amazon Nova 2 Lite · Global

Amazon model via Converse API (Bedrock Inference Profile)

Amazon Nova Lite

Amazon model via Converse API (Bedrock Foundation Model)

Amazon Nova Micro · US

Amazon model via Converse API (Bedrock Inference Profile)

Amazon Nova Premier · US

Amazon model via Converse API (Bedrock Inference Profile)

Amazon Nova Pro · US

Amazon model via Converse API (Bedrock Inference Profile)

10K

Anthropic Claude 3 Sonnet · US

Anthropic model (Bedrock Inference Profile)

200K

Anthropic Claude Haiku 4 5

Anthropic model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Aya Vision 32B

Open-weights multilingual vision research model (23 languages) with image under…

16K

$0.5

$1.5

Cohere Command R

Cohere model via Unsupported API (Bedrock Foundation Model)

Cohere Command R+

Cohere model via Unsupported API (Bedrock Foundation Model)

Cohere Embed v4 · Global

Cohere model via Converse API (Bedrock Inference Profile)

Deepseek DeepSeek-R1 · US

Deepseek model via Converse API (Bedrock Inference Profile)

Gemma 4 12B It

Google chat model. https://huggingface.co/google/gemma-4-12B-it

262K

GLM 4.6

Zai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Google Gemma 3 12B IT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

Google Gemma 3 27B PT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

Google Gemma 3 4B IT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

Google Gemma 4 26b A4b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Google Gemma 4 31b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Google Gemma 4 E2b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

GPT-OSS 120B

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

GPT-OSS 20B

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Kimi K2 Thinking

Moonshotai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Kimi K2.5 Fp4

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

262K

$0.5

$2.8

Labs Leanstral 1 5 1

A mid & post-trained version of mistral small 4 for Lean (260618 SFT)

262K

labs-leanstral-1-5

A mid & post-trained version of mistral small 4 for Lean (260618 SFT)

262K

LFM2-24B-A2B

Togethercomputer chat model.

33K

$0.03

$0.12

LFM2.5-8B-A1B

LiquidAI chat model. https://huggingface.co/api/models/LiquidAI/LFM2.5-8B-A1B

128K

$0.03

$0.12

Meta Llama 3 70B Instruct

Meta model via Converse API (Bedrock Foundation Model)

Meta Llama 3 70B Instruct Turbo

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct

$0.88

Meta Llama 3 8B Instruct

Meta model via Converse API (Bedrock Foundation Model)

Meta Llama 3 8B Instruct Lite

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

$0.14

Meta Llama 3.1 70B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.1 8B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.2 11B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.2 1B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.2 3B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.2 90B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 3.3 70B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 4 Maverick 17B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

Meta Llama 4 Scout 17B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

MiniMax M2

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

410K

MiniMax M2.1

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

197K

MiniMax M2.5

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

197K

Mistral AI Devstral 2 123B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Mistral AI Magistral Small 2509

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

Mistral AI Ministral 14B 3.0

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Mistral AI Ministral 3 8B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Mistral AI Ministral 3B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Mistral AI Mistral 7B Instruct

Mistral AI model via Converse API (Bedrock Foundation Model)

Mistral AI Mistral Large (24.02)

Mistral AI model via Converse API (Bedrock Foundation Model)

Mistral AI Mistral Large 3

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Mistral AI Mistral Small (24.02)

Mistral AI model via Converse API (Bedrock Foundation Model)

Mistral AI Mixtral 8x7B Instruct

Mistral AI model via Converse API (Bedrock Foundation Model)

Mistral AI Voxtral Mini 3B 2507

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

33K

Mistral AI Voxtral Small 24B 2507

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

33K

Mistral Pixtral Large 25.02 · US

Mistral model via Converse API (Bedrock Inference Profile)

mistral-code-agent-latest

Official devstral-2512 Mistral AI model

262K

mistral-code-fim-latest

Our cutting-edge language model for coding released August 2025.

256K

mistral-code-latest

Our cutting-edge language model for coding released August 2025.

256K

mistral-tiny-latest

Our best multilingual open source model released July 2024.

131K

mistral-vibe-cli-fast

Mistral Small 4.

262K

mistral-vibe-cli-with-tools

Official mistral-medium-latest Mistral AI model

262K

Moonshot AI Kimi K2 Thinking

Moonshot AI model via Converse API (Bedrock Foundation Model)

262K

Moonshot AI Kimi K2.5

Moonshot AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

North Mini Code

Compact agentic coding model from Cohere's North platform. Reasoning and tool u…

436K

NVIDIA Nemotron 3 Super 120B A12B

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

NVIDIA Nemotron Nano 12B v2 VL BF16

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

NVIDIA Nemotron Nano 3 30B

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Open Mistral Nemo

Our best multilingual open source model released July 2024.

131K

Openai Gpt 5.4

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Openai Gpt 5.5 2026 04 23

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Openai Gpt 5.6 Luna

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Openai Gpt 5.6 Sol

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Openai Gpt 5.6 Terra

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

OpenAI GPT OSS Safeguard 120B

OpenAI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

OpenAI GPT OSS Safeguard 20B

OpenAI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K

OpenAI gpt-oss-120b

OpenAI model via Converse API (Bedrock Foundation Model)

128K

OpenAI gpt-oss-20b

OpenAI model via Converse API (Bedrock Foundation Model)

128K

Qvq Max

Alibaba model (not yet curated).

131K

Qwen Coder Plus

Alibaba model (not yet curated).

131K

Qwen Flash

Fast and very low cost with hybrid thinking. 1M context.

$0.05

$0.4

Qwen Max

Best quality of the stable commercial line. 32K context.

33K

$1.6

$6.4

Qwen Plus

Balanced quality, speed, and cost with hybrid thinking. 1M context.

$0.4

$1.2

Qwen Turbo

Fastest and cheapest for simple tasks. 1M context.

$0.05

$0.2

Qwen Vl Max

Alibaba model (not yet curated).

131K

Qwen Vl Plus

Alibaba model (not yet curated).

131K

Qwen3 235B A22B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 235b A22b Instruct 2507

Alibaba model (not yet curated).

131K

Qwen3 32B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 32B (dense)

Qwen model via Converse API (Bedrock Foundation Model)

33K

Qwen3 Coder 30B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 Coder 480B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 Coder 480b A35b Instruct

Alibaba model (not yet curated).

131K

Qwen3 Coder Next

Qwen model via OpenAI-Compatible API (Bedrock Foundation Model)

262K

Qwen3 Coder Next Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Coder-Next-FP8

262K

$0.5

$1.2

Qwen3 Max Preview

Alibaba model (not yet curated).

131K

Qwen3 Next 80B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 Next 80B A3B

Qwen model via Converse API (Bedrock Foundation Model)

262K

Qwen3 VL 235B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Qwen3 VL 235B A22B

Qwen model via Converse API (Bedrock Foundation Model)

262K

Qwen3 Vl Flash 2025 10 15

Alibaba model (not yet curated).

131K

Qwen3-Coder-30B-A3B-Instruct

Qwen model via Converse API (Bedrock Foundation Model)

262K

Qwen3.5 35B A3B Lora

Qwen chat model.

262K

Qwen3.5 Flash 2026 02 23

Alibaba model (not yet curated).

131K

Qwq Plus 2025 03 05

Alibaba model (not yet curated).

131K

Ternary Bonsai 27B

Prism ML chat model. https://huggingface.co/api/models/prism-ml/Ternary-Bonsai-…

262K

tiny aya earth

New Cohere Model

128K

tiny aya fire

New Cohere Model

128K

Tiny Aya Global

Tiny multilingual research model. 8K context. Text only.

tiny aya water

New Cohere Model

128K

Twelvelabs TwelveLabs Marengo Embed 3.0 · US

Twelvelabs model via Converse API (Bedrock Inference Profile)

Twelvelabs TwelveLabs Marengo Embed v2.7 · US

Twelvelabs model via Converse API (Bedrock Inference Profile)

Twelvelabs TwelveLabs Pegasus v1.2 · Global

Twelvelabs model via Converse API (Bedrock Inference Profile)

Writer Palmyra Vision 7B

Writer model via OpenAI-Compatible API (Bedrock Foundation Model)

Writer Palmyra X4 · US

Writer model via Converse API (Bedrock Inference Profile)

Writer Palmyra X5 · US

Writer model via Converse API (Bedrock Inference Profile)

Xai Grok 4.3

Xai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K

Z.AI GLM 4.7 Flash

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K

Z.AI GLM 5

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K

Meituan: LongCat 2.0

NEW

Jul 2026

LongCat 2.0 is a sparse mixture-of-experts language model from Meituan, with 48…

1M · in $0.3 · out $1.2

Thinking Machines: Inkling

NEW

Jul 2026

Inkling is an open-weight multimodal mixture-of-experts model from Thinking Mac…

1M · in $1 · out $4.05

Auto Router (Beta)

NEW

Jul 2026

Auto Router (Beta) is a task-aware router from OpenRouter. It classifies each r…

2M · in - · out -

Kimi K3

NEWHOT

Jul 2026

Native multimodal flagship (text, image, video inputs) with thinking on by defa…

1M · in $3 · out $15

Meta: Muse Spark 1.1

NEW

Jul 2026

Muse Spark 1.1 is a multimodal reasoning model from Meta, built for agentic tas…

1M · in $1.25 · out $4.25

Kwaipilot: KAT-Coder-Pro V2.5

NEW

Jul 2026

KAT-Coder-Pro V2.5 is a flagship-level Agentic Coding model that can directly h…

256K · in $0.74 · out $2.96

Kwaipilot: KAT-Coder-Air V2.5 (free) ·

NEW

Jul 2026

KAT-Coder-Air V2.5 is a flagship-level Agentic Coding model that can directly h…

256K · in - · out -

OpenAI: GPT-5.6 Terra Pro

NEW

Jul 2026

GPT-5.6 Terra Pro is the same underlying model as GPT-5.6 Terra, served with `r…

1.1M · in $2.5 · out $15

OpenAI: GPT-5.6 Sol Pro

NEW

Jul 2026

GPT-5.6 Sol Pro is the same underlying model as GPT-5.6 Sol, served with `reaso…

1.1M · in $5 · out $30

OpenAI: GPT-5.6 Luna Pro

NEW

Jul 2026

GPT-5.6 Luna Pro is the same underlying model as GPT-5.6 Luna, served with `rea…

1.1M · in $1 · out $6

xAI: Grok Latest

NEW

Jul 2026

This model always redirects to the latest Grok model from xAI.

500K · in $2 · out $6

Grok 4.5

NEWHOT

Jul 2026

xAI's smartest and fastest model with frontier performance on coding, knowledge…

500K · in $2 · out $6

AionLabs: Aion-3.0-Mini

NEW

Jul 2026

Aion-3.0 Mini is a multi-model roleplaying and storytelling system from AionLab…

131K · in $0.7 · out $1.4

AionLabs: Aion-3.0

NEW

Jul 2026

Aion-3.0 is a multi-model roleplaying and storytelling system from AionLabs, bu…

131K · in $3 · out $6

Tencent: Hy3

NEWHOT

Jul 2026

Hy3 is a 295B-parameter Mixture-of-Experts model from Tencent (21B active, 192…

262K · in $0.14 · out $0.58

Poolside: Laguna XS 2.1 (free) ·

NEW

Jul 2026

Laguna XS 2.1 is the latest coding agent model in the 33B-A3B category from Poo…

262K · in - · out -

Nano Banana 2 Lite

NEW

Jun 2026

Gemini 3.1 Flash Lite Image.

131K · in $0.25 · out $1.5

Gemma 4 31B (Preview)

NEW

Jun 2026

Google Gemma 4 31B on Cerebras - first multimodal model on wafer-scale inferenc…

131K · in $0.99 · out $1.49

Gemini Omni Flash Preview (video)

NEW

Jun 2026

Gemini Omni Flash Preview

197K · in $1.5 · out $17.5

Qwen3.6 35B A3B Lora

NEW

Jun 2026

Qwen chat model.

262K · in - · out -

Qwen3.5 2B Lora

NEW

Jun 2026

Qwen chat model.

262K · in - · out -

Claude Sonnet 5 · US

NEW

Jun 2026

Best combination of speed and intelligence, with the largest gains in coding an…

1M · in $2 · out $10

Claude Sonnet 5

NEWHOT

Jun 2026

Best combination of speed and intelligence, with the largest gains in coding an…

1M · in $2 · out $10

GPT-5.6 Terra

NEWHOT

Jun 2026

Balanced model for efficient, high-volume everyday work. Competitive with GPT-5…

1.1M · in $2.5 · out $15

GPT-5.6 Sol

NEWHOT

Jun 2026

Flagship next-generation model. Strongest yet for agentic coding, science, and…

1.1M · in $5 · out $30

GPT-5.6 Luna

NEWHOT

Jun 2026

Fastest, most affordable GPT-5.6 model for high-volume work. Strong capability…

1.1M · in $1 · out $6

GLM-5.2 (Alibaba)

NEW

Jun 2026

Zhipu GLM-5.2 served via Alibaba Model Studio. 1M context, thinking.

1M · in $1.1 · out $3.85

Sakana: Fugu Ultra

NEW

Jun 2026

Fugu Ultra is the higher-performance model in Sakana AI's Fugu family. Rather t…

1M · in $5 · out $30

Nex AGI: Nex-N2-Mini

NEW

Jun 2026

Nex-N2-Mini is an open-source agentic mixture-of-experts model from Nex AGI, th…

262K · in $0.03 · out $0.1

Sakana Fugu Cyber

NEW

Jun 2026

Orchestrator specialized for cybersecurity reasoning: security analysis, vulner…

1M · in $6 · out $36

Sakana Fugu

NEW

Jun 2026

Fast orchestration model routing tasks across a swappable pool of frontier LLMs…

1M · in - · out -

Qwen3.7 Max

NEW

Jun 2026

Flagship agent model with native extended thinking and 1M context. Text-only; s…

1M · in $2.5 · out $7.5

Cohere: North Mini Code (free) ·

NEW

Jun 2026

North Mini Code is Cohere's first agentic coding model and the debut of its Nor…

256K · in - · out -

OpenRouter: Fusion

NEW

Jun 2026

Fusion turns your prompt into a small multi-model deliberation. A panel of expe…

1M · in - · out -

GLM-5.2 (1M)

NEWHOT

Jun 2026

Z.ai 1M-context flagship (744B MoE, 40B activated). Agentic coding with reasoni…

1M · in $1.4 · out $4.4

Claude Fable 5 · Global

NEW

Jun 2026

Most capable widely released model for the most demanding reasoning and long-ho…

1M · in $10 · out $50

Claude Fable 5

NEWHOT

Jun 2026

Most capable widely released model for the most demanding reasoning and long-ho…

1M · in $10 · out $50

Anthropic: Claude Fable Latest

NEW

Jun 2026

This model always redirects to the latest model in the Claude Fable family.

1M · in $10 · out $50

Nex AGI: Nex-N2-Pro

NEW

Jun 2026

Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active…

262K · in $0.25 · out $1

NVIDIA: Nemotron 3.5 Content Safety (free) ·

Jun 2026

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardra…

128K · in - · out -

NVIDIA: Nemotron 3 Ultra

HOT

Jun 2026

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model f…

1M · in $0.6 · out $3.6

Llama 4 Maverick 17B 128E Instruct Nvfp4

Jun 2026

Meta chat model. https://huggingface.co/api/models/RedHatAI/Llama-4-Maverick-17…

1M · in - · out -

GLM 4.7 FP4

Jun 2026

Zai Org chat model.

203K · in - · out -

Qwen3.7 Plus

Jun 2026

Multimodal agent model with 1M context, native thinking, and vision/video under…

1M · in $0.4 · out $1.6

Kimi K2.7 Code Highspeed

Jun 2026

High-speed code variant with ~180 tok/s output (up to 260 in short contexts). N…

262K · in $1.9 · out $8

Kimi K2.7 Code

Jun 2026

Code-focused multimodal model (text, image, video inputs) with always-on thinki…

262K · in $0.95 · out $4

MiniMax: MiniMax M3

May 2026

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, ima…

1M · in $0.3 · out $1.2

StepFun: Step 3.7 Flash

May 2026

Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Expert…

256K · in $0.2 · out $1.15

Nano Banana Pro

May 2026

Gemini 3 Pro Image

164K · in $2 · out $12

Nano Banana 2

May 2026

Gemini 3.1 Flash Image.

131K · in $0.5 · out $3

Claude Opus 4.8 · US

May 2026

Most capable Opus-tier model for complex reasoning and agentic coding (Bedrock…

1M · in $5 · out $25

Claude Opus 4.8

HOT

May 2026

Most capable Opus-tier model for complex reasoning and agentic coding

1M · in $5 · out $25

Anthropic: Claude Opus 4.8 (Fast)

May 2026

Fast-mode variant of Opus 4.8 - identical capabilities with higher output speed…

1M · in $10 · out $50

Qwen3.7 Max

May 2026

Flagship agent model with native extended thinking and 1M context. Text-only; s…

1M · in $2.5 · out $7.5

Grok Build 0.1

May 2026

xAI fast coding model with reasoning, function calling, and structured outputs.…

256K · in $1 · out $2

Llama 4 Scout 17B 16E Instruct Fp8 Lora

May 2026

Meta chat model.

10M · in - · out -

Gemini 3.5 Flash

HOT

May 2026

Gemini 3.5 Flash

1.1M · in $1.5 · out $9

Antigravity Agent Preview (2026-05)

May 2026

Preview release of Antigravity Agent (05-2026)

197K · in $1.5 · out $9

Gemma 4 31B It Lora

May 2026

Google chat model. https://huggingface.co/api/models/google/gemma-4-31B-it

262K · in - · out -

Gemma 3 27B It Lora

May 2026

Google chat model.

- · in - · out -

Perceptron: Perceptron Mk1

May 2026

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model…

33K · in $0.15 · out $1.5

Anthropic: Claude Opus 4.7 (Fast)

May 2026

Fast-mode variant of Opus 4.7 - identical capabilities with higher output speed…

1M · in $30 · out $150

inclusionAI: Ring-2.6-1T

May 2026

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters,…

262K · in $0.08 · out $0.63

Mixtral 8x7B Instruct V0.1 FP8 Lora

May 2026

Mistral AI chat model.

33K · in - · out -

Gemma 3 270M It Lora

May 2026

Google chat model.

33K · in - · out -

Gemini 3.1 Flash-Lite

May 2026

Gemini 3.1 Flash Lite

1.1M · in $0.25 · out $1.5

Llama 3.3 70B Instruct FP8 Lora

May 2026

Meta chat model.

131K · in - · out -

OpenAI: GPT Chat Latest

May 2026

GPT Chat Latest

400K · in $5 · out $30

Command A Plus

May 2026

Cohere flagship. Agentic reasoning with vision, tool use, and long-context RAG.…

436K · in - · out -

Qwen3 VL Plus

Apr 2026

Current vision-language model with strong visual reasoning and thinking. Tiered…

262K · in $0.2 · out $1.6

mistral-medium-3.5

Apr 2026

Official mistral-medium-latest Mistral AI model

262K · in $1.5 · out $7.5

IBM: Granite 4.1 8B

Apr 2026

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from…

131K · in $0.05 · out $0.1

Poolside: Laguna XS.2

Apr 2026

Laguna XS.2 is the second-generation model in the XS size class from Poolside,…

262K · in $0.1 · out $0.2

Poolside: Laguna M.1

Apr 2026

Laguna M.1 is the flagship coding agent model from Poolside, optimized for comp…

262K · in $0.2 · out $0.4

Owl Alpha ·

Apr 2026

Owl Alpha is a high-performance foundation model designed for agentic workloads…

1M · in - · out -

NVIDIA: Nemotron 3 Nano Omni (free) ·

Apr 2026

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to fun…

256K · in - · out -

Nemotron 3 Nano Omni 30B A3b Reasoning Fp8

Apr 2026

Nvidia chat model.

131K · in - · out -

mistral-medium-latest

Apr 2026

Official mistral-medium-latest Mistral AI model

262K · in $1.5 · out $7.5

Mistral Medium (latest)

Apr 2026

Official mistral-medium-latest Mistral AI model

262K · in $1.5 · out $7.5

Qwen3.6 Max Preview

Apr 2026

Alibaba model (not yet curated).

131K · in $1.04 · out $6.24

Qwen3.6 Flash

HOT

Apr 2026

Fast, cost-effective multimodal model with 1M context, near-flagship quality, v…

1M · in $0.25 · out $1.5

Qwen3.6 35b A3b

Apr 2026

Alibaba model (not yet curated).

131K · in $0.14 · out $1

Qwen3.6 27b

Apr 2026

Alibaba model (not yet curated).

131K · in $0.45 · out $2.7

Qwen3.5 Plus 2026 02 15

Apr 2026

Alibaba model (not yet curated).

131K · in $0.3 · out $1.8

OpenAI GPT Mini Latest

Apr 2026

This model always redirects to the latest model in the OpenAI GPT Mini family.

400K · in $0.75 · out $4.5

OpenAI GPT Latest

Apr 2026

This model always redirects to the latest model in the OpenAI GPT family.

1.1M · in $5 · out $30

MoonshotAI Kimi Latest

Apr 2026

This model always redirects to the latest model in the MoonshotAI Kimi family.

1M · in $3 · out $15

Google Gemini Pro Latest

Apr 2026

This model always redirects to the latest model in the Google Gemini Pro family.

1M · in $2 · out $12

Google Gemini Flash Latest

HOT

Apr 2026

This model always redirects to the latest model in the Google Gemini Flash fami…

1M · in $1.5 · out $9

Anthropic Claude Sonnet Latest

Apr 2026

This model always redirects to the latest model in the Anthropic Claude Sonnet…

1M · in $2 · out $10

Anthropic Claude Haiku Latest

Apr 2026

This model always redirects to the latest model in the Anthropic Claude Haiku f…

200K · in $1 · out $5

DeepSeek V4 Pro

HOT

Apr 2026

Premium reasoning model with 1M context. Supports extended thinking modes, JSON…

1M · in $0.44 · out $0.87

DeepSeek V4 Flash

HOT

Apr 2026

Fast general-purpose model with 1M context. Supports extended thinking modes, J…

1M · in $0.14 · out $0.28

inclusionAI: Ling-2.6-1T

Apr 2026

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s t…

262K · in $0.08 · out $0.63

GPT-5.5 Pro

Apr 2026

Most capable model for complex tasks. Uses more compute for smarter, more preci…

1.1M · in $30 · out $180

GPT-5.5

HOT

Apr 2026

New baseline for complex production workflows. Stronger task execution, more pr…

1.1M · in $5 · out $30

Xiaomi: MiMo-V2.5-Pro

Apr 2026

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in gene…

1M · in $0.44 · out $0.87

Xiaomi: MiMo-V2.5

Apr 2026

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic…

1M · in $0.14 · out $0.28

Tencent: Hy3 preview

Apr 2026

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed…

262K · in $0.06 · out $0.21

Qwen3.6 35B A3b Fp8

Apr 2026

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3.6-35B-A3B-FP8

262K · in - · out -

Pareto Code Router

Apr 2026

The Pareto Router maintains a tiered shortlist of strong coding models, ranked…

2M · in - · out -

OpenAI: GPT-5.4 Image 2

Apr 2026

GPT-5.4 Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image gen…

272K · in $8 · out $15

inclusionAI: Ling-2.6-flash

Apr 2026

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total…

262K · in $0.01 · out $0.03

Gemma 4 E2B-it

Apr 2026

Google chat model. https://huggingface.co/api/models/google/gemma-4-E2B-it

131K · in - · out -

Deep Research Preview (2026-04)

Apr 2026

Preview release (April 21th, 2026) of Deep Research

197K · in $1.25 · out $10

Deep Research Max Preview (2026-04)

HOT

Apr 2026

Preview release (April 21st, 2026) of Deep Research Max

197K · in $1.25 · out $10

Anthropic: Claude Opus Latest

Apr 2026

This model always redirects to the latest model in the Claude Opus family.

1M · in $5 · out $25

Kimi K2.6

Apr 2026

Native multimodal flagship (text, image, video inputs) with thinking and non-th…

262K · in $0.95 · out $4

Grok 4.3

Apr 2026

xAI's latest flagship model with reasoning and a 1M token context window. Suppo…

1M · in $1.25 · out $2.5

Gemma 4 E4B-it

Apr 2026

Google chat model. https://huggingface.co/google/gemma-4-E4B-it

131K · in - · out -

Claude Opus 4.7 · Global

Apr 2026

Previous most capable model for complex reasoning and agentic coding (Bedrock I…

1M · in $5 · out $25

Claude Opus 4.7

Apr 2026

Previous most capable model for complex reasoning and agentic coding

1M · in $5 · out $25

Gemini 3.1 Flash TTS Preview

Apr 2026

Gemini 3.1 Flash TTS Preview

25K · in $1 · out -

Gemini Robotics-ER 1.6 Preview

Apr 2026

Gemini Robotics-ER 1.6 Preview

197K · in $1 · out $5

Nvidia Nemotron 3 Super 120B A12b Bf16

Apr 2026

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-S…

262K · in - · out -

GLM-5.1

Apr 2026

Z.ai flagship (744B MoE, 40B activated). Post-training upgrade over GLM-5 with…

205K · in $1.4 · out $4.4

Qwen3.6 Plus

Apr 2026

Alibaba model (not yet curated).

131K · in $0.33 · out $1.95

Gemma 4 31B IT

HOT

Apr 2026

Gemma 4 31B IT

295K · in - · out -

Gemma 4 26B A4B IT

HOT

Apr 2026

Gemma 4 26B A4B IT

295K · in - · out -

GLM-5V Turbo

Apr 2026

First multimodal GLM-5 model. Vision-based coding agent with image/video/file i…

205K · in $1.2 · out $4

Arcee AI: Trinity Large Thinking

Apr 2026

Trinity Large Thinking is a powerful open source reasoning model from the team…

262K · in $0.25 · out $0.8

xAI: Grok 4.20 Multi-Agent

Mar 2026

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborativ…

2M · in $1.25 · out $2.5

xAI: Grok 4.20

HOT

Mar 2026

Grok 4.20 is a reasoning model from xAI with industry-leading speed and agentic…

2M · in $1.25 · out $2.5

Holo3 35B A3b

Mar 2026

Hcompany chat model. https://huggingface.co/api/models/Hcompany/Holo3-35B-A3B

262K · in - · out -

Google: Lyria 3 Pro Preview ·

Mar 2026

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of m…

1M · in - · out -

Google: Lyria 3 Clip Preview ·

Mar 2026

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's fami…

1M · in - · out -

Kwaipilot: KAT-Coder-Pro V2

Mar 2026

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder se…

256K · in $0.3 · out $1.2

Qwen3 30B A3B Instruct 2507 Lora

Mar 2026

Qwen chat model.

262K · in - · out -

Deepseek V3.1 NVFP4

Mar 2026

DeepSeek chat model. https://huggingface.co/deepseek-ai/DeepSeek-V3.1

131K · in $0.6 · out $1.7

Reka Edge

Mar 2026

Reka Edge is an extremely efficient 7B multimodal vision-language model that ac…

16K · in $0.1 · out $0.1

Qwen3 8B Lora

Mar 2026

Qwen chat model.

41K · in - · out -

MiniMax: MiniMax M2.7

Mar 2026

MiniMax-M2.7 is a next-generation large language model designed for autonomous,…

205K · in $0.25 · out $1

GPT-5.4 Nano

Mar 2026

Cheapest GPT-5.4-class model for simple high-volume tasks like classification a…

400K · in $0.2 · out $1.25

GPT-5.4 Mini

Mar 2026

Strongest mini model for coding, computer use, and subagents. GPT-5.4-class int…

400K · in $0.75 · out $4.5

Qwen3.5 122B A10b Fp8

Mar 2026

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3.5-122B-A10B-FP8

262K · in - · out -

mistral-small-latest

Mar 2026

Mistral Small 4.

262K · in $0.15 · out $0.6

Mistral Small (2603)

Mar 2026

Mistral Small 4.

262K · in $0.15 · out $0.6

Leanstral (2603)

Mar 2026

A mid & post-trained version of mistral small 4 for Lean

197K · in - · out -

GLM-5 Turbo

Mar 2026

Speed-optimized GLM-5 variant for agent workflows. Enhanced tool invocation and…

205K · in $1.2 · out $4

Deepseek OCR 2

Mar 2026

Deepseek chat model. https://huggingface.co/api/models/deepseek-ai/DeepSeek-OCR…

8K · in - · out -

NVIDIA: Nemotron 3 Super

Mar 2026

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating j…

1M · in $0.08 · out $0.45

Nvidia Nemotron 3 Super 120B A12b Fp8

Mar 2026

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-S…

262K · in - · out -

Qwen: Qwen3.5-9B

Mar 2026

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed t…

262K · in $0.1 · out $0.15

ByteDance Seed: Seed-2.0-Lite

Mar 2026

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers…

262K · in $0.25 · out $2

Grok 4.20 Reasoning

Mar 2026

xAI flagship reasoning model with a 1M token context window. Deep reasoning and…

1M · in $1.25 · out $2.5

Grok 4.20 Multi-Agent

Mar 2026

Multi-agent model that runs specialized agents in parallel for collaborative ve…

1M · in $1.25 · out $2.5

Grok 4.20

HOT

Mar 2026

xAI flagship model with a 1M token context window. Non-reasoning variant for fa…

1M · in $1.25 · out $2.5

Qwen3.5 9B Fp8

Mar 2026

Qwen chat model. https://huggingface.co/api/models/togethercomputer/Qwen3.5-9B-…

262K · in - · out -

GPT-5.4 Pro

Mar 2026

Most capable model for complex tasks. Uses more compute for smarter, more preci…

1.1M · in $30 · out $180

GPT-5.4

Mar 2026

Most capable and efficient frontier model for professional work. Native compute…

1.1M · in $2.5 · out $15

Inception: Mercury 2

Mar 2026

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion…

128K · in $0.25 · out $0.75

OpenAI: GPT-5.3 Chat

Mar 2026

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conv…

128K · in $1.75 · out $14

GPT-5.3 Instant

deprecated

Mar 2026

GPT-5.3 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.

128K · in $1.75 · out $14

Glm 4.7 Fp8

Mar 2026

Zai Org chat model. https://huggingface.co/api/models/zai-org/GLM-4.7-FP8

203K · in - · out -

Gemini 3.1 Flash-Lite Preview

Mar 2026

Gemini 3.1 Flash Lite Preview

1.1M · in $0.25 · out $1.5

Nano Banana 2 Preview

Feb 2026

Gemini 3.1 Flash Image Preview.

131K · in $0.5 · out $3

ByteDance Seed: Seed-2.0-Mini

Feb 2026

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive s…

262K · in $0.1 · out $0.4

Qwen3.5 35b A3b

Feb 2026

Alibaba model (not yet curated).

131K · in $0.14 · out $1

Qwen3.5 27b

Feb 2026

Alibaba model (not yet curated).

131K · in $0.26 · out $2.6

Qwen3.5 122b A10b

Feb 2026

Alibaba model (not yet curated).

131K · in $0.26 · out $2.08

Qwen: Qwen3.5-Flash

Feb 2026

The Qwen3.5 native vision-language Flash models are built on a hybrid architect…

1M · in $0.07 · out $0.26

LiquidAI: LFM2-24B-A2B

Feb 2026

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures de…

128K · in $0.03 · out $0.12

GPT Audio 1.5

Feb 2026

Best voice model for audio in, audio out with Chat Completions. Accepts audio i…

128K · in $2.5 · out $10

AionLabs: Aion-2.0

Feb 2026

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and…

131K · in $0.8 · out $1.6

Gemini 3.1 Pro Preview (Custom Tools)

Feb 2026

Gemini 3.1 Pro Preview optimized for custom tool usage

1.1M · in $2 · out $12

Gemini 3.1 Pro Preview

HOT

Feb 2026

Gemini 3.1 Pro Preview

1.1M · in $2 · out $12

Claude Sonnet 4.6 · US

Feb 2026

Best combination of speed and intelligence for everyday tasks (Bedrock Inferenc…

1M · in $3 · out $15

Claude Sonnet 4.6

Feb 2026

Best combination of speed and intelligence for everyday tasks

1M · in $3 · out $15

Qwen3.5 397b A17b

Feb 2026

Alibaba model (not yet curated).

131K · in $0.39 · out $2.34

Qwen: Qwen3.5 Plus 2026-02-15

Feb 2026

The Qwen3.5 native vision-language series Plus models are built on a hybrid arc…

1M · in $0.26 · out $1.56

MiniMax M2.5 FP4

Feb 2026

MiniMaxAI chat model.

8K · in - · out -

MiniMax: MiniMax M2.5

Feb 2026

MiniMax-M2.5 is a SOTA large language model designed for real-world productivit…

205K · in $0.15 · out $0.9

GLM 5 Fp4

Feb 2026

Zai Org chat model. https://huggingface.co/api/models/togethercomputer/GLM-5-FP4

203K · in - · out -

GLM-5

Feb 2026

Z.ai flagship foundation model (744B MoE, 40B activated). Designed for Agentic…

205K · in $1 · out $3.2

Qwen: Qwen3 Max Thinking

Feb 2026

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designe…

262K · in $0.78 · out $3.9

GPT-5.3 Codex

Feb 2026

Most capable agentic coding model. Combines frontier coding performance of GPT-…

400K · in $1.75 · out $14

Claude Opus 4.6 · Global

Feb 2026

Previous most intelligent model for complex agents and coding, with adaptive th…

1M · in $5 · out $25

Claude Opus 4.6

Feb 2026

Previous most intelligent model for complex agents and coding, with adaptive th…

1M · in $5 · out $25

Qwen3 Coder Next

Feb 2026

Alibaba model (not yet curated).

131K · in $0.11 · out $0.8

GLM-OCR (Vision, OCR)

Feb 2026

Specialized OCR model for text extraction from images and documents.

131K · in $0.03 · out $0.03

Free Models Router ·

Feb 2026

The simplest way to get free inference. openrouter/free is a router that select…

200K · in - · out -

StepFun: Step 3.5 Flash

Jan 2026

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on…

262K · in $0.1 · out $0.3

Upstage: Solar Pro 3

Jan 2026

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With…

128K · in $0.15 · out $0.6

Kimi K2.5

Jan 2026

Supports vision (images/videos), thinking mode, and Agent tasks. 256K context.

262K · in $0.6 · out $3

MiniMax: MiniMax M2-her

Jan 2026

MiniMax M2-her is a dialogue-first large language model built for immersive rol…

66K · in $0.3 · out $1.2

Writer: Palmyra X5

Jan 2026

Palmyra X5 is Writer's most advanced model, purpose-built for building and scal…

1M · in $0.6 · out $6

LiquidAI: LFM2.5-1.2B-Thinking (free) ·

Jan 2026

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for age…

33K · in - · out -

LiquidAI: LFM2.5-1.2B-Instruct (free) ·

Jan 2026

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model bui…

33K · in - · out -

GLM-4.7 FlashX

Jan 2026

Fast GLM-4.7 variant with priority routing and higher concurrency. Same model a…

131K · in $0.07 · out $0.4

GLM-4.7 Flash (Free)

Jan 2026

Free GLM-4.7 variant. Same model as FlashX but with limited concurrency (1 conc…

131K · in - · out -

Z.AI GLM 4.7

Jan 2026

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K · in $2.25 · out $2.75

MiniMax: MiniMax M2.1

Dec 2025

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized…

205K · in $0.3 · out $1.2

ByteDance Seed: Seed 1.6 Flash

Dec 2025

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance See…

262K · in $0.08 · out $0.3

ByteDance Seed: Seed 1.6

Dec 2025

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It inc…

262K · in $0.25 · out $2

GLM-4.7

Dec 2025

Latest-gen GLM model with 128K context. Thinking mode activated by default.

131K · in $0.6 · out $2.2

Gemini 3 Flash Preview

Dec 2025

Gemini 3 Flash Preview

1.1M · in $0.5 · out $3

Nvidia Nemotron 3 Nano 30B A3b Bf16

Dec 2025

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-3-N…

262K · in - · out -

NVIDIA: Nemotron 3 Nano 30B A3B

Dec 2025

NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compu…

262K · in $0.05 · out $0.2

OpenAI: GPT-5.2 Chat

Dec 2025

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, o…

128K · in $1.75 · out $14

GPT-5.2 Pro

Dec 2025

Smartest and most trustworthy option for difficult questions. Uses more compute…

400K · in $21 · out $168

GPT-5.2 Instant

deprecated

Dec 2025

GPT-5.2 Instant model, previously powering ChatGPT. Replaced by GPT-5.5 Instant.

128K · in $1.75 · out $14

GPT-5.2 Codex

deprecated

Dec 2025

GPT-5.2 optimized for long-horizon, agentic coding tasks in Codex or similar en…

400K · in $1.75 · out $14

GPT-5.2

Dec 2025

Most capable model for professional work and long-running agents. Improvements…

400K · in $1.75 · out $14

Deep Research Pro Preview

Dec 2025

Preview release (December 12th, 2025) of Deep Research Pro

197K · in $1.25 · out $10

AutoGLM Phone

Dec 2025

Mobile phone automation agent. Understands phone screens via multimodal percept…

131K · in - · out -

Mistral: Devstral 2 2512

Dec 2025

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing i…

262K · in $0.4 · out $2

Devstral 2 (latest)

Dec 2025

Official mistral-medium-latest Mistral AI model

262K · in $0.4 · out $2

Devstral 2 (latest)

Dec 2025

Official devstral-2512 Mistral AI model

262K · in $0.4 · out $2

Devstral 2 (latest)

Dec 2025

Official devstral-2512 Mistral AI model

262K · in $0.4 · out $2

Relace: Relace Search

Dec 2025

The relace-search model uses 4-12 `view_file` and `grep` tools in parallel to e…

256K · in $1 · out $3

GLM-4.6 V FlashX

Dec 2025

Fast vision GLM-4.6 with priority routing and higher concurrency. Image/video/f…

131K · in $0.04 · out $0.4

GLM-4.6 V Flash (Free)

Dec 2025

Free vision GLM-4.6. Same model as FlashX but with limited concurrency (1 concu…

131K · in - · out -

GLM-4.6 V

Dec 2025

Vision-enabled GLM-4.6 model. Supports image/video/file inputs, 32K output, hyb…

131K · in $0.3 · out $0.9

EssentialAI Rnj-1 Instruct

Dec 2025

Essential AI chat model. https://huggingface.co/api/models/togethercomputer/Ess…

33K · in - · out -

Body Builder (beta)

Dec 2025

Transform your natural language requests into structured OpenRouter API request…

128K · in - · out -

mistral-large-latest

Dec 2025

Official mistral-large-2512 Mistral AI model

262K · in $0.5 · out $1.5

ministral-8b-latest

Dec 2025

Ministral 3 (a.k.a. Tinystral) 8B Instruct.

262K · in $0.15 · out $0.15

ministral-3b-latest

Dec 2025

Ministral 3 (a.k.a. Tinystral) 3B Instruct.

131K · in $0.1 · out $0.1

ministral-14b-latest

Dec 2025

Ministral 3 (a.k.a. Tinystral) 14B Instruct.

262K · in $0.2 · out $0.2

Ministral 8b (2512)

Dec 2025

Ministral 3 (a.k.a. Tinystral) 8B Instruct.

262K · in $0.15 · out $0.15

Ministral 3b (2512)

Dec 2025

Ministral 3 (a.k.a. Tinystral) 3B Instruct.

131K · in $0.1 · out $0.1

Ministral 3 14B Instruct 2512

Dec 2025

Mistralai chat model. https://huggingface.co/api/models/mistralai/Ministral-3-1…

262K · in $0.2 · out $0.2

Ministral 14b (2512)

Dec 2025

Ministral 3 (a.k.a. Tinystral) 14B Instruct.

262K · in $0.2 · out $0.2

Amazon: Nova 2 Lite

Dec 2025

Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads th…

1M · in $0.3 · out $2.5

Mistral Large (2512)

Dec 2025

Official mistral-large-2512 Mistral AI model

262K · in $0.5 · out $1.5

DeepSeek: DeepSeek V3.2

Dec 2025

DeepSeek-V3.2 is a large language model designed to harmonize high computationa…

164K · in $0.27 · out $0.4

Arcee AI: Trinity Mini

Dec 2025

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language…

131K · in $0.05 · out $0.15

Claude Opus 4.5 · US

Nov 2025

Previous most intelligent model with advanced reasoning for complex agentic wor…

200K · in $5 · out $25

Claude Opus 4.5

Nov 2025

Previous most intelligent model with advanced reasoning for complex agentic wor…

200K · in $5 · out $25

AllenAI: Olmo 3 32B Think

Nov 2025

Olmo 3 32B Think is a large-scale, 32-billion-parameter model purpose-built for…

66K · in $0.15 · out $0.5

Nano Banana Pro Preview

Nov 2025

Gemini 3 Pro Image Preview

164K · in $2 · out $12

Nano Banana Pro

Nov 2025

Gemini 3 Pro Image Preview

164K · in $2 · out $12

GPT-5.1 Codex Max

deprecated

Nov 2025

Our most intelligent coding model optimized for long-horizon, agentic coding ta…

400K · in $1.25 · out $10

GPT-5.1 Codex Mini

deprecated

Nov 2025

Smaller, faster version of GPT-5.1 Codex for efficient coding tasks.

400K · in $0.25 · out $2

GPT-5.1 Codex

deprecated

Nov 2025

A version of GPT-5.1 optimized for agentic coding tasks in Codex or similar env…

400K · in $1.25 · out $10

GPT-5.1

HOT

Nov 2025

The best model for coding and agentic tasks with configurable reasoning effort.

400K · in $1.25 · out $10

Deep Cogito: Cogito v2.1 671B

Nov 2025

Cogito v2.1 671B MoE represents one of the strongest open models globally, matc…

128K · in $1.25 · out $1.25

OpenAI: GPT-5.1 Chat

Nov 2025

GPT-5.1 Chat (AKA Instant is the fast, lightweight member of the 5.1 family, op…

128K · in $1.25 · out $10

GPT-5.1 Instant

deprecated

Nov 2025

GPT-5.1 Instant with adaptive reasoning. More conversational with improved inst…

128K · in $1.25 · out $10

Qwen3-VL-235B-A22B-Instruct-FP8

Nov 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-VL-235B-A22B-Inst…

262K · in - · out -

MoonshotAI: Kimi K2 Thinking

Nov 2025

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, e…

262K · in $0.6 · out $2.5

Amazon: Nova Premier 1.0

Oct 2025

Amazon Nova Premier is the most capable of Amazon’s multimodal models for compl…

1M · in $2.5 · out $12.5

Perplexity: Sonar Pro Search

Oct 2025

Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is…

200K · in $3 · out $15

Mistral: Voxtral Small 24B 2507

Oct 2025

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-…

32K · in $0.1 · out $0.3

OpenAI: gpt-oss-safeguard-20b

Oct 2025

gpt-oss-safeguard-20b is a safety reasoning model from OpenAI built upon gpt-os…

131K · in $0.08 · out $0.3

NVIDIA: Nemotron Nano 12B 2 VL (free) ·

Oct 2025

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning m…

128K · in - · out -

Medgemma 27B Text It

Oct 2025

Google chat model. https://huggingface.co/api/models/google/medgemma-27b-text-it

131K · in - · out -

Qwen: Qwen3 VL 32B Instruct

Oct 2025

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designe…

262K · in $0.1 · out $0.42

MiniMax: MiniMax M2

Oct 2025

MiniMax-M2 is a compact, high-efficiency large language model optimized for end…

205K · in $0.3 · out $1.2

IBM: Granite 4.0 Micro

Oct 2025

Granite-4.0-H-Micro is a 3B parameter from the Granite 4 family of models. Thes…

131K · in $0.02 · out $0.11

Microsoft: Phi 4 Mini Instruct

Oct 2025

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and f…

131K · in $0.08 · out $0.35

Claude Haiku 4.5 · Global

Oct 2025

Fastest model with exceptional speed and performance (Bedrock Inference Profile)

200K · in $1 · out $5

Claude Haiku 4.5

Oct 2025

Fastest model with exceptional speed and performance

200K · in $1 · out $5

Qwen: Qwen3 VL 8B Thinking

Oct 2025

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B mult…

256K · in $0.12 · out $1.37

Qwen: Qwen3 VL 8B Instruct

Oct 2025

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL se…

256K · in $0.12 · out $0.46

GPT-5 Search API

Oct 2025

Updated web search model in Chat Completions API. 60% cheaper with domain filte…

400K · in $1.25 · out $10

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

Oct 2025

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning…

131K · in $0.4 · out $0.4

Gemini 2.5 Computer Use Preview 10-2025

Oct 2025

Gemini 2.5 Computer Use Preview 10-2025

197K · in $1.25 · out $10

Qwen: Qwen3 VL 30B A3B Thinking

Oct 2025

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text genera…

131K · in $0.13 · out $1.56

Qwen: Qwen3 VL 30B A3B Instruct

Oct 2025

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text genera…

262K · in $0.13 · out $0.52

GPT-5 Pro

Oct 2025

Version of GPT-5 that uses more compute to produce smarter and more precise res…

400K · in $15 · out $120

GPT Audio Mini

Oct 2025

Cost-efficient audio model. Accepts audio inputs and outputs via Chat Completio…

128K · in $0.6 · out $2.4

Nano Banana

Oct 2025

Gemini 2.5 Flash Preview Image

66K · in $0.3 · out $2.5

GLM-4.6

Sep 2025

GLM-4.6 model with 128K context/output. Hybrid thinking: auto-determines whethe…

131K · in $0.6 · out $2.2

Gemma 3 270M It

Sep 2025

Google chat model. https://huggingface.co/api/models/google/gemma-3-270m-it

33K · in - · out -

DeepSeek: DeepSeek V3.2 Exp

Sep 2025

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek…

164K · in $0.27 · out $0.41

Claude Sonnet 4.5 · US

Sep 2025

Previous best combination of speed and intelligence for complex agents and codi…

200K · in $3 · out $15

Claude Sonnet 4.5

HOT

Sep 2025

Previous best combination of speed and intelligence for complex agents and codi…

200K · in $3 · out $15

TheDrummer: Cydonia 24B V4.1

Sep 2025

Uncensored and creative writing model based on Mistral Small 3.2 24B with good…

131K · in $0.3 · out $0.5

Relace: Relace Apply 3

Sep 2025

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edit…

256K · in $0.85 · out $1.25

Google: Gemini 2.5 Flash Lite Preview 09-2025

Sep 2025

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family…

1M · in $0.1 · out $0.4

Qwen3 Next 80B A3b Instruct Fp8

Sep 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Next-80B-A3B-Inst…

- · in - · out -

Qwen3 Vl 235b A22b Thinking

Sep 2025

Alibaba model (not yet curated).

131K · in $0.26 · out $2.6

Qwen3 Vl 235b A22b Instruct

Sep 2025

Alibaba model (not yet curated).

131K · in $0.21 · out $1.9

Qwen3 Max

Sep 2025

Alibaba model (not yet curated).

131K · in $0.78 · out $3.9

Qwen3 Coder Plus

Sep 2025

Agentic coding model with very long context. Tiered pricing by input length (up…

1M · in $1 · out $5

DeepSeek: DeepSeek V3.1 Terminus

Sep 2025

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's…

131K · in $0.27 · out $1

Qwen3 Coder Flash

Sep 2025

Alibaba model (not yet curated).

131K · in $0.2 · out $0.98

Nvidia Nemotron Nano 9B V2

Sep 2025

Nvidia chat model. https://huggingface.co/api/models/nvidia/NVIDIA-Nemotron-Nan…

131K · in $0.06 · out $0.25

magistral-small-latest

Sep 2025

Mistral Small 4.

262K · in $0.5 · out $1.5

magistral-medium-latest

Sep 2025

Our frontier-class reasoning model release candidate September 2025.

131K · in $2 · out $5

Magistral Small (2509)

Sep 2025

Our efficient reasoning model released September 2025.

131K · in $0.5 · out $1.5

Magistral Medium (2509)

Sep 2025

Our frontier-class reasoning model release candidate September 2025.

131K · in $2 · out $5

GPT-5 Codex

deprecated

Sep 2025

A version of GPT-5 optimized for agentic coding in Codex.

400K · in $1.25 · out $10

Qwen3 Next 80b A3b Thinking

Sep 2025

Alibaba model (not yet curated).

131K · in $0.1 · out $0.78

Qwen3 Next 80b A3b Instruct

Sep 2025

Alibaba model (not yet curated).

131K · in $0.1 · out $0.78

NVIDIA: Nemotron Nano 9B V2 (free) ·

Sep 2025

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch…

128K · in - · out -

MoonshotAI: Kimi K2 0905

Sep 2025

Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixtu…

262K · in $0.6 · out $2.5

[Groq] Compound Mini (Agentic System)

Sep 2025

Lighter Groq agentic AI with web search, code execution. Pricing based on under…

131K · in - · out -

[Groq] Compound (Agentic System)

Sep 2025

Groq agentic AI with web search, code execution, browser automation. Uses GPT-O…

131K · in - · out -

Qwen3 30b A3b Thinking 2507

Aug 2025

Alibaba model (not yet curated).

131K · in $0.13 · out $1.56

GPT Audio

Aug 2025

First generally available audio model. Accepts audio inputs and outputs, and ca…

128K · in $2.5 · out $10

Nous: Hermes 4 70B

Aug 2025

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llam…

131K · in $0.13 · out $0.4

Nous: Hermes 4 405B

Aug 2025

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and rele…

131K · in $1 · out $3

DeepSeek: DeepSeek V3.1

Aug 2025

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) t…

164K · in $0.25 · out $0.95

Mistral: Mistral Medium 3.1

Aug 2025

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-p…

131K · in $0.4 · out $2

Mistral Medium (2508)

Aug 2025

Update on Mistral Medium 3 with improved capabilities.

131K · in $0.4 · out $2

GLM-4.5 V

Aug 2025

Vision-enabled GLM-4.5 model. 96K context, 16K output, interleaved thinking.

98K · in $0.6 · out $1.8

Qwen3 4B Instruct 2507

Aug 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-4B-Instruct-2507

262K · in - · out -

AI21: Jamba Large 1.7

Aug 2025

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvem…

256K · in $2 · out $8

OpenAI: GPT-5 Chat

Aug 2025

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware con…

128K · in $1.25 · out $10

GPT-5 Nano

Aug 2025

Fastest, most cost-efficient version of GPT-5 for summarization and classificat…

400K · in $0.05 · out $0.4

GPT-5 Mini

Aug 2025

A faster, more cost-efficient version of GPT-5 for well-defined tasks.

400K · in $0.25 · out $2

GPT-5 ChatGPT

deprecated

Aug 2025

GPT-5 model used in ChatGPT.

128K · in $1.25 · out $10

GPT-5

Aug 2025

The best model for coding and agentic tasks across domains.

400K · in $1.25 · out $10

OpenAI: gpt-oss-20b

Aug 2025

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the…

131K · in $0.03 · out $0.13

OpenAI: gpt-oss-120b

Aug 2025

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) languag…

131K · in $0.04 · out $0.17

Claude Opus 4.1· US

deprecated

Aug 2025

Previous Opus model. Deprecated June 5, 2026, retiring August 5, 2026. (Bedrock…

200K · in $15 · out $75

Claude Opus 4.1

deprecated

Aug 2025

Previous Opus model. Deprecated June 5, 2026, retiring August 5, 2026.

200K · in $15 · out $75

Command A Translate

Aug 2025

Specialized machine translation across 23 languages, with tool use and JSON out…

9K · in - · out -

Command A Reasoning

Aug 2025

Reasoning-tuned Command A for multi-step agents and hard problem solving across…

289K · in $2.5 · out $10

Qwen: Qwen3 Coder 30B A3B Instruct

Jul 2025

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) mode…

160K · in $0.07 · out $0.27

Glm 4.5 Air Fp8

Jul 2025

Zai Org chat model. https://huggingface.co/api/models/zai-org/GLM-4.5-Air-FP8

131K · in $0.2 · out $1.1

codestral-latest

Jul 2025

Our cutting-edge language model for coding released August 2025.

256K · in $0.3 · out $0.9

Codestral (2508)

Jul 2025

Our cutting-edge language model for coding released August 2025.

256K · in $0.3 · out $0.9

Qwen3 30b A3b Instruct 2507

Jul 2025

Alibaba model (not yet curated).

131K · in $0.1 · out $0.3

Qwen3 235B A22b Instruct 2507 Fp8

Jul 2025

Together AI chat model. https://huggingface.co/api/models/Qwen/Qwen3-235B-A22B-…

262K · in - · out -

GLM-4.5 X

Jul 2025

Extended GLM-4.5 model. Interleaved thinking.

98K · in $2.2 · out $8.9

GLM-4.5 Flash (Free)

Jul 2025

Free GLM-4.5 variant with limited concurrency. Prior-gen, superseded by GLM-4.7…

98K · in - · out -

GLM-4.5 AirX

Jul 2025

Extended lightweight GLM-4.5 variant. Interleaved thinking.

98K · in $1.1 · out $4.5

Qwen3 235b A22b Thinking 2507

Jul 2025

Alibaba model (not yet curated).

131K · in $0.3 · out $3

GLM-4.5 Air

Jul 2025

Lightweight GLM-4.5 variant. Interleaved thinking.

98K · in $0.2 · out $1.1

GLM-4.5

Jul 2025

Prior-gen GLM-4.5 model with 96K context/output. Interleaved thinking.

98K · in $0.6 · out $2.2

Qwen3 Coder 480B A35B Instruct Fp8

Jul 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Coder-480B-A35B-I…

262K · in $2 · out $2

Qwen: Qwen3 Coder 480B A35B (free) ·

Jul 2025

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation mo…

1M · in - · out -

Qwen3 235B A22B Instruct 2507 FP8 Throughput

Jul 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-235B-A22B-Instruc…

262K · in $0.2 · out $0.6

Gemini 2.5 Flash-Lite

Jul 2025

Stable version of Gemini 2.5 Flash-Lite, released in July of 2025

1.1M · in $0.1 · out $0.4

ByteDance: UI-TARS 7B

Jul 2025

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based envir…

128K · in $0.1 · out $0.2

Qwen: Qwen3 235B A22B Instruct 2507

Jul 2025

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-e…

262K · in $0.09 · out $0.55

voxtral-small-latest

Jul 2025

A small audio understanding model released in July 2025

33K · in $0.1 · out $0.3

voxtral-mini-latest

Jul 2025

A mini audio understanding model released in July 2025

33K · in $0.04 · out $0.04

Voxtral Small (2507)

Jul 2025

A small audio understanding model released in July 2025

33K · in $0.1 · out $0.3

Voxtral Mini (2507)

Jul 2025

A mini audio understanding model released in July 2025

33K · in $0.04 · out $0.04

Switchpoint Router

Jul 2025

Switchpoint AI's router instantly analyzes your request and directs it to the o…

131K · in $0.85 · out $3.4

MoonshotAI: Kimi K2 0711

Jul 2025

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model devel…

131K · in $0.57 · out $2.3

Sarvam M

Jul 2025

Sarvamai chat model. https://huggingface.co/api/models/sarvamai/sarvam-m

33K · in - · out -

Meta Llama 3.1 8B Instruct Awq Int4

Jul 2025

Meta chat model. https://huggingface.co/api/models/togethercomputer/meta-llama-…

131K · in - · out -

Venice: Uncensored (free) ·

Jul 2025

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of…

33K · in - · out -

Tencent: Hunyuan A13B Instruct

Jul 2025

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model…

131K · in $0.14 · out $0.57

Morph: Morph V3 Large

Jul 2025

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec wit…

262K · in $0.9 · out $1.9

Morph: Morph V3 Fast

Jul 2025

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accurac…

82K · in $0.8 · out $1.2

Command A Vision

Jul 2025

Multimodal Command A for charts, graphs, diagrams, OCR, and document understand…

128K · in - · out -

Baidu: ERNIE 4.5 VL 424B A47B

Jun 2025

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baid…

131K · in $0.42 · out $1.25

Minimax M1 80K

Jun 2025

MiniMaxAI chat model. https://huggingface.co/api/models/togethercomputer/MiniMa…

1M · in - · out -

o4 Mini Deep Research

deprecated

Jun 2025

Faster, more affordable deep research model for complex, multi-step research ta…

200K · in $2 · out $8

o3 Deep Research

deprecated

Jun 2025

Our most powerful deep research model for complex, multi-step research tasks.

200K · in $10 · out $40

Minimax M1 40K

Jun 2025

MiniMaxAI chat model. https://huggingface.co/api/models/togethercomputer/MiniMa…

1M · in - · out -

Mistral: Mistral Small 3.2 24B

Jun 2025

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mist…

131K · in $0.1 · out $0.3

Mistral Small (2506)

Jun 2025

Our latest enterprise-grade small model with the latest version released June 2…

131K · in $0.1 · out $0.3

MiniMax: MiniMax M1

Jun 2025

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended…

1M · in $0.55 · out $2.2

Gemini 2.5 Pro

Jun 2025

Stable release (June 17th, 2025) of Gemini 2.5 Pro

1.1M · in $1.25 · out $10

Gemini 2.5 Flash

Jun 2025

Stable version of Gemini 2.5 Flash, our mid-size multimodal model that supports…

1.1M · in $0.3 · out $2.5

Magistral Small 2506

Jun 2025

Mistralai chat model. https://huggingface.co/api/models/mistralai/Magistral-Sma…

41K · in - · out -

Llama 4 Scout (17Bx16E)

Jun 2025

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-4-Scout-17B…

262K · in - · out -

o3 Pro

Jun 2025

Version of o3 with more compute for better responses. Provides consistently bet…

200K · in $20 · out $80

Gemma 2B It

Jun 2025

Google chat model. https://huggingface.co/api/models/google/gemma-2b-it

8K · in - · out -

Gemma 2 9B It

Jun 2025

Google chat model. https://huggingface.co/api/models/google/gemma-2-9b-it

8K · in - · out -

Qwen3 1.7B

Jun 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-1.7B

41K · in - · out -

Qwen3 0.6B

Jun 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-0.6B

41K · in - · out -

Molmo 7B D 0924

May 2025

Allenai chat model. https://huggingface.co/api/models/allenai/Molmo-7B-D-0924

4K · in - · out -

DeepSeek: R1 0528

May 2025

May 28th update to the original DeepSeek R1 Performance on par with OpenAI o1,…

164K · in $0.5 · out $2.15

Mixtral 8X22b Instruct V0.1

May 2025

Mistralai chat model. https://huggingface.co/api/models/mistralai/Mixtral-8x22B…

66K · in - · out -

Claude Sonnet 4 [Retired] · US

May 2025

High-performance model. Retired June 15, 2026 (except on Bedrock and Vertex AI)…

200K · in $3 · out $15

Anthropic: Claude Sonnet 4

May 2025

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Son…

1M · in $3 · out $15

Anthropic: Claude Opus 4

May 2025

Claude Opus 4 is benchmarked as the world’s best coding model, at time of relea…

200K · in $15 · out $75

Devstral Small 2505

May 2025

Mistralai chat model. https://huggingface.co/api/models/togethercomputer/Devstr…

131K · in - · out -

Mistral 7B v0.1

May 2025

Mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-v0…

33K · in - · out -

Google: Gemma 3n 4B

May 2025

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource…

33K · in $0.06 · out $0.12

Google: Gemini 2.5 Pro Preview 06-05

May 2025

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reas…

1M · in $1.25 · out $10

Gemini 2.5 Pro Preview TTS

May 2025

Gemini 2.5 Pro Preview TTS

25K · in $1 · out -

Gemini 2.5 Flash Preview TTS

May 2025

Gemini 2.5 Flash Preview TTS

25K · in $0.5 · out -

Deepcoder 14B Preview

May 2025

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

131K · in - · out -

mistral-medium-3

May 2025

Official mistral-medium-latest Mistral AI model

262K · in $0.4 · out $2

Mistral Medium (2505)

May 2025

Our frontier-class multimodal model released May 2025.

131K · in $0.4 · out $2

Google: Gemini 2.5 Pro Preview 05-06

May 2025

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reas…

1M · in $1.25 · out $10

Arcee AI: Virtuoso Large

May 2025

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tune…

131K · in $0.75 · out $1.2

Arcee AI: Coder Large

May 2025

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been fu…

33K · in $0.5 · out $0.8

Meta: Llama Guard 4 12B

Apr 2025

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tune…

164K · in $0.18 · out $0.18

Qwen3 8b

Apr 2025

Alibaba model (not yet curated).

131K · in $0.12 · out $0.46

Qwen3 32b

Apr 2025

Alibaba model (not yet curated).

131K · in $0.08 · out $0.28

Qwen3 30b A3b

Apr 2025

Alibaba model (not yet curated).

131K · in $0.13 · out $0.52

Qwen3 235b A22b

Apr 2025

Alibaba model (not yet curated).

131K · in $0.46 · out $1.82

Qwen3 14b

Apr 2025

Alibaba model (not yet curated).

131K · in $0.12 · out $0.24

Arize AI Qwen 2 1.5B Instruct

Apr 2025

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

33K · in $0.1 · out $0.1

o4 Mini

deprecated

Apr 2025

Latest o4-mini model. Optimized for fast, effective reasoning with exceptionall…

200K · in $1.1 · out $4.4

Apr 2025

A well-rounded and powerful model across domains. Sets a new standard for math,…

200K · in $2 · out $8

Llama 3.1 405B

Apr 2025

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.1-405B

131K · in - · out -

Qwen2.5 7B Instruct

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct

33K · in - · out -

Qwen2.5 7B

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-7B

131K · in - · out -

Qwen2.5 72B

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-72B

131K · in - · out -

Qwen2.5 3B Instruct

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-3B-Instruct

33K · in - · out -

Qwen2.5 32B Instruct

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-32B-Instruct

33K · in - · out -

Qwen2.5 32B

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-32B

131K · in - · out -

Qwen2.5 14B

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-14B

131K · in - · out -

Qwen2.5 1.5B Instruct

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-1.5B-Instruct

33K · in - · out -

Qwen2.5 1.5B

Apr 2025

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-1.5B

131K · in - · out -

Llama 3.2 1B

Apr 2025

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.2-1B

131K · in - · out -

Llama 3.1 70B

Apr 2025

Meta chat model. https://huggingface.co/api/models/meta-llama/Llama-3.1-70B

131K · in - · out -

GPT-4.1 Nano

Apr 2025

Fastest, most cost-effective GPT 4.1 model. Delivers exceptional performance wi…

1M · in $0.1 · out $0.4

GPT-4.1 Mini

Apr 2025

Balanced for intelligence, speed, and cost. Matches or exceeds GPT-4o in intell…

1M · in $0.4 · out $1.6

GPT-4.1

Apr 2025

Flagship GPT model for complex tasks. Major improvements on coding, instruction…

1M · in $2 · out $8

GLM-4 32B (0414) 128K

Apr 2025

GLM-4 32B model with 128K context, 16K output.

131K · in $0.1 · out $0.1

Qwen2 72B Instruct

Apr 2025

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

33K · in $0.9 · out $0.9

Cogito V1 Preview Qwen 32B

Apr 2025

deepcogito chat model.

131K · in - · out -

Cogito V1 Preview Qwen 14B

Apr 2025

deepcogito chat model.

131K · in - · out -

Cogito V1 Preview Llama 8B

Apr 2025

deepcogito chat model.

131K · in - · out -

Cogito V1 Preview Llama 70B Turbo

Apr 2025

deepcogito chat model.

131K · in - · out -

Cogito V1 Preview Llama 70B

Apr 2025

deepcogito chat model.

131K · in - · out -

Meta: Llama 4 Scout

Apr 2025

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model d…

10M · in $0.1 · out $0.3

Meta: Llama 4 Maverick

Apr 2025

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language mod…

1M · in $0.2 · out $0.8

[Meta] Llama 4 Scout · 17B × 16E (Preview)

Apr 2025

Llama 4 Scout 17B MoE with 16 experts (109B total params), native multimodal wi…

131K · in $0.11 · out $0.34

Gemma 3 1b it

Apr 2025

Google chat model.

33K · in - · out -

DeepSeek R1 Distill Qwen 7B

Apr 2025

Deepseek chat model. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwe…

131K · in - · out -

meta-llama/Llama-2-7b-chat-hf

Apr 2025

Meta chat model. https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

4K · in - · out -

DeepSeek: DeepSeek V3 0324

Mar 2025

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteratio…

164K · in $0.27 · out $1.12

o1 Pro

Mar 2025

A version of o1 with more compute for better responses. Provides consistently b…

200K · in $150 · out $600

nim/nvidia/llama-3.3-nemotron-super-49b-v1

Mar 2025

Nvidia chat model.

16K · in - · out -

Mistral: Mistral Small 3.1 24B

Mar 2025

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501)…

128K · in $0.35 · out $0.56

nim/nv-mistralai/mistral-nemo-12b-instruct

Mar 2025

NVIDIA chat model.

16K · in - · out -

nim/mistralai/mixtral-8x7b-instruct-v01

Mar 2025

mistralai chat model.

16K · in - · out -

nim/meta/llama-3.1-70b-instruct

Mar 2025

Llama chat model.

16K · in - · out -

nim/meta/llama-3.1-8b-instruct

Mar 2025

Meta chat model.

16K · in - · out -

Google: Gemma 3 4B

Mar 2025

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K · in $0.05 · out $0.1

Google: Gemma 3 12B

Mar 2025

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K · in $0.05 · out $0.15

Command A

Mar 2025

Cohere's efficient 111B enterprise model for agents, tool use, and multilingual…

288K · in $2.5 · out $10

Cohere: Command A

Mar 2025

Command A is an open-weights 111B parameter model with a 256k context window fo…

256K · in $2.5 · out $10

Reka Flash 3

Mar 2025

Reka Flash 3 is a general-purpose, instruction-tuned large language model with…

66K · in $0.1 · out $0.2

nim/nvidia/llama-3.1-nemotron-70b-instruct

Mar 2025

NVIDIA chat model.

16K · in - · out -

nim/meta/llama-3.3-70b-instruct

Mar 2025

Meta chat model.

16K · in - · out -

Google: Gemma 3 27B

Mar 2025

Gemma 3 introduces multimodality, supporting vision-language input and text out…

131K · in $0.1 · out $0.3

GPT-4o Search Preview

deprecated

Mar 2025

Latest snapshot of the GPT-4o model optimized for web search capabilities.

128K · in $2.5 · out $10

GPT-4o Mini Search Preview

deprecated

Mar 2025

Latest snapshot of the GPT-4o Mini model optimized for web search capabilities.

128K · in $0.15 · out $0.6

TheDrummer: Skyfall 36B V2

Mar 2025

Skyfall 36B v2 is an enhanced iteration of Mistral Small 2501, specifically fin…

33K · in $0.55 · out $0.8

nim/mistralai/mixtral-8x22b-instruct-v01

Mar 2025

Mistral chat model.

16K · in - · out -

nim/meta/llama-3.2-90b-vision-instruct

Mar 2025

Meta chat model.

16K · in - · out -

nim/meta/llama-3.2-11b-vision-instruct

Mar 2025

Nvidia chat model.

16K · in - · out -

Meta Llama 3.1 8B Instruct Turbo

Mar 2025

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

131K · in $0.18 · out $0.18

Qwen QwQ-32B

Mar 2025

Qwen chat model. https://huggingface.co/Qwen/QwQ-32B

131K · in $1.2 · out $1.2

Sonar Reasoning Pro

Feb 2025

Premier reasoning model (DeepSeek R1) with Chain of Thought. 128k context.

128K · in $2 · out $8

Mistral: Saba

Feb 2025

Mistral Saba is a 24B-parameter language model specifically designed for the Mi…

33K · in $0.2 · out $0.6

Sonar Deep Research

Feb 2025

Expert-level research model for exhaustive searches and comprehensive reports.…

128K · in $2 · out $8

Gemini 2.0 Flash 001

Feb 2025

Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for…

1.1M · in $0.1 · out $0.4

AionLabs: Aion-RP 1.0 (8B)

Feb 2025

Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of t…

33K · in $0.8 · out $1.6

AionLabs: Aion-1.0-Mini

Feb 2025

Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 mod…

131K · in $0.7 · out $1.4

AionLabs: Aion-1.0

Feb 2025

Aion-1.0 is a multi-model system designed for high performance across various t…

131K · in $4 · out $8

Qwen: Qwen2.5 VL 72B Instruct

Feb 2025

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds,…

131K · in $0.8 · out $1

Qwen Plus

Feb 2025

Balanced quality, speed, and cost with hybrid thinking. 1M context.

1M · in $0.4 · out $1.2

Command R7B Arabic

Feb 2025

Command R7B tuned for Modern Standard Arabic and English enterprise use cases.…

128K · in $0.04 · out $0.15

o3 Mini

Jan 2025

Latest o3-mini model snapshot. High intelligence at the same cost and latency t…

200K · in $1.1 · out $4.4

Mistral: Mistral Small 3

Jan 2025

Mistral Small 3 is a 24B-parameter language model optimized for low-latency per…

33K · in $0.05 · out $0.08

DeepSeek R1 Distill Qwen 14B

Jan 2025

DeepSeek chat model. https://huggingface.co/api/models/deepseek-ai/DeepSeek-R1-…

131K · in $1.6 · out $1.6

DeepSeek R1 Distill Qwen 1.5B

Jan 2025

DeepSeek chat model. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwe…

131K · in $0.18 · out $0.18

DeepSeek: R1 Distill Llama 70B

Jan 2025

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llam…

128K · in $0.8 · out $0.8

Sonar Pro

Jan 2025

Advanced search model for complex queries and deep content understanding. 200k…

200K · in $3 · out $15

Sonar

Jan 2025

Lightweight, cost-effective search model for quick, grounded answers. 128k cont…

128K · in $1 · out $1

DeepSeek: R1

Jan 2025

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and wi…

164K · in $0.7 · out $2.5

V1 8K Vision (Preview)

Jan 2025

Legacy vision model with 8K context. Preview variant - use moonshot-v1-vision f…

8K · in $0.2 · out $2

V1 32K Vision (Preview)

Jan 2025

Legacy vision model with 32K context. Preview variant - use moonshot-v1-vision…

33K · in $1 · out $3

V1 128K Vision (Preview)

Jan 2025

Legacy vision model with 128K context. Preview variant - use moonshot-v1-vision…

131K · in $2 · out $5

MiniMax: MiniMax-01

Jan 2025

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01…

1M · in $0.2 · out $1.1

Microsoft: Phi 4

Jan 2025

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks…

16K · in $0.07 · out $0.14

Qwen2-VL (72B) Instruct

Jan 2025

Qwen chat model. https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct

33K · in $1.2 · out $1.2

Sao10K: Llama 3.1 70B Hanami x1

Jan 2025

This is Sao10K's experiment over Euryale v2.2.

16K · in $3 · out $3

DeepSeek: DeepSeek V3

Dec 2024

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instr…

131K · in $0.2 · out $0.8

Sao10K: Llama 3.3 Euryale 70B

Dec 2024

Euryale L3.3 70B is a model focused on creative roleplay from Sao10k. It is the…

131K · in $0.65 · out $0.75

deprecated

Dec 2024

Previous full o-series reasoning model.

200K · in $15 · out $60

Cohere: Command R7B (12-2024)

Dec 2024

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivere…

128K · in $0.04 · out $0.15

Qwen 2.5 14B Instruct

Dec 2024

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen2.5-14B-Instruct

33K · in $0.8 · out $0.8

Qwen2.5 72B Instruct

Dec 2024

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

33K · in $1.2 · out $1.2

Meta: Llama 3.3 70B Instruct (free) ·

Dec 2024

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and…

131K · in - · out -

Meta Llama 3.3 70B Instruct Turbo

Dec 2024

Meta chat model. https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

131K · in $1.04 · out $1.04

Meta Llama 3.1 405B Instruct

Dec 2024

Meta chat model. https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct

4K · in $3.5 · out $3.5

[Meta] Llama 3.3 · 70B Versatile

Dec 2024

Meta Llama 3.3 (70B params) with GQA. Strong reasoning, coding, multilingual. 1…

131K · in $0.59 · out $0.79

Amazon: Nova Pro 1.0

Dec 2024

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on provid…

300K · in $0.8 · out $3.2

Amazon: Nova Micro 1.0

Dec 2024

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency res…

128K · in $0.04 · out $0.14

Amazon: Nova Lite 1.0

Dec 2024

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focus…

300K · in $0.06 · out $0.24

Mistral Large 2407

Nov 2024

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-240…

131K · in $2 · out $6

Qwen 2.5 Coder 32B Instruct

Nov 2024

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct

16K · in $0.8 · out $0.8

Qwen2.5 Coder 32B Instruct

Nov 2024

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models…

128K · in $0.66 · out $1

Llama 3.1 Nemotron 70B Instruct HF

Nov 2024

nvidia chat model. https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruc…

33K · in $0.88 · out $0.88

TheDrummer: UnslopNemo 12B

Nov 2024

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed…

33K · in $0.4 · out $0.4

Magnum v4 72B

Oct 2024

This is a series of models designed to replicate the prose quality of the Claud…

33K · in $3 · out $5

Qwen: Qwen2.5 7B Instruct

Oct 2024

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings t…

131K · in $0.04 · out $0.1

Qwen2.5 7B Instruct Turbo

Oct 2024

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

33K · in $0.3 · out $0.3

Qwen2.5 72B Instruct Turbo

Oct 2024

Qwen chat model. https://huggingface.co/Qwen/Qwen2.5-72B-Instruct

131K · in $1.2 · out $1.2

Inflection: Inflection 3 Productivity

Oct 2024

Inflection 3 Productivity is optimized for following instructions. It is better…

8K · in $2.5 · out $10

Inflection: Inflection 3 Pi

Oct 2024

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional…

8K · in $2.5 · out $10

Aya Expanse 32B

Oct 2024

Open-weights multilingual research model covering 23 languages. 128K context. T…

128K · in $0.5 · out $1.5

TheDrummer: Rocinante 12B

Sep 2024

Rocinante 12B is designed for engaging storytelling and rich prose. Early teste…

66K · in $0.25 · out $0.5

Meta: Llama 3.2 3B Instruct (free) ·

Sep 2024

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimi…

131K · in - · out -

Meta: Llama 3.2 1B Instruct

Sep 2024

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently per…

131K · in $0.03 · out $0.2

Meta: Llama 3.2 11B Vision Instruct

Sep 2024

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed…

131K · in $0.35 · out $0.35

Qwen2.5 72B Instruct

Sep 2024

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings…

131K · in $0.36 · out $0.4

Cohere: Command R+ (08-2024)

Aug 2024

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher t…

128K · in $2.5 · out $10

Cohere: Command R (08-2024)

Aug 2024

command-r-08-2024 is an update of the Command R with improved performance for m…

128K · in $0.15 · out $0.6

Sao10K: Llama 3.1 Euryale 70B v2.2

Aug 2024

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It i…

131K · in $0.85 · out $0.85

Nous: Hermes 3 70B Instruct

Aug 2024

Hermes 3 is a generalist language model with many improvements over Hermes 2, i…

131K · in $0.7 · out $0.7

Nous: Hermes 3 405B Instruct (free) ·

Aug 2024

Hermes 3 is a generalist language model with many improvements over Hermes 2, i…

131K · in - · out -

Sao10K: Llama 3 8B Lunaris

Aug 2024

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It…

8K · in $0.04 · out $0.05

Meta: Llama 3.1 8B Instruct

Jul 2024

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & fla…

131K · in $0.05 · out $0.08

Meta: Llama 3.1 70B Instruct

Jul 2024

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & fla…

131K · in $0.4 · out $0.4

[Meta] Llama 3.1 · 8B Instant

Jul 2024

Meta Llama 3.1 (8B params). Fast, cost-effective for high-volume tasks. 131K co…

131K · in $0.05 · out $0.08

Meta Llama 3.1 70B Instruct Turbo

Jul 2024

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

131K · in $0.88 · out $0.88

Mistral: Mistral Nemo

Jul 2024

A 12B parameter model with a 128k token context length built by Mistral in coll…

131K · in $0.02 · out $0.03

open-mistral-nemo-2407

Jul 2024

Our best multilingual open source model released July 2024.

131K · in $0.15 · out $0.15

open-mistral-nemo

Jul 2024

Our best multilingual open source model released July 2024.

131K · in $0.15 · out $0.15

GPT-4o mini

Jul 2024

Affordable model for fast, lightweight tasks. GPT-4o Mini is cheaper and more c…

128K · in $0.15 · out $0.6

Google: Gemma 2 27B

Jul 2024

Gemma 2 27B by Google is an open model built from the same research and technol…

8K · in $0.65 · out $0.65

Qwen 2 Instruct (1.5B)

Jun 2024

Qwen chat model. https://huggingface.co/Qwen/Qwen2-72B-Instruct

33K · in $0.02 · out $0.02

Mistral (7B) Instruct v0.3

May 2024

mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-In…

33K · in $0.2 · out $0.2

GPT-4o

May 2024

Snapshot of gpt-4o from November 20th, 2024.

128K · in $2.5 · out $10

Meta: Llama 3 8B Instruct

Apr 2024

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavo…

8K · in $0.14 · out $0.14

Meta Llama 3 8B Instruct Reference

Apr 2024

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

8K · in $0.2 · out $0.2

Meta Llama 3 8B Instruct

Apr 2024

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

8K · in $0.2 · out $0.2

Mistral: Mixtral 8x22B Instruct

Apr 2024

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B ac…

66K · in $2 · out $6

WizardLM-2 8x22B

Apr 2024

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates…

66K · in $0.62 · out $0.62

GPT-4 Turbo

Apr 2024

GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and functi…

128K · in $10 · out $30

Claude Haiku 3 [Retired]

Mar 2024

Fast and compact model for near-instant responsiveness. Retired April 20, 2026.…

200K · in $0.25 · out $1.25

Anthropic: Claude 3 Haiku

Mar 2024

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant r…

200K · in $0.25 · out $1.25

Mistral Large

Feb 2024

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-24…

128K · in $2 · out $6

Deepseek Coder 33B Instruct

Feb 2024

Deepseek chat model. https://huggingface.co/api/models/deepseek-ai/deepseek-cod…

16K · in $0.8 · out $0.8

V1 8K

Feb 2024

Legacy V1 model with 8K context. Deprecated - use Kimi K2 Instruct instead.

8K · in $0.2 · out $2

V1 32K

Feb 2024

Legacy V1 model with 32K context. Deprecated - use Kimi K2 Instruct instead.

33K · in $1 · out $3

V1 128K

Feb 2024

Legacy V1 model with 128K context. Deprecated - use Kimi K2 Instruct instead.

131K · in $2 · out $5

OpenAI: GPT-4 Turbo Preview

Jan 2024

The preview GPT-4 model with improved instruction following, JSON mode, reprodu…

128K · in $10 · out $30

OpenAI: GPT-3.5 Turbo (older v0613)

Jan 2024

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural…

4K · in $1 · out $2

3.5-Turbo

Jan 2024

The latest GPT-3.5 Turbo model with higher accuracy at responding in requested…

16K · in $0.5 · out $1.5

3.5-Turbo

deprecated

Jan 2024

The latest GPT-3.5 Turbo model with higher accuracy at responding in requested…

16K · in $0.5 · out $1.5

Nous Hermes 2 Mixtral 8X7B Dpo

Jan 2024

Nousresearch chat model. https://huggingface.co/api/models/NousResearch/Nous-He…

33K · in $0.6 · out $0.6

Mixtral-8x7B Instruct v0.1

Dec 2023

mistralai chat model. https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0…

33K · in $0.6 · out $0.6

mistral-medium

Dec 2023

Official mistral-medium-latest Mistral AI model

262K · in $0.4 · out $2

Auto Router

Nov 2023

Your prompt will be processed by a meta-model and routed to one of dozens of mo…

2M · in - · out -

3.5-Turbo

deprecated

Nov 2023

GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducibl…

16K · in $1 · out $2

OpenAI: GPT-3.5 Turbo Instruct

Sep 2023

This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and om…

4K · in $1.5 · out $2

Mistral (7B) Instruct v0.1

Sep 2023

mistralai chat model. https://huggingface.co/api/models/mistralai/Mistral-7B-In…

33K · in $0.2 · out $0.2

OpenAI: GPT-3.5 Turbo 16k

Aug 2023

This model offers four times the context length of gpt-3.5-turbo, allowing it t…

16K · in $3 · out $4

Mancer: Weaver (alpha)

Aug 2023

An attempt to recreate Claude-style verbosity, but don't expect the same level…

8K · in $0.5 · out $0.75

ReMM SLERP 13B

Jul 2023

A recreation trial of the original MythoMax-L2-B13 but with updated models. #me…

6K · in $0.45 · out $0.65

MythoMax 13B

Jul 2023

One of the highest performing and most popular fine-tunes of Llama 2 13B, with…

4K · in $0.06 · out $0.06

GPT-4

deprecated

Jun 2023

Snapshot of gpt-4 from June 13th 2023 with improved function calling support. D…

8K · in $30 · out $60

GPT-4

Jun 2023

Snapshot of gpt-4 from June 13th 2023 with improved function calling support. D…

8K · in $30 · out $60

AI21 Labs Jamba 1.5 Large

AI21 Labs model via Unsupported API (Bedrock Foundation Model)

- · in - · out -

AI21 Labs Jamba 1.5 Mini

AI21 Labs model via Unsupported API (Bedrock Foundation Model)

- · in - · out -

Amazon Nova 2 Lite · Global

Amazon model via Converse API (Bedrock Inference Profile)

- · in - · out -

Amazon Nova Lite

Amazon model via Converse API (Bedrock Foundation Model)

- · in - · out -

Amazon Nova Micro · US

Amazon model via Converse API (Bedrock Inference Profile)

- · in - · out -

Amazon Nova Premier · US

Amazon model via Converse API (Bedrock Inference Profile)

- · in - · out -

Amazon Nova Pro · US

Amazon model via Converse API (Bedrock Inference Profile)

10K · in - · out -

Anthropic Claude 3 Sonnet · US

Anthropic model (Bedrock Inference Profile)

200K · in - · out -

Anthropic Claude Haiku 4 5

Anthropic model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Aya Vision 32B

Open-weights multilingual vision research model (23 languages) with image under…

16K · in $0.5 · out $1.5

Cohere Command R

Cohere model via Unsupported API (Bedrock Foundation Model)

- · in - · out -

Cohere Command R+

Cohere model via Unsupported API (Bedrock Foundation Model)

- · in - · out -

Cohere Embed v4 · Global

Cohere model via Converse API (Bedrock Inference Profile)

- · in - · out -

Deepseek DeepSeek-R1 · US

Deepseek model via Converse API (Bedrock Inference Profile)

- · in - · out -

Gemma 4 12B It

Google chat model. https://huggingface.co/google/gemma-4-12B-it

262K · in - · out -

GLM 4.6

Zai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Google Gemma 3 12B IT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

Google Gemma 3 27B PT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

Google Gemma 3 4B IT

Google model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

Google Gemma 4 26b A4b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Google Gemma 4 31b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Google Gemma 4 E2b

Google model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

GPT-OSS 120B

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

GPT-OSS 20B

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Kimi K2 Thinking

Moonshotai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Kimi K2.5 Fp4

Togethercomputer chat model. https://huggingface.co/api/models/togethercomputer…

262K · in $0.5 · out $2.8

Labs Leanstral 1 5 1

A mid & post-trained version of mistral small 4 for Lean (260618 SFT)

262K · in - · out -

labs-leanstral-1-5

A mid & post-trained version of mistral small 4 for Lean (260618 SFT)

262K · in - · out -

LFM2-24B-A2B

Togethercomputer chat model.

33K · in $0.03 · out $0.12

LFM2.5-8B-A1B

LiquidAI chat model. https://huggingface.co/api/models/LiquidAI/LFM2.5-8B-A1B

128K · in $0.03 · out $0.12

Meta Llama 3 70B Instruct

Meta model via Converse API (Bedrock Foundation Model)

- · in - · out -

Meta Llama 3 70B Instruct Turbo

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct

8K · in $0.88 · out $0.88

Meta Llama 3 8B Instruct

Meta model via Converse API (Bedrock Foundation Model)

- · in - · out -

Meta Llama 3 8B Instruct Lite

Meta chat model. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

8K · in $0.14 · out $0.14

Meta Llama 3.1 70B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.1 8B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.2 11B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.2 1B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.2 3B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.2 90B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 3.3 70B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 4 Maverick 17B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

Meta Llama 4 Scout 17B Instruct · US

Meta model via Converse API (Bedrock Inference Profile)

- · in - · out -

MiniMax M2

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

410K · in - · out -

MiniMax M2.1

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

197K · in - · out -

MiniMax M2.5

MiniMax model via OpenAI-Compatible API (Bedrock Foundation Model)

197K · in - · out -

Mistral AI Devstral 2 123B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Mistral AI Magistral Small 2509

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

Mistral AI Ministral 14B 3.0

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Mistral AI Ministral 3 8B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Mistral AI Ministral 3B

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Mistral AI Mistral 7B Instruct

Mistral AI model via Converse API (Bedrock Foundation Model)

- · in - · out -

Mistral AI Mistral Large (24.02)

Mistral AI model via Converse API (Bedrock Foundation Model)

- · in - · out -

Mistral AI Mistral Large 3

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Mistral AI Mistral Small (24.02)

Mistral AI model via Converse API (Bedrock Foundation Model)

- · in - · out -

Mistral AI Mixtral 8x7B Instruct

Mistral AI model via Converse API (Bedrock Foundation Model)

- · in - · out -

Mistral AI Voxtral Mini 3B 2507

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

33K · in - · out -

Mistral AI Voxtral Small 24B 2507

Mistral AI model via OpenAI-Compatible API (Bedrock Foundation Model)

33K · in - · out -

Mistral Pixtral Large 25.02 · US

Mistral model via Converse API (Bedrock Inference Profile)

- · in - · out -

mistral-code-agent-latest

Official devstral-2512 Mistral AI model

262K · in - · out -

mistral-code-fim-latest

Our cutting-edge language model for coding released August 2025.

256K · in - · out -

mistral-code-latest

Our cutting-edge language model for coding released August 2025.

256K · in - · out -

mistral-tiny-latest

Our best multilingual open source model released July 2024.

131K · in - · out -

mistral-vibe-cli-fast

Mistral Small 4.

262K · in - · out -

mistral-vibe-cli-with-tools

Official mistral-medium-latest Mistral AI model

262K · in - · out -

Moonshot AI Kimi K2 Thinking

Moonshot AI model via Converse API (Bedrock Foundation Model)

262K · in - · out -

Moonshot AI Kimi K2.5

Moonshot AI model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

North Mini Code

Compact agentic coding model from Cohere's North platform. Reasoning and tool u…

436K · in - · out -

NVIDIA Nemotron 3 Super 120B A12B

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

NVIDIA Nemotron Nano 12B v2 VL BF16

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

NVIDIA Nemotron Nano 3 30B

NVIDIA model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Open Mistral Nemo

Our best multilingual open source model released July 2024.

131K · in - · out -

Openai Gpt 5.4

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Openai Gpt 5.5 2026 04 23

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Openai Gpt 5.6 Luna

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Openai Gpt 5.6 Sol

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Openai Gpt 5.6 Terra

Openai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

OpenAI GPT OSS Safeguard 120B

OpenAI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

OpenAI GPT OSS Safeguard 20B

OpenAI model via OpenAI-Compatible API (Bedrock Foundation Model)

131K · in - · out -

OpenAI gpt-oss-120b

OpenAI model via Converse API (Bedrock Foundation Model)

128K · in - · out -

OpenAI gpt-oss-20b

OpenAI model via Converse API (Bedrock Foundation Model)

128K · in - · out -

Qvq Max

Alibaba model (not yet curated).

131K · in - · out -

Qwen Coder Plus

Alibaba model (not yet curated).

131K · in - · out -

Qwen Flash

Fast and very low cost with hybrid thinking. 1M context.

1M · in $0.05 · out $0.4

Qwen Max

Best quality of the stable commercial line. 32K context.

33K · in $1.6 · out $6.4

Qwen Plus

Balanced quality, speed, and cost with hybrid thinking. 1M context.

1M · in $0.4 · out $1.2

Qwen Turbo

Fastest and cheapest for simple tasks. 1M context.

1M · in $0.05 · out $0.2

Qwen Vl Max

Alibaba model (not yet curated).

131K · in - · out -

Qwen Vl Plus

Alibaba model (not yet curated).

131K · in - · out -

Qwen3 235B A22B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 235b A22b Instruct 2507

Alibaba model (not yet curated).

131K · in - · out -

Qwen3 32B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 32B (dense)

Qwen model via Converse API (Bedrock Foundation Model)

33K · in - · out -

Qwen3 Coder 30B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 Coder 480B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 Coder 480b A35b Instruct

Alibaba model (not yet curated).

131K · in - · out -

Qwen3 Coder Next

Qwen model via OpenAI-Compatible API (Bedrock Foundation Model)

262K · in - · out -

Qwen3 Coder Next Fp8

Qwen chat model. https://huggingface.co/api/models/Qwen/Qwen3-Coder-Next-FP8

262K · in $0.5 · out $1.2

Qwen3 Max Preview

Alibaba model (not yet curated).

131K · in - · out -

Qwen3 Next 80B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 Next 80B A3B

Qwen model via Converse API (Bedrock Foundation Model)

262K · in - · out -

Qwen3 VL 235B

Qwen model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Qwen3 VL 235B A22B

Qwen model via Converse API (Bedrock Foundation Model)

262K · in - · out -

Qwen3 Vl Flash 2025 10 15

Alibaba model (not yet curated).

131K · in - · out -

Qwen3-Coder-30B-A3B-Instruct

Qwen model via Converse API (Bedrock Foundation Model)

262K · in - · out -

Qwen3.5 35B A3B Lora

Qwen chat model.

262K · in - · out -

Qwen3.5 Flash 2026 02 23

Alibaba model (not yet curated).

131K · in - · out -

Qwq Plus 2025 03 05

Alibaba model (not yet curated).

131K · in - · out -

Ternary Bonsai 27B

Prism ML chat model. https://huggingface.co/api/models/prism-ml/Ternary-Bonsai-…

262K · in - · out -

tiny aya earth

New Cohere Model

128K · in - · out -

tiny aya fire

New Cohere Model

128K · in - · out -

Tiny Aya Global

Tiny multilingual research model. 8K context. Text only.

8K · in - · out -

tiny aya water

New Cohere Model

128K · in - · out -

Twelvelabs TwelveLabs Marengo Embed 3.0 · US

Twelvelabs model via Converse API (Bedrock Inference Profile)

- · in - · out -

Twelvelabs TwelveLabs Marengo Embed v2.7 · US

Twelvelabs model via Converse API (Bedrock Inference Profile)

- · in - · out -

Twelvelabs TwelveLabs Pegasus v1.2 · Global

Twelvelabs model via Converse API (Bedrock Inference Profile)

- · in - · out -

Writer Palmyra Vision 7B

Writer model via OpenAI-Compatible API (Bedrock Foundation Model)

4K · in - · out -

Writer Palmyra X4 · US

Writer model via Converse API (Bedrock Inference Profile)

- · in - · out -

Writer Palmyra X5 · US

Writer model via Converse API (Bedrock Inference Profile)

- · in - · out -

Xai Grok 4.3

Xai model via OpenAI-Compatible API on AWS Bedrock Mantle

131K · in - · out -

Z.AI GLM 4.7 Flash

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K · in - · out -

Z.AI GLM 5

Z.AI model via OpenAI-Compatible API (Bedrock Foundation Model)

203K · in - · out -

Showing 663 of 663 models · prices in USD per 1M tokens · refreshed every 30 minutes

Common questions

FAQ

How many AI models does Big-AGI support?

Big-AGI connects to 800+ models across 29+ providers, including Anthropic, OpenAI, Google Gemini, SpaceXAI (formerly xAI), DeepSeek, Mistral, Perplexity, Groq, and OpenRouter. New models are usually added the day they ship. The index above lists the models currently tracked, with capabilities, context windows, and pricing.

How much do the models cost?

You bring your own provider API keys and pay standard provider rates. Big-AGI adds no markup on model usage and never limits your access. Prices above are in USD per 1M tokens, split into input and output. They cover text tokens only and do not include image, audio, or other multimodal input and output, or provider fees for requests, tools, caching, and other services. Prices can change and may be out of date, so always check the provider for current rates. The optional Pro plan ($10.99/mo) covers cross-device sync and does not restrict model usage.

What is the context window, and which model has the largest?

The context window is how much text a model can hold at once, measured in tokens and shown per model. Sort the Context column to rank every model from largest to smallest. Values change as providers ship new versions, so the table always reflects the current numbers.

Can I compare models side by side?

Yes. Filter and sort the table by vendor, context, price, or capability to compare specs directly. Inside Big-AGI, Beam runs 2 to 24 models in parallel on the same prompt, then compares or merges their answers, so you judge real output, not just specs.

Are my API keys and chats private?

Keys and chats are stored in your browser, local first. With Direct Connection on, requests go straight from your browser to the provider and never pass through the Big-AGI servers (requires a browser-side key and provider CORS support). The Big-AGI core is open source under the MIT license.

Can I run local or open-source models?

Yes. Big-AGI connects to local runtimes like LocalAI, Ollama, and LM Studio, and you can mix local and cloud models in the same chat. Any OpenAI-compatible endpoint is auto-detected by hostname.

Run any of these in Big-AGI.

Connect your own keys, run models side by side, then compare and merge the answers. Keys and chats stay in your browser.

Launch Big-AGI

Alibaba

Anthropic

AWS Bedrock

Azure

Cerebras

DeepSeek

Fireworks AI

Google Gemini

Groq

MiniMax

Mistral

Moonshot

OpenAI

OpenRouter

Perplexity

Sakana AI

SpaceXAI

Together AI

Z.ai

BIG-AGI

Product

Features Models Controls Changelog BEAM Technology

Resources

Documentation Discord GitHub

Company

Email Us Privacy Terms