Reference Guide

Mistral & DeepSeek

The two leading non-US open-weight model providers. Mistral AI from France — EU residency, frontier and edge models, Apache 2.0 weights. DeepSeek from China — massive Mixture-of-Experts architectures, aggressive pricing, and the V4 release that just shipped a 1M-token open-source frontier model.

← Back to Reference Hub

Mistral's flagship released as part of the Mistral 3 family in December 2025. Multimodal Mixture-of-Experts model with image understanding, available as both base and instruct under Apache 2.0 — a frontier-class open-weight model that any team can self-host or fine-tune.

$2.00 / 1M input, $6.00 / 1M output via La Plateforme — roughly 40% below GPT-5.4 and Claude Sonnet output rates
128K token context window
Multilingual (French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese, Japanese, Korean, Arabic plus more) and natively multimodal
Apache 2.0 weights on Hugging Face — commercial use, fine-tuning, distillation all permitted
Available on Microsoft Azure AI Foundry, AWS Bedrock, Google Vertex AI, IBM watsonx, Snowflake Cortex, NVIDIA NIM

Limitations: 128K context lags GPT, Claude, and DeepSeek V4 (1M+). Self-hosting at full precision needs serious GPU infrastructure (4×H100 minimum for production). Reasoning benchmarks trail OpenAI o-series and DeepSeek R1.

FrontierApache 2.0

The middle tier in the Mistral 3 lineup. Designed as the everyday production workhorse — meaningfully cheaper than Large with most of the capability for general chat, RAG, and structured output workloads.

$0.40 / 1M input, $2.00 / 1M output — the sweet spot for production deployment
128K token context window
Function calling, JSON mode, and structured output
Available via La Plateforme, Azure AI Foundry, AWS Bedrock, and Vertex AI
Same multilingual coverage as Large 3

Limitations: Not always available as open weights — some Medium-tier releases ship API-only while community gets Small + Large. Confirm weight availability before assuming you can self-host.

Mid-Tier

The high-volume, edge-deployable model in the Mistral 3 family, refreshed in March 2026 as Small 4. Apache 2.0 licensed and small enough to run on a single consumer GPU — positioned as the open replacement for proprietary mid-tier models.

$0.15 / 1M input, $0.60 / 1M output via La Plateforme
128K token context window
Apache 2.0 weights — runs on a single H100 at full precision, or quantized on consumer hardware
Strong instruction-following relative to size; good fit for fine-tuning on domain data
Ideal for high-volume agent loops, classification, summarization, and embedded inference

Limitations: Multimodal capability varies by release; check the model card for image support. Reasoning quality is bounded — route hard problems to Large 3 or DeepSeek R1. Not a chat-by-default product; build your own UX or use Le Chat.

Small / EdgeApache 2.0

Mistral's code-completion specialist. 22B parameters — small enough for an RTX 4090 to run full-precision — with a 256K context window (twice the rest of the lineup). Trained on 80+ programming languages with strong fill-in-the-middle for IDE autocomplete.

$0.30 / 1M input, $0.90 / 1M output via Mistral's standard API
Free Codestral API endpoint for IDE-integration use cases
256K token context — the largest in the Mistral lineup
Native fill-in-the-middle (FIM) for autocomplete; code-specific tokenizer
Integrations with Cursor, Continue, Tabby, JetBrains AI Assistant

Limitations: Optimized for code completion, not general-purpose chat or reasoning — use Large or Medium for explanation and architectural discussion. Open weights are released under the Mistral Non-Production License (MNPL), which restricts commercial production use without a paid license; the Apache 2.0 family does not include Codestral by default.

Code ModelMNPL license

Mistral's dedicated vision-language line. Pixtral 12B (Sept 2024) is a small open-weight model for multimodal apps; Pixtral Large (Nov 2024) is a 124B vision-first model claiming to beat GPT-4o on chart and document interpretation. Both predate the Mistral 3 multimodal capabilities and remain available for vision-specific workloads.

Pixtral 12B — $0.10 / 1M input, $0.10 / 1M output, 128K context, 12B params, Apache 2.0
Pixtral Large — 124B params, 128K context, API-only with no public per-token pricing
Process screenshots, charts, diagrams, photos, and PDFs alongside text without an external pipeline
Pixtral 12B runs on a single A100 / H100 at full precision; great for self-hosted vision RAG

Limitations: Mistral Large 3 now ships native multimodal capability, partially overlapping Pixtral's positioning — pick Pixtral when you specifically want a smaller, vision-focused model or open weights for the 12B size. Pixtral Large's API-only access and lack of public pricing make budget forecasting hard.

Vision12B is Apache 2.0

Mistral's consumer-facing chat product, the European answer to ChatGPT. Free tier at chat.mistral.ai with web search, code interpreter, and image generation. Le Chat Enterprise ships in two variants: SaaS hosted by Mistral in France, or self-hosted on private cloud. As of April 2026, also distributed on AWS, Azure, and GCP marketplaces for enterprise procurement.

Le Chat (free) — Mistral Large by default, web search, image generation, document upload
Le Chat Pro (€14.99/mo) — higher rate limits, priority access
Le Chat Team / Enterprise — SSO, admin console, custom assistants, MCP connectors
Le Chat Enterprise can be deployed self-hosted, in private cloud (AWS / Azure / GCP), or as Mistral-hosted SaaS in France
Zero data retention enforceable on Enterprise; default consumer retention is 30 days

Limitations: Smaller ecosystem of plugins/integrations than ChatGPT or Claude. Le Chat reasoning quality is bounded by Mistral Large 3 — not as strong as GPT or Claude on hardest problems. Mobile apps shipped later than competitors.

Chat AppFree Tier

Mistral's developer platform — the equivalent of OpenAI's platform.openai.com. The structural pitch versus US providers is data residency: with OpenAI, Anthropic, or Google you opt into EU residency as an enterprise add-on; with Mistral, EU residency is the baseline and US routing is the opt-in.

Pay-as-you-go API access to all Mistral models including fine-tuning endpoints
EU-hosted by default; DPA available without enterprise upgrade
30-day token retention by default; zero retention available on Enterprise plan
Embedding models, OCR, function calling, JSON mode, batch API, structured output
Mistral Agents API for building agentic workflows with built-in tools (web search, code interpreter, image generation, document library)
SOC 2, ISO 27001, GDPR-aligned by design

Limitations: Smaller third-party ecosystem than OpenAI or Anthropic — fewer pre-built SDKs, eval tools, observability integrations. Latency from US-east is meaningfully higher than calling US-hosted providers; consider routing through hyperscaler marketplace endpoints (Azure / AWS / GCP) if your users are mostly American.

Developer APIEU residency default

The headline 2026 release — an open-source MoE family built from the ground up for million-token context as a default rather than a bolt-on. Two sizes: V4-Pro for frontier-grade work, V4-Flash for high-throughput production. Both shipped with Apache 2.0 weights on Hugging Face the same day they hit the API.

V4-Pro: 1.6T total params / 49B activated. $0.145 cache-hit input, $1.74 cache-miss input, $3.48 output per 1M tokens (list)
V4-Pro promo — active through 2026-05-31 15:59 UTC: 75% off, so effective rates today are $0.003625 cached input, $0.435 cache-miss input, $0.87 output per 1M tokens. A May-2026 deployment costs ~4x less than the list prices above; expect to revert to list from June 1.
V4-Flash: 284B total params / 13B activated. $0.028 cache-hit input, $0.14 cache-miss input, $0.28 output per 1M tokens
Native 1M token context window — not a sliding-window approximation
At 1M context, V4-Pro uses ~27% of the per-token FLOPs and 10% of the KV cache vs. V3.2; V4-Flash drops to ~10% FLOPs and 7% cache
Apache 2.0 open weights for commercial use; available on Hugging Face, Together, Fireworks, OpenRouter, and DeepSeek's own API

Limitations: Less than 24 hours old as of this guide — production benchmarks and third-party tooling support are still maturing. Self-hosting V4-Pro at the full 1.6T parameter count requires a serious GPU cluster; most teams will use it through DeepSeek's API or a hyperscaler. Cache-miss pricing is what most non-repetitive workloads pay; the headline cache-hit prices assume meaningful prompt repetition.

FrontierApache 2.0Just released

The previous-generation flagship that established DeepSeek's reputation. 671B total parameters with 37B activated per token. Trained on 14.8T tokens, refined across V3.1 and V3.2 releases through 2025. Still the safest production choice if you don't want to be the first to deploy V4.

Mixture-of-Experts with Multi-Head Latent Attention (MLA) for efficient KV caching
128K token context window in standard deployments (up to 1M in research builds)
Apache 2.0 open weights on Hugging Face
Wide third-party support — Together, Fireworks, OpenRouter, Groq, Cerebras, NVIDIA NIM
Strong general-purpose performance; well-understood failure modes after 18 months in production

Limitations: Now superseded by V4 on capability and context length. Pricing on DeepSeek's own API has been unified into the V4 lineup; some third-party hosts still serve V3.2 at the older rates ($0.27 / $1.10 per 1M historically). Useful as a fallback model in router setups.

ProductionApache 2.0

The first frontier-grade open reasoning model. R1 introduced large-scale reinforcement learning over chain-of-thought traces, producing OpenAI o1-class results on math, code, and science benchmarks — while shipping the weights publicly. The release reset industry expectations for what open models could do on reasoning workloads.

Same MoE architecture as V3 (671B / 37B activated)
Generates extensive visible reasoning traces before final answers — latency is higher than V3 / V4-Flash but answers are demonstrably better on hard problems
Distilled smaller variants (1.5B, 7B, 8B, 14B, 32B, 70B) released as open weights for self-hosting
Code released MIT, model weights released under DeepSeek's permissive Model License (commercial use allowed with use-case restrictions on illegal/harmful content)
Available on DeepSeek API and every major third-party host

Limitations: Slower and more expensive per query than non-reasoning models — only use it when the task actually benefits from extended thinking. Visible chain-of-thought can be a problem for end-user UX; many apps suppress it. R2 is rumored but unreleased as of April 2026.

ReasoningOpen weights

DeepSeek's dedicated code model, trained from scratch on 6T tokens with 338 programming languages represented. Coder V2 ships in two sizes — a 236B MoE and a 16B MoE Lite — both with open weights and commercial-use rights.

Coder V2: 236B total / 21B activated, 128K context, frontier-grade benchmarks on HumanEval, MBPP, LiveCodeBench
Coder V2 Lite: 16B total / 2.4B activated, runs locally on consumer GPUs
Code MIT-licensed; model weights under DeepSeek's commercial-use Model License
Strong fill-in-the-middle, repo-level reasoning, and multi-file refactor performance
Available via DeepSeek API, Hugging Face, Ollama, LM Studio, and major inference hosts

Limitations: Many teams now use DeepSeek V4-Flash for general code work and reserve Coder V2 for repository-scale reasoning or self-hosted IDE integrations. Smaller community-maintained tooling than Codestral has inside the JetBrains/VS Code ecosystem.

Code ModelOpen weights

DeepSeek's first-party API at api.deepseek.com. The pricing strategy — aggressive headline rates plus context-cache discounts plus off-peak windows — is the cheapest way to access frontier-grade models for many workloads. OpenAI-compatible endpoints make migration mostly a base-URL change.

OpenAI-compatible REST API — drop-in for the OpenAI Python / Node SDKs by changing the base URL
Context caching applies a 90% discount automatically when input tokens are served from cache (e.g. repeated system prompts in agent loops)
Off-peak window: 16:30–00:30 GMT, 50–75% discount on regular rates — useful for batch / async workloads
Function calling, JSON mode, streaming, and the standard OpenAI feature set
Hosted in China — latency from the US/EU is meaningfully higher than US-hosted providers; expect 200–400ms first-token latency from the US

Limitations: China-hosted infrastructure raises real data-governance questions for regulated industries and US enterprise procurement — expect security-review headwinds. Use a hyperscaler-hosted endpoint (AWS Bedrock Marketplace, Together, Fireworks, OpenRouter) when residency matters. No SLA tiers comparable to OpenAI / Anthropic enterprise.

Developer APIChina-hosted

Capability	Mistral AI	DeepSeek
Headquartered in	Paris, France (EU)	Hangzhou, China
Default API hosting	EU (France)	China
Hyperscaler marketplace availability	AWS, Azure, GCP, Snowflake, IBM, NVIDIA NIM	AWS Bedrock Marketplace, Together, Fireworks, OpenRouter
GDPR / EU enterprise procurement	Built for it	Use via hyperscaler proxy
Frontier model open weights	Mistral Large 3 (Apache 2.0)	DeepSeek V4 (Apache 2.0)
Smaller / edge open weights	Small 4, Pixtral 12B, Ministral 8B (Apache 2.0)	R1 distills (1.5B–70B), Coder V2 Lite
Reasoning model open weights	Reasoning variants in Mistral 3 family	DeepSeek R1 (Model License)
Code model open weights	Codestral (MNPL — non-production)	Coder V2 (commercial use)
License permissiveness	Mostly Apache 2.0	Apache 2.0 (V4) + custom Model License (R1, Coder)
Frontier model parameters	Mistral Large 3 (size undisclosed)	V4-Pro 1.6T total / 49B activated
Architecture	Dense + MoE variants	MoE with Multi-Head Latent Attention
Standard context window	128K (256K for Codestral)	1M (V4), 128K (V3, R1, Coder)
Multimodal / vision	Native in Large 3, Pixtral line	Text-only as of V4
Multilingual coverage	12+ EU and global languages	Strong English & Chinese, weaker low-resource
OpenAI-compatible endpoints	Yes (compatibility mode)	Yes (drop-in base URL change)
Function calling / tool use	Yes	Yes
JSON / structured output	Yes	Yes
Context caching discount	No	90% off cached input
Off-peak pricing	No	50–75% off, 16:30–00:30 GMT
Batch API	Yes	No native batch
Built-in agent tools (web, code interp)	Mistral Agents API	BYO orchestration
Fine-tuning endpoints	Yes (managed)	Self-hosted only
Default residency	EU	China
US residency option	Opt-in (US routing)	Via AWS Bedrock / hyperscalers
Default token retention	30 days (consumer)	Check current ToS
Zero-retention available	Yes (Enterprise)	Via hyperscaler
SOC 2 / ISO 27001	Yes	Not directly; via hosts
Self-hosted product (chat UI)	Le Chat Enterprise	No first-party self-hosted UI
Private cloud deployment	Le Chat Enterprise on AWS / Azure / GCP	Self-host the open weights
Commercial support / SLA	Enterprise tiers + DPA	Limited; via hyperscaler hosts

The cache and off-peak math on DeepSeek: DeepSeek's headline-cheap rates assume two things many workloads don't actually have. Cache hits apply only to input tokens served from context cache — great for repeated system prompts in agent loops, irrelevant for one-shot user queries. The 50–75% off-peak discount is fixed at 16:30–00:30 GMT, useful for batch jobs but not interactive products. For typical interactive workloads, expect to pay close to the cache-miss price during peak hours. Codestral licensing gotcha: Codestral is open-weight but ships under the Mistral Non-Production License (MNPL), which restricts commercial production use. The rest of the Mistral 3 family (Large 3, Small 4, Pixtral 12B, Ministral 8B) is Apache 2.0. If you need an Apache-2.0 code model, prefer DeepSeek Coder V2 or just use Mistral Small 4 / Codestral via the API where licensing applies to the service rather than the weights. Context window reality: DeepSeek V4's 1M context is a real architectural choice, not a sliding-window approximation — KV cache and FLOPs were optimized around it. Mistral's 128K standard ceiling is generous for most RAG and agent work but lags on long-document and multi-repo code workloads where DeepSeek V4 (or DeepSeek Coder V2 with future scale-out) wins. Jurisdiction note: If your buyer's security review can't accept China-hosted endpoints, route DeepSeek calls through AWS Bedrock Marketplace, Together, Fireworks, or OpenRouter — the same Apache 2.0 weights, hosted in US/EU regions, with the host's compliance posture instead of DeepSeek's.

EU customer with strict GDPR / data-residency requirementsMistral (La Plateforme)

Cheapest possible API for high-volume agent loopsDeepSeek V4-Flash

Best open-weight frontier model for self-hosting todayDeepSeek V4-Pro or Mistral Large 3

Million-token context for long-document analysisDeepSeek V4

Hard reasoning, math, or scientific problem solvingDeepSeek R1

Multimodal (image + text) reasoning with open weightsMistral Large 3 or Pixtral

Code completion in IDE on self-hosted hardwareCodestral or DeepSeek Coder V2

Apache 2.0 code model for commercial productionDeepSeek Coder V2

Multilingual app covering EU languages wellMistral Large 3 or Small 4

Self-hosted chat UI for an enterpriseLe Chat Enterprise

Edge / on-device inference, fully openMistral Small 4 or Ministral 8B

US enterprise that can't accept China-hosted endpointsMistral, or DeepSeek via Bedrock

Batch overnight processing on a tight budgetDeepSeek (off-peak window)

Agent with massive repeated system promptDeepSeek (cache-hit pricing)

Need managed fine-tuning without standing up GPUsMistral La Plateforme

Built-in web search + code interpreter for an agentMistral Agents API

Drop-in replacement for an OpenAI-compatible clientDeepSeek API

Avoid API lock-in — weights you can run anywhereEither, both ship Apache 2.0

Distill a smaller reasoning model for a niche domainDeepSeek R1 distills (1.5B–70B)

Vision-RAG over PDFs and screenshotsPixtral 12B (self-host) or Mistral Large 3 (API)

Our Recommendation

Treat Mistral and DeepSeek as complementary, not alternatives. Mistral wins on jurisdiction (EU residency by default), enterprise procurement (Le Chat Enterprise, hyperscaler marketplaces, managed fine-tuning), multilingual coverage, and native multimodal. DeepSeek wins on raw price-per-token, 1M context windows, and open reasoning. The sharpest 2026 stack uses Mistral Small 4 or Medium 3 for default workloads where EU residency and managed APIs matter, DeepSeek V4-Flash for high-volume cost-sensitive backends, and DeepSeek R1 (or future R2) when a problem actually needs extended reasoning. Self-host either's open weights when you want to escape per-token billing entirely.

Mistral & DeepSeek

Mistral Large 3

Mistral Medium 3

Mistral Small 4

Codestral

Pixtral 12B & Pixtral Large

Le Chat

La Plateforme & EU Data Residency

DeepSeek V4 (Pro & Flash)

DeepSeek V3 / V3.2

DeepSeek R1

DeepSeek Coder V2

DeepSeek API Platform

Our Recommendation