Reference Guide

Meta AI

Llama models, the Meta AI assistant across Facebook/Instagram/WhatsApp, AI Studio, the Llama API, and the new closed-weight Muse Spark — what each one is and where it fits.

← Back to Reference Hub

Best for: Long-context document work, multimodal pipelines, and self-hosted production deployments where open weights matter.

Scout — 17B active / 109B total params, 16 experts, 10M-token context (largest open model at launch)
Maverick — 17B active / 400B total params, 128 experts, 1M-token context
Behemoth — ~288B active / ~2T total, used as a teacher model for codistillation; never publicly released
First Llama family with Mixture-of-Experts architecture and native multimodality (text + image input)
Available on Hugging Face, llama.com, Bedrock, Vertex, Azure; runs in vLLM, TGI, llama.cpp, Ollama
Released April 5, 2025 under the Llama 4 Community License

Limitations: Lukewarm reception vs Qwen3 and DeepSeek R1 on reasoning benchmarks. License caps commercial use at 700M monthly active users (effectively a hyperscaler clause). Behemoth never shipped, and the original AGI Foundations team that built Llama 4 was dissolved after release.

Open-Weight ModelsFree Weights

Best for: The mature, broadly-deployed workhorse for most production Llama usage in 2026 — cheap, fast, well-supported by every inference host.

Llama 3.1 — 8B, 70B, and 405B dense models with 128K context
Llama 3.2 — 1B/3B text + 11B/90B vision (first multimodal Llama, edge-friendly small sizes)
Llama 3.3 70B — dense model approaching 405B quality at a much smaller size
Lowest-cost tier: Llama 3.1 8B at ~$0.02 / $0.05 per million tokens on hosted providers
Supported on Bedrock, Vertex, Azure AI, Together, Fireworks, Groq, Replicate, and self-hosting
Strong open ecosystem of fine-tunes, derivatives, and quantizations

Limitations: Pre-MoE architecture, so larger sizes are heavier to serve than Llama 4. English-dominant training data. Reasoning lags behind closed frontier models and behind Qwen3 / DeepSeek R1 on many evals.

Open-Weight ModelsFree Weights

Best for: The first product of Meta Superintelligence Labs (launched April 8, 2026) and the new engine behind the consumer Meta AI assistant. Optimized for visual understanding and "personal superintelligence" use cases.

Strong multimodal/visual perception — designed to "see and understand what you're looking at"
Powers Meta AI in the standalone app, meta.ai web, and is rolling into Instagram, WhatsApp, Messenger, Facebook, and Ray-Ban / Oakley Meta glasses
First Meta flagship ever shipped without open weights — a sharp pivot from the Llama strategy
Available only via private API preview to select partners
Marks the operational debut of MSL under Alexandr Wang (Chief AI Officer) and Nat Friedman

Limitations: Closed-source with no published model card or parameter count. No public API at launch. No third-party benchmarks vs GPT-5 / Claude / Gemini. Open-weight versions of future MSL models are promised but unconfirmed.

Closed Frontier ModelPrivate Preview

Best for: A free, ubiquitous consumer assistant inside the apps people already use. Distribution is the moat — this assistant lives where billions of users already chat.

Chat, real-time web answers, and image understanding from photos
Standalone Meta AI app and meta.ai web for full-screen conversations
Embedded in Facebook, Instagram, WhatsApp, and Messenger search bars and DMs
Imagine image generation (free, in-chat or at imagine.meta.com)
Ray-Ban Meta and Oakley Meta glasses integration: visual Q&A, translation, navigation, photo-based nutrition
Reels, posts, and creator content woven into answers with attribution
Now powered by Muse Spark (rolling out April 2026)

Limitations: Country availability uneven — full features US-first, EU rollout slowed by DMA compliance. Image generation quality below Midjourney and DALL·E for stylized work. Cross-app memory still rolling out.

Consumer AssistantFree

Best for: No-code custom AI characters distributed natively into Instagram, Messenger, and WhatsApp — Meta's answer to OpenAI's custom GPTs, but living inside social apps instead of a chat UI.

Custom AI Characters — available to any user; build a persona with name, look, personality, and topic boundaries
Creator AI — for Instagram creators; auto-replies to DMs and story replies in the creator's voice
Templates for trivia hosts, cooking teachers, travel guides, fitness coaches, etc.
Configure entirely in plain text — no coding required
Distributes natively into Instagram chat, Messenger, and WhatsApp
Built on Llama under the hood

Limitations: US-only as of April 2026. Not all Instagram accounts have access yet. No real revenue-share model for creators. Some of Meta's own AI persona launches drew criticism in 2025, and content moderation on third-party characters is still a question mark.

Custom Agent BuilderFreeUS-Only

Best for: First-party hosted Llama inference from Meta itself — previewed at LlamaCon (April 29, 2025) as a direct competitor to OpenAI/Anthropic APIs and to third-party Llama hosts like Together, Fireworks, and Groq.

One-click API key creation and an interactive playground for Scout and Maverick
Python and TypeScript SDKs
OpenAI-SDK-compatible — drop-in for code already using openai client libraries
Hosted fine-tuning (LoRA and full) on Llama 3.3 8B with eval tooling included
Take-your-weights-out: tuned models are portable to any host — no lock-in
Meta does not train on prompts or responses

Limitations: Limited free preview throughout 2025-26; paid GA pricing not yet finalized. Smaller model selection than Together / Fireworks / Groq today. Fine-tuning catalog narrower than AWS Bedrock. For production traffic, third-party Llama hosts often still win on price, throughput, and region coverage.

Hosted InferencePreview

Best for: Meta's video generation research — not a product you can call directly today, but the underlying tech that's seeding video tools inside Instagram and Reels.

30B-parameter text-to-video research model announced October 2024
Up to 16-second 1080p HD video with synchronized audio
Text-to-video, image-to-video, and personalized video (your face)
Video editing via plain-text instructions
Rolling into Instagram and Reels creator features through 2025-26

Limitations: Not publicly released as weights or as an API. Behind Veo 3 and Runway in market access (OpenAI discontinued Sora on April 26, 2026 — API ends September 24, 2026). No fixed launch timeline for direct access.

Video GenerationResearch Only

Best for: The infrastructure layer that the entire AI industry runs on — originally Meta-built, now governed by the independent PyTorch Foundation under the Linux Foundation. Meta remains the largest contributor.

PyTorch — the dominant deep learning framework; powers Llama, most Hugging Face models, and historically OpenAI training infrastructure
torchtune — native post-training and fine-tuning library
torchao — quantization and model optimization
ExecuTorch — on-device inference for mobile and edge
BSD-licensed, fully open-source

Limitations: When people argue about Meta's "open-source AI" strategy, PyTorch is the most consequential piece — far more so than the Llama license itself. It's the substrate for the entire ecosystem.

ML FrameworkOpen Source

Best for: Targeted, single-purpose open-weight models that complement the main Llama line.

Llama Guard — open-weight safety classifier for input/output moderation; widely used in production Llama deployments
Code Llama — Llama 2 code-specialized derivative from 2023; effectively superseded by general-purpose Llama 3.x and Llama 4 on code tasks; no recent updates
SeamlessM4T — open-weight speech-to-speech and speech-to-text translation model covering ~100 languages

Limitations: Code Llama is essentially deprecated in 2026 — reach for Llama 3.3 70B or Llama 4 Maverick instead. SeamlessM4T is still maintained but moves slower than the main Llama line.

Specialized ModelsFree Weights

Capability	Llama 3.1 8B	Llama 3.3 70B	Llama 4 Scout	Llama 4 Maverick	Muse Spark
Open weights	Yes	Yes	Yes	Yes	Closed
Architecture	Dense	Dense	MoE (16 experts)	MoE (128 experts)	Undisclosed
Active / total params	8B / 8B	70B / 70B	17B / 109B	17B / 400B	Undisclosed
Context window	128K	128K	10M	1M	Undisclosed
Multimodal (image input)	No	No	Native	Native	Native (visual-first)
Cost (hosted, per 1M tok)	~$0.02 / $0.05	$0.23 / $0.40 (DeepInfra, cheapest tier) — $0.59 / $0.79 (Groq, fastest inference)	Mid-tier	~$0.15 / $0.60	Not public
Self-host friendly	Easy (single GPU)	Mid-size cluster	Multi-GPU	Heavy infra	No
Edge / on-device	Yes (quantized)	No	No	No	No
Reasoning quality	Basic	Good	Good	Strong	Unbenchmarked
Long-document analysis	Limited	Good	Excellent (10M ctx)	Excellent	Unknown
Fine-tuning support	Universal	Universal	Most providers	Most providers	No
Available via Llama API	Yes	Yes (with fine-tuning)	Yes	Yes	No
Available on Bedrock / Vertex / Azure	All three	All three	All three	All three	No
Powers Meta AI assistant	No	No	No	Previously	Yes (current)
Commercial use license	Llama 3.1 CL	Llama 3.3 CL	Llama 4 CL (700M MAU cap)	Llama 4 CL (700M MAU cap)	Meta-controlled

700M MAU cap: The Llama Community License lets any company use the weights commercially unless the licensee has more than 700 million monthly active users in the calendar month before release — effectively a "no Google, no Apple, no ByteDance" clause. For everyone else, treat Llama as commercially open. Hosted vs self-hosted: For most teams, the right call is to consume Llama through Bedrock, Vertex, Together, Fireworks, or Groq rather than self-host — the open weights are the insurance policy, not the deployment plan.

Cheapest acceptable LLM for high-volume productionLlama 3.1 8B (hosted)

General-purpose dense workhorse for production agentsLlama 3.3 70B

Analyzing huge contracts or full codebases in one promptLlama 4 Scout (10M context)

Strongest open Llama for complex reasoningLlama 4 Maverick

Image + text reasoning on open weightsLlama 4 Scout / Maverick

On-device chat for a mobile appLlama 3.2 1B / 3B

Vision tasks on a small open-weight modelLlama 3.2 11B Vision

Fine-tuning a model and keeping the weights portableLlama API (take-your-weights-out)

Drop-in OpenAI-compatible client, but on LlamaLlama API

Lowest-latency Llama inference (token streaming)Groq (Llama 3.x)

Enterprise Llama with VPC, IAM, and data residencyAWS Bedrock or Vertex AI

Free consumer chat inside WhatsApp / InstagramMeta AI Assistant

Visual Q&A on the go without pulling out a phoneRay-Ban / Oakley Meta + Meta AI

Free image generation in a chat threadImagine (Meta AI)

Building a custom AI character for Instagram fansMeta AI Studio (Creator AI)

No-code branded chatbot inside MessengerMeta AI Studio

Safety classifier for moderating LLM I/OLlama Guard

Open-source speech-to-speech translationSeamlessM4T

Frontier closed model from MetaMuse Spark (private preview only)

Training or fine-tuning infrastructurePyTorch + torchtune

Our Recommendation

Treat Meta AI as two separate stories. The open-weight story — Llama 3.x and Llama 4 — is the one that matters for builders: cheap inference, full control, and a healthy multi-cloud ecosystem (Bedrock, Vertex, Together, Fireworks, Groq). For most production work, default to Llama 3.3 70B for general agents, Llama 4 Scout when context length is the bottleneck, and Llama 3.1 8B when cost is. The consumer story — Meta AI assistant, AI Studio, and now Muse Spark — is mainly about distribution: it reaches billions of users inside apps they already use, but it's a closed product surface, not a build-on platform. Watch Muse Spark closely — it's the signal that Meta's "open by default" era is ending.

Meta AI

Llama 4

Llama 3.x

Muse Spark

Meta AI Assistant

Meta AI Studio

Llama API

Movie Gen

PyTorch & Open Toolchain

Specialized Models

Our Recommendation