Meta AI
Llama models, the Meta AI assistant across Facebook/Instagram/WhatsApp, AI Studio, the Llama API, and the new closed-weight Muse Spark — what each one is and where it fits.
← Back to Reference HubBest for: Long-context document work, multimodal pipelines, and self-hosted production deployments where open weights matter.
- Scout — 17B active / 109B total params, 16 experts, 10M-token context (largest open model at launch)
- Maverick — 17B active / 400B total params, 128 experts, 1M-token context
- Behemoth — ~288B active / ~2T total, used as a teacher model for codistillation; never publicly released
- First Llama family with Mixture-of-Experts architecture and native multimodality (text + image input)
- Available on Hugging Face, llama.com, Bedrock, Vertex, Azure; runs in vLLM, TGI, llama.cpp, Ollama
- Released April 5, 2025 under the Llama 4 Community License
Limitations: Lukewarm reception vs Qwen3 and DeepSeek R1 on reasoning benchmarks. License caps commercial use at 700M monthly active users (effectively a hyperscaler clause). Behemoth never shipped, and the original AGI Foundations team that built Llama 4 was dissolved after release.
Best for: The mature, broadly-deployed workhorse for most production Llama usage in 2026 — cheap, fast, well-supported by every inference host.
- Llama 3.1 — 8B, 70B, and 405B dense models with 128K context
- Llama 3.2 — 1B/3B text + 11B/90B vision (first multimodal Llama, edge-friendly small sizes)
- Llama 3.3 70B — dense model approaching 405B quality at a much smaller size
- Lowest-cost tier: Llama 3.1 8B at ~$0.02 / $0.05 per million tokens on hosted providers
- Supported on Bedrock, Vertex, Azure AI, Together, Fireworks, Groq, Replicate, and self-hosting
- Strong open ecosystem of fine-tunes, derivatives, and quantizations
Limitations: Pre-MoE architecture, so larger sizes are heavier to serve than Llama 4. English-dominant training data. Reasoning lags behind closed frontier models and behind Qwen3 / DeepSeek R1 on many evals.
Best for: The first product of Meta Superintelligence Labs (launched April 8, 2026) and the new engine behind the consumer Meta AI assistant. Optimized for visual understanding and "personal superintelligence" use cases.
- Strong multimodal/visual perception — designed to "see and understand what you're looking at"
- Powers Meta AI in the standalone app, meta.ai web, and is rolling into Instagram, WhatsApp, Messenger, Facebook, and Ray-Ban / Oakley Meta glasses
- First Meta flagship ever shipped without open weights — a sharp pivot from the Llama strategy
- Available only via private API preview to select partners
- Marks the operational debut of MSL under Alexandr Wang (Chief AI Officer) and Nat Friedman
Limitations: Closed-source with no published model card or parameter count. No public API at launch. No third-party benchmarks vs GPT-5 / Claude / Gemini. Open-weight versions of future MSL models are promised but unconfirmed.
Best for: A free, ubiquitous consumer assistant inside the apps people already use. Distribution is the moat — this assistant lives where billions of users already chat.
- Chat, real-time web answers, and image understanding from photos
- Standalone Meta AI app and meta.ai web for full-screen conversations
- Embedded in Facebook, Instagram, WhatsApp, and Messenger search bars and DMs
- Imagine image generation (free, in-chat or at imagine.meta.com)
- Ray-Ban Meta and Oakley Meta glasses integration: visual Q&A, translation, navigation, photo-based nutrition
- Reels, posts, and creator content woven into answers with attribution
- Now powered by Muse Spark (rolling out April 2026)
Limitations: Country availability uneven — full features US-first, EU rollout slowed by DMA compliance. Image generation quality below Midjourney and DALL·E for stylized work. Cross-app memory still rolling out.
Best for: No-code custom AI characters distributed natively into Instagram, Messenger, and WhatsApp — Meta's answer to OpenAI's custom GPTs, but living inside social apps instead of a chat UI.
- Custom AI Characters — available to any user; build a persona with name, look, personality, and topic boundaries
- Creator AI — for Instagram creators; auto-replies to DMs and story replies in the creator's voice
- Templates for trivia hosts, cooking teachers, travel guides, fitness coaches, etc.
- Configure entirely in plain text — no coding required
- Distributes natively into Instagram chat, Messenger, and WhatsApp
- Built on Llama under the hood
Limitations: US-only as of April 2026. Not all Instagram accounts have access yet. No real revenue-share model for creators. Some of Meta's own AI persona launches drew criticism in 2025, and content moderation on third-party characters is still a question mark.
Best for: First-party hosted Llama inference from Meta itself — previewed at LlamaCon (April 29, 2025) as a direct competitor to OpenAI/Anthropic APIs and to third-party Llama hosts like Together, Fireworks, and Groq.
- One-click API key creation and an interactive playground for Scout and Maverick
- Python and TypeScript SDKs
- OpenAI-SDK-compatible — drop-in for code already using openai client libraries
- Hosted fine-tuning (LoRA and full) on Llama 3.3 8B with eval tooling included
- Take-your-weights-out: tuned models are portable to any host — no lock-in
- Meta does not train on prompts or responses
Limitations: Limited free preview throughout 2025-26; paid GA pricing not yet finalized. Smaller model selection than Together / Fireworks / Groq today. Fine-tuning catalog narrower than AWS Bedrock. For production traffic, third-party Llama hosts often still win on price, throughput, and region coverage.
Best for: Meta's video generation research — not a product you can call directly today, but the underlying tech that's seeding video tools inside Instagram and Reels.
- 30B-parameter text-to-video research model announced October 2024
- Up to 16-second 1080p HD video with synchronized audio
- Text-to-video, image-to-video, and personalized video (your face)
- Video editing via plain-text instructions
- Rolling into Instagram and Reels creator features through 2025-26
Limitations: Not publicly released as weights or as an API. Behind Veo 3 and Runway in market access (OpenAI discontinued Sora on April 26, 2026 — API ends September 24, 2026). No fixed launch timeline for direct access.
Best for: The infrastructure layer that the entire AI industry runs on — originally Meta-built, now governed by the independent PyTorch Foundation under the Linux Foundation. Meta remains the largest contributor.
- PyTorch — the dominant deep learning framework; powers Llama, most Hugging Face models, and historically OpenAI training infrastructure
- torchtune — native post-training and fine-tuning library
- torchao — quantization and model optimization
- ExecuTorch — on-device inference for mobile and edge
- BSD-licensed, fully open-source
Limitations: When people argue about Meta's "open-source AI" strategy, PyTorch is the most consequential piece — far more so than the Llama license itself. It's the substrate for the entire ecosystem.
Best for: Targeted, single-purpose open-weight models that complement the main Llama line.
- Llama Guard — open-weight safety classifier for input/output moderation; widely used in production Llama deployments
- Code Llama — Llama 2 code-specialized derivative from 2023; effectively superseded by general-purpose Llama 3.x and Llama 4 on code tasks; no recent updates
- SeamlessM4T — open-weight speech-to-speech and speech-to-text translation model covering ~100 languages
Limitations: Code Llama is essentially deprecated in 2026 — reach for Llama 3.3 70B or Llama 4 Maverick instead. SeamlessM4T is still maintained but moves slower than the main Llama line.
| Capability | Llama 3.1 8B | Llama 3.3 70B | Llama 4 Scout | Llama 4 Maverick | Muse Spark |
|---|---|---|---|---|---|
| Open weights | Yes | Yes | Yes | Yes | Closed |
| Architecture | Dense | Dense | MoE (16 experts) | MoE (128 experts) | Undisclosed |
| Active / total params | 8B / 8B | 70B / 70B | 17B / 109B | 17B / 400B | Undisclosed |
| Context window | 128K | 128K | 10M | 1M | Undisclosed |
| Multimodal (image input) | No | No | Native | Native | Native (visual-first) |
| Cost (hosted, per 1M tok) | ~$0.02 / $0.05 | $0.23 / $0.40 (DeepInfra, cheapest tier) — $0.59 / $0.79 (Groq, fastest inference) | Mid-tier | ~$0.15 / $0.60 | Not public |
| Self-host friendly | Easy (single GPU) | Mid-size cluster | Multi-GPU | Heavy infra | No |
| Edge / on-device | Yes (quantized) | No | No | No | No |
| Reasoning quality | Basic | Good | Good | Strong | Unbenchmarked |
| Long-document analysis | Limited | Good | Excellent (10M ctx) | Excellent | Unknown |
| Fine-tuning support | Universal | Universal | Most providers | Most providers | No |
| Available via Llama API | Yes | Yes (with fine-tuning) | Yes | Yes | No |
| Available on Bedrock / Vertex / Azure | All three | All three | All three | All three | No |
| Powers Meta AI assistant | No | No | No | Previously | Yes (current) |
| Commercial use license | Llama 3.1 CL | Llama 3.3 CL | Llama 4 CL (700M MAU cap) | Llama 4 CL (700M MAU cap) | Meta-controlled |
Our Recommendation
Treat Meta AI as two separate stories. The open-weight story — Llama 3.x and Llama 4 — is the one that matters for builders: cheap inference, full control, and a healthy multi-cloud ecosystem (Bedrock, Vertex, Together, Fireworks, Groq). For most production work, default to Llama 3.3 70B for general agents, Llama 4 Scout when context length is the bottleneck, and Llama 3.1 8B when cost is. The consumer story — Meta AI assistant, AI Studio, and now Muse Spark — is mainly about distribution: it reaches billions of users inside apps they already use, but it's a closed product surface, not a build-on platform. Watch Muse Spark closely — it's the signal that Meta's "open by default" era is ending.