Reference Guide

Hugging Face

The Model Hub, Datasets, Spaces, Inference Providers, Inference Endpoints, Transformers, AutoTrain, Pro, and Enterprise Hub — what each one does and where it fits in an open-source AI stack.

← Back to Reference Hub

Best for: Discovering, downloading, versioning, and sharing open-weight machine learning models — the de facto registry for the open-source AI ecosystem.

Over 2.4 million models as of early 2026, spanning text, vision, audio, multimodal, tabular, and reinforcement learning
Each repo is backed by Git + Git LFS — full version history, branches, pull requests, and diffs on weights and configs
Standardized model cards document training data, intended use, evaluation, and limitations — baseline transparency the closed providers don't match
Tag-based discovery: filter by task, library, language, license, parameter count, and benchmarks; "Trending" and leaderboards highlight what's working
One-line download with huggingface_hub client, transformers, diffusers, or any framework that speaks HF's repo format
Free public hosting; private repos available on Pro, Team, and Enterprise plans with per-seat storage quotas

Limitations: Quality varies wildly — community uploads include experimental forks, broken merges, and abandoned projects alongside production-grade models. Licenses are model-specific and need to be read; "open weights" does not always mean commercial-use friendly. Large model downloads can be slow without a paid CDN or mirror.

Model RegistryFree

Best for: Hosting, exploring, and loading the training and evaluation data that powers ML projects — the dataset companion to the Model Hub.

Over 730,000 datasets ranging from canonical benchmarks (Common Crawl, Wikipedia, ImageNet variants) to niche fine-tuning sets
Dataset Viewer renders a browsable preview of the first rows in your browser — works on Parquet, JSONL, CSV, image, and audio files
datasets Python library streams data lazily so you can train on terabyte-scale corpora without local disk space
Built-in support for splits, configurations, and standardized metadata; works hand-in-hand with transformers training pipelines
Pro unlocks Dataset Viewer on private datasets, which is otherwise public-only
Free public hosting; private dataset storage counts against your plan's private-storage quota

Limitations: Like the Model Hub, dataset quality and licensing are uneven — verify provenance and license before training a commercial model. Some "datasets" are scraped from the web with unclear copyright status. Streaming is fast for sequential reads, slower for random access.

Dataset HubFree

Best for: Hosting interactive demos, prototypes, and lightweight apps for ML models — the place where new open models show up running in a browser tab the day they release.

Over 1 million Spaces hosted across Gradio, Streamlit, Docker, and static-site SDKs
ZeroGPU (free) dynamically allocates NVIDIA H200 GPUs per request — so demos can run real models without a paid GPU upgrade
Pro accounts get 8x the daily ZeroGPU quota (up to 25 minutes of H200 compute/day), highest queue priority, and Dev Mode (SSH/VS Code into the running Space)
Beyond included quota, all paid users can buy ZeroGPU time at $1 per 10 minutes of GPU time
Persistent paid GPU upgrades available for always-on workloads (T4, A10, L4, A100, H100 tiers, billed per minute)
Built-in OAuth ("Sign in with Hugging Face"), secrets management, and one-click duplicate from any public Space

Limitations: ZeroGPU has cold-start latency on the first request after idle. Free CPU Spaces sleep after inactivity. Container builds can be slow; iteration is fastest when you keep the Docker image small. Not a substitute for a real production hosting platform — rate limits and shared infrastructure mean you'll want Inference Endpoints for paying customer traffic.

App HostingFree Tier (ZeroGPU)

Best for: Calling open-weight models from your code without standing up infrastructure — a unified, OpenAI-compatible interface routed across multiple inference providers.

Replaces the older "Serverless Inference API"; routes requests across partner providers (Together, Fireworks, Replicate, Scaleway, SambaNova, Hyperbolic, etc.) plus Hugging Face's own infrastructure
Free tier: 100,000 monthly inference credits applied automatically when you route through Hugging Face
Pro: 2 million monthly credits (20x the free tier) at $9/month
Beyond included credits, you pay per request based on compute time x underlying hardware price
Or set up "Bring Your Own Key": connect your existing Together/Replicate/etc. account and Hugging Face just orchestrates the routing
Client SDKs in Python and JavaScript; an OpenAI-compatible HTTP endpoint makes drop-in replacement easy

Limitations: Not every Model Hub model is available as a hosted endpoint; coverage skews toward popular open models. Throughput and latency depend on which provider routes your request — quality of service is partner-dependent. Production-critical workloads with strict SLA requirements belong on Inference Endpoints, not the serverless tier.

Serverless InferenceFree Credits

Best for: Production deployments of any Hub model on dedicated, autoscaling infrastructure — the path from prototype to real customer traffic without writing your own serving stack.

Pick any model from the Hub, choose a region (AWS, Azure, GCP) and instance type, and HF spins up a managed endpoint behind a private URL
Pricing starts at $0.03 per CPU core/hour and $0.50 per GPU/hour, billed by the minute
Supports CPUs, GPUs (T4, A10, L4, L40S, A100, H100, H200), AWS Inferentia 2, and Google TPUs
Autoscaling and scale-to-zero keep idle costs low while still handling traffic spikes
Custom inference handlers, container images, and TGI / vLLM / TEI runtimes for optimized serving
Volume-commit and annual contracts available for high-scale deployments

Limitations: Per-minute billing is strict — if your endpoint stays warm, you pay for that warmth. Cold starts on scale-to-zero can be tens of seconds for large models. Networking egress, attached storage, and cross-region replication add to the base hourly rate. Not the cheapest option for sporadic traffic — serverless Inference Providers usually wins on cost at low volume.

Production Inference

Best for: Loading, running, fine-tuning, and serving open-weight models in Python — the most-used machine learning library in the open-source ecosystem.

Provides standardized model classes for thousands of architectures — one consistent API across BERT, Llama, Whisper, CLIP, and everything in between
Latest line is Transformers v5 (v5.7.0+ stable; v5.9.0 released May 2026) — simpler model definitions, tighter PyTorch integration, broader multimodal coverage
Recent additions include Mistral 4, PaddlePaddle models, VidEoMT (video segmentation), VoxtralRealtime (streaming ASR), Jina Embeddings v3, and EuroBERT
Companion libraries: diffusers (image/video generation), peft (LoRA / QLoRA fine-tuning), accelerate (distributed training), tokenizers, datasets, evaluate, trl (RLHF/DPO)
Apache 2.0 licensed and free; the library is the gravitational center of the rest of the HF stack
Requires Python 3.10+ and PyTorch 2.4+. TensorFlow and Flax support is being deprecated in v5; PyTorch is now the only first-class path

Limitations: The library's surface area is huge — not every architecture is equally polished, and breaking changes happen at major versions (v4 to v5 required code updates). For the absolute fastest inference, dedicated runtimes like vLLM, TGI, or TensorRT-LLM outperform vanilla Transformers; Transformers is the reference implementation, not the production-optimized one.

Open-Source LibraryFree / Apache 2.0

Best for: Fine-tuning a model on your own data without writing a training loop — aimed at people who can prepare a dataset but don't want to manage GPUs, distributed training, or hyperparameter sweeps directly.

Upload a dataset, choose a base model and task, configure hyperparameters in a UI, and AutoTrain handles distributed training automatically
Supports LLM fine-tuning, text classification, token classification, sequence-to-sequence, sentence-transformer fine-tuning, vision-language model fine-tuning, image classification, and tabular tasks
The tool itself is free and open source; you only pay for the compute consumed when you run it on Hugging Face Spaces
Compute is billed per minute by hardware tier; Enterprise users on NVIDIA DGX Cloud get H100 instances at $8.25/GPU-hour and L40S at $2.75/GPU-hour
Run as a private Space with a Docker image, or use the AutoTrain Advanced library directly from your own infrastructure
Once trained, models drop into a Hub repo automatically, ready to serve via Inference Endpoints

Limitations: The "no-code" framing is real for the basics but breaks down quickly — getting strong results still requires understanding learning rates, batch sizes, and data quality. Cost can sneak up on you because compute is the meter, and large LLM fine-tuning on H100s adds up fast. Less control than writing your own training script with transformers + peft.

Training PlatformPay for Compute

Best for: Active builders who hit free-tier limits — researchers, indie hackers, and developers who use the Hub as their day-to-day workspace.

$9/month per user (annual or monthly billing)
20x included Inference Provider credits (2M/month vs. 100K on free)
8x ZeroGPU quota (up to 25 minutes of NVIDIA H200 compute/day) and highest queue priority
Up to 10 ZeroGPU Spaces with Dev Mode (SSH/VS Code into the running Space)
1TB private storage (10x the free tier) and 10TB public storage
Dataset Viewer on private datasets, articles and posts visible on your profile, early access to new features

Limitations: Pro is per-user; for teams that need shared organizations, SSO, or audit trails, you want Team or Enterprise instead. The included credits and quota are generous but not infinite — high-volume inference still ends up on Endpoints or BYOK partner providers.

Individual Plan

Best for: Companies and labs that need shared organizations, security controls, compliance posture, and the ability to host private models alongside the public ecosystem.

Team ($20/user/month) — everything in Pro plus shared organization, basic SSO, and collaboration tools
Enterprise Hub (from $50/user/month, with custom pricing for larger orgs) — SAML SSO, SCIM, audit logs, resource groups, fine-grained access control, and Storage Regions for data residency
1TB private storage per seat pooled across the org (40 seats = 40TB included)
Bring Your Own Cloud (BYOC) deploys the Hub experience inside your own AWS / Azure / GCP account so model weights and data never leave your network
Pool of free usage credits scaled to seat count, plus prioritized support from the Hugging Face team
Compliance posture (SOC 2, GDPR, HIPAA paths) and DPA / BAA available for regulated workloads

Limitations: Per-seat pricing is steep relative to building on raw cloud infrastructure — the value comes from the ecosystem, governance, and developer ergonomics, not the hosting itself. BYOC requires real cloud-engineering effort to set up and maintain. For very small teams, Pro accounts in a personal organization can cover most needs.

Team / EnterprisePer-Seat

Best for: Understanding why Hugging Face is the gravitational center of open-source AI — the role it plays in the ecosystem isn't fully captured by any single product line.

Releases of new open models from Meta (Llama), Mistral, Google (Gemma), Microsoft (Phi), Alibaba (Qwen), DeepSeek, and others land on the Hub first — often the same hour as the announcement
Community standards: model cards, dataset cards, evaluation harnesses, and the leaderboards (Open LLM Leaderboard, Big Code Models, Chatbot Arena mirrors) shape how the field measures progress
Maintains the libraries the rest of the ecosystem builds on: transformers, diffusers, tokenizers, datasets, accelerate, peft, trl, safetensors, candle, text-generation-inference
Partnerships with cloud providers (AWS, Azure, GCP, Cloudflare, Scaleway, Together) make the Hub the default distribution layer for open weights
Educational content: free courses on NLP, deep RL, audio, and computer vision; the Hugging Face Cookbook for production patterns
Sponsors and hosts open-source research artifacts (BLOOM, StarCoder, IDEFICS, SmolLM, etc.) so independent labs can ship without their own infrastructure

Limitations: Even teams that ultimately serve models on Bedrock, Vertex, Azure, or their own infrastructure tend to discover, evaluate, and fine-tune on Hugging Face. Treat the Hub as a default piece of any open-source AI workflow, not just a vendor decision.

Ecosystem RoleOpen Source

Capability	Free	Pro ($9/mo)	Team ($20/user/mo)	Enterprise (from $50/user/mo)	Inference Endpoints
Public model / dataset hosting	Unlimited	Unlimited	Unlimited	Unlimited	N/A
Private storage	100GB	1TB	1TB / seat	1TB / seat (pooled)	N/A
ZeroGPU quota (H200)	Free, baseline	8x baseline	8x baseline	8x + pooled credits	Use your own GPU
Inference Provider credits / month	100K	2M	Pooled	Pooled + credits	Per-minute compute
Spaces with Dev Mode (SSH/VS Code)	No	Up to 10	Yes	Yes	N/A
SSO (SAML / OIDC)	No	No	Basic	SAML + SCIM	Yes
Audit logs & resource groups	No	No	Limited	Full	Per-endpoint
Storage Regions / data residency	No	No	No	Yes	Region-pinned
Bring Your Own Cloud (BYOC)	No	No	No	Yes	AWS / Azure / GCP
Production-grade SLA inference	No	Best-effort	Best-effort	Via Endpoints	Yes
Autoscale / scale-to-zero	N/A	N/A	N/A	N/A	Yes
Cost model	$0	Flat $9/user/mo	Flat $20/user/mo	Per-seat + credits	Per-minute compute
Best for	Learning, OSS	Indie / researcher	Small team	Regulated org	Production traffic

How to think about cost: The Hub itself is free for public use forever — you only pay when you need private storage, premium quota, dedicated compute, or governance features. Most individuals can stay on free or Pro indefinitely. Teams cross into Team / Enterprise when they need shared organizations or compliance, not because they've outgrown the free tier on volume. Inference Endpoints is a separate, consumption-based bill that runs alongside whichever Hub plan you're on — it's the production runtime, not a license tier.

Hugging Face

Model Hub

Datasets

Spaces

Inference Providers

Inference Endpoints

Transformers

AutoTrain

Hugging Face Pro

Team & Enterprise Hub

Open-Source Ecosystem

Our Recommendation