NeMo AutoModel is an NVIDIA-developed library that delivers a high-performance, easy-to-use solution for fine-tuning and pretraining large language models (LLMs) and vision-language models (VLMs) directly from the Hugging Face Hub. It provides true Day-0 compatibility with any Hugging Face model, so you can start using models immediately without conversion or setup delays.
Fine-tune models right away and scale with PyTorch-native data and model parallelism, optimized custom kernels, and memory-efficient recipes, all while preserving the original checkpoint format for use across the Hugging Face ecosystem.
⚠️ Note: NeMo AutoModel is under active development. New features, improvements, and documentation updates are released regularly. We are working toward a stable release, so expect the interface to solidify over time. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.
NeMo AutoModel provides native support for a wide range of models available on the Hugging Face Hub, enabling efficient fine-tuning for various domains.
LLMs:
- Llama Family: Llama 3, Llama 3.1, Llama 3.2, Code Llama
- Qwen Family: Qwen3, Qwen2.5, Qwen2
- Gemma Family: Gemma 2, Gemma 3
- Phi Family: Phi-2, Phi-3, Phi-4
- And more: Any causal LM on the Hugging Face Hub
VLMs:
- Qwen2.5-VL: All variants (3B, 7B, 72B)
- Gemma-3-VL: 3B and other variants
To get started quickly, NeMo AutoModel provides a collection of ready-to-use recipes for common LLM and VLM fine-tuning tasks. Simply select the recipe that matches your model and training setup (e.g., single-GPU, multi-GPU, or multi-node).
Domain | Model ID | Single-GPU | Single-Node | Multi-Node |
---|---|---|---|---|
LLM | meta-llama/Llama-3.2-1B | HellaSwag + LoRA | HellaSwag, SQuAD | HellaSwag + nvFSDP |
VLM | google/gemma-3-4b-it | CORD-v2 + LoRA | CORD-v2 | Coming Soon |
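To run a NeMo AutoModel recipe, you need a recipe script (for example, `recipes/llm/finetune.py` or `recipes/vlm/finetune.py`) and a YAML config file (for example, `recipes/llm/llama_3_2_1b_hellaswag.yaml` or `recipes/vlm/gemma_3_vl_3b_cord_v2_peft.yaml`):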
# Command invocation format:
uv run <recipe_script_path> --config <yaml_config_path>
# LLM example: multi-GPU with FSDP2
uv run torchrun --nproc-per-node=8 recipes/llm/finetune.py --config recipes/llm/llama_3_2_1b_hellaswag.yaml
# VLM example: single GPU fine-tuning (Gemma-3-VL) with LoRA
uv run recipes/vlm/finetune.py --config recipes/vlm/gemma_3_vl_3b_cord_v2_peft.yaml
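The two invocation patterns above can also be assembled programmatically, for example when launching recipes from a scheduler script. The following is a minimal sketch and not part of NeMo AutoModel: `build_recipe_command` is a hypothetical helper, and it uses only the `uv run`, `torchrun --nproc-per-node`, and `--config` arguments shown above.

```python
import shlex
from typing import List, Optional


def build_recipe_command(
    script: str, config: str, nproc_per_node: Optional[int] = None
) -> List[str]:
    """Return the argv list for running a recipe through `uv run`.

    Hypothetical helper for illustration; it mirrors the commands shown above.
    """
    cmd = ["uv", "run"]
    if nproc_per_node is not None:
        if nproc_per_node < 1:
            raise ValueError("nproc_per_node must be a positive integer")
        # Multi-GPU: launch one process per GPU through torchrun.
        cmd += ["torchrun", f"--nproc-per-node={nproc_per_node}"]
    cmd += [script, "--config", config]
    return cmd


llm_cmd = build_recipe_command(
    "recipes/llm/finetune.py",
    "recipes/llm/llama_3_2_1b_hellaswag.yaml",
    nproc_per_node=8,
)
print(shlex.join(llm_cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.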
- Day-0 Hugging Face Support: Instantly fine-tune any model from the Hugging Face Hub
- Lightning Fast Performance: Custom CUDA kernels and memory optimizations deliver 2–5× speedups
- Large-Scale Distributed Training: Built-in FSDP2 and nvFSDP for seamless multi-node scaling
- Vision-Language Model Ready: Native support for VLMs (Qwen2-VL, Gemma-3-VL, etc.)
- Advanced PEFT Methods: LoRA and extensible PEFT system out of the box
- Seamless HF Ecosystem: Fine-tuned models work perfectly with Transformers pipeline, VLM, etc.
- Robust Infrastructure: Distributed checkpointing with integrated logging and monitoring
- Optimized Recipes: Pre-built configurations for common models and datasets
- Flexible Configuration: YAML-based configuration system for reproducible experiments
- FP8 Precision: Native FP8 training & inference for higher throughput and lower memory use
- INT4 / INT8 Quantization: Turn-key quantization workflows for ultra-compact, low-memory training
NeMo AutoModel is offered both as a standard Python package installable via pip and as a ready-to-run NeMo Framework Docker container.
# We use `uv` for package management and environment isolation.
pip3 install uv
# If you cannot install at the system level, you can install for your user with
# pip3 install --user uv
Run every command with `uv run`. It creates the virtual environment from the lock file and keeps it up to date, so you never need to activate a venv manually. Example: `uv run recipes/llm/finetune.py`. If you prefer to install NeMo AutoModel explicitly, follow the instructions below.
# Install the latest stable release from PyPI
# We first need to initialize the virtual environment using uv
uv venv
uv pip install nemo_automodel # or: uv pip install --upgrade nemo_automodel
# Install the latest NeMo Automodel from the GitHub repo (best for development).
# We first need to initialize the virtual environment using uv
uv venv
# We can now install from source
uv pip install git+https://github.com/NVIDIA-NeMo/Automodel.git
uv run python -c "import nemo_automodel; print('✅ NeMo AutoModel ready')"
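If you would rather check for the package without importing it (importing can be slow or fail on machines without a GPU stack), a standard-library lookup is enough. This is a minimal sketch; `is_installed` is a hypothetical helper name, and it only reports whether the top-level `nemo_automodel` package is discoverable on the current import path.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if is_installed("nemo_automodel"):
    print("nemo_automodel is importable")
else:
    print("nemo_automodel not found; try `uv pip install nemo_automodel`")
```

Run it with `uv run python` so the lookup happens inside the same environment the recipes use.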
distributed:
  _target_: nemo_automodel.distributed.nvfsdp.NVFSDPManager
  dp_size: 8
  tp_size: 1
  cp_size: 1
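The `_target_` keys in these configs look like the dotted-path convention popularized by Hydra, where the path names a callable and the remaining keys become its keyword arguments. That reading is an assumption. The sketch below is not NeMo AutoModel's config loader, and `resolve_target` and `instantiate` are hypothetical names. It is demonstrated on a standard-library class so that it runs without the library installed.

```python
import importlib
from typing import Any, Callable, Dict


def resolve_target(path: str) -> Callable[..., Any]:
    """Import the object named by a dotted path such as 'pkg.module.Class.method'."""
    parts = path.split(".")
    # Try the longest importable module prefix, then walk the rest as attributes.
    for split in range(len(parts) - 1, 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ModuleNotFoundError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve target {path!r}")


def instantiate(section: Dict[str, Any]) -> Any:
    """Call the `_target_` callable with the other keys as keyword arguments."""
    kwargs = {k: v for k, v in section.items() if k != "_target_"}
    return resolve_target(section["_target_"])(**kwargs)


# Stand-in config section using the standard library instead of NVFSDPManager.
section = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
print(instantiate(section))  # prints 3/4
```

The same resolver handles paths that end in a class method, which is the shape used by targets such as `transformers.AutoProcessor.from_pretrained`.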
peft:
  peft_fn: nemo_automodel._peft.lora.apply_lora_to_linear_modules
  match_all_linear: True
  dim: 8
  alpha: 32
  use_triton: True
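To make the `dim` and `alpha` values concrete: in standard LoRA, a frozen weight matrix of shape (d_out, d_in) is augmented with two low-rank factors of rank r, and their product is scaled by alpha / r. The sketch below assumes `dim` is that rank and `alpha` is the scaling numerator. Whether `apply_lora_to_linear_modules` uses exactly this scaling is an assumption, and the helper names and the 2048 x 2048 layer size are illustrative only.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / rank."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A is (rank, d_in), B is (d_out, rank)."""
    return rank * d_in + d_out * rank


rank, alpha = 8, 32    # `dim` and `alpha` from the config above
d_in = d_out = 2048    # illustrative layer size, not taken from any model card

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)
print(f"scale = {lora_scale(alpha, rank)}")                # scale = 4.0
print(f"trainable: {lora} of {full} ({lora / full:.2%})")  # trainable: 32768 of 4194304 (0.78%)
```

Under these assumptions, the adapter for one such layer trains less than 1% of the parameters that full fine-tuning of the same layer would update.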
model:
  _target_: nemo_automodel._transformers.NeMoAutoModelForImageTextToText.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
processor:
  _target_: transformers.AutoProcessor.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
  min_pixels: 200704
  max_pixels: 1003520
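The `min_pixels` and `max_pixels` values are both multiples of 28 × 28 = 784. This matches how the Qwen2-VL family describes its image budget, where each visual token covers a 28 × 28 pixel area. Under that assumption, the two bounds translate into a range of visual tokens per image, as the arithmetic below shows. The constant and variable names are illustrative.

```python
PIXELS_PER_VISUAL_TOKEN = 28 * 28  # assumption: one visual token per 28x28 pixel area

min_pixels = 200704
max_pixels = 1003520

min_tokens, min_rem = divmod(min_pixels, PIXELS_PER_VISUAL_TOKEN)
max_tokens, max_rem = divmod(max_pixels, PIXELS_PER_VISUAL_TOKEN)

print(min_tokens, min_rem)  # 256 0
print(max_tokens, max_rem)  # 1280 0
```

Raising `max_pixels` lets the processor keep more image detail at the cost of longer sequences and higher memory use.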
checkpoint:
  enabled: true
  checkpoint_dir: ./checkpoints
  save_consolidated: true # HF-compatible safetensors
  model_save_format: safetensors
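After a run, a quick way to confirm that checkpoints were written is to list the safetensors files under `checkpoint_dir`. The subdirectory layout is not described here, so this sketch makes no assumption about it and simply searches recursively. `find_safetensors` is a hypothetical helper, not a NeMo AutoModel API.

```python
from pathlib import Path
from typing import List


def find_safetensors(checkpoint_dir: str) -> List[Path]:
    """Return all *.safetensors files under checkpoint_dir, sorted by path."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # No checkpoint directory yet, for example before the first save.
        return []
    return sorted(root.rglob("*.safetensors"))


for path in find_safetensors("./checkpoints"):
    print(path)
```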
NeMo-Automodel/
├── nemo_automodel/              # Core library
│   ├── _peft/                   # PEFT implementations (LoRA)
│   ├── _transformers/           # HF model integrations
│   ├── checkpoint/              # Distributed checkpointing
│   ├── datasets/                # Dataset loaders
│   │   ├── llm/                 # LLM datasets (HellaSwag, SQuAD, etc.)
│   │   └── vlm/                 # VLM datasets (CORD-v2, rdr, etc.)
│   ├── distributed/             # FSDP2, nvFSDP, parallelization
│   ├── loss/                    # Optimized loss functions
│   └── training/                # Training recipes and utilities
├── recipes/                     # Ready-to-use training recipes
│   ├── llm/                     # LLM fine-tuning recipes
│   └── vlm/                     # VLM fine-tuning recipes
└── tests/                       # Comprehensive test suite
We welcome contributions! Please see our Contributing Guide for details.
NVIDIA NeMo AutoModel is licensed under the Apache License 2.0.
- Documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/automodel/index.html
- Hugging Face Hub: https://huggingface.co/models
- Issues: https://github.com/NVIDIA-NeMo/Automodel/issues
- Discussions: https://github.com/NVIDIA-NeMo/Automodel/discussions
Made with ❤️ by NVIDIA
Accelerating AI for everyone