Welcome to the developer resource guide for ERNIE 4.5, a powerful family of open-source models from Baidu. This guide provides all the essential information, links, and code examples to help you get started with deploying ERNIE 4.5 models.
| Resource | URL |
|---|---|
| 📝 Blog | https://yiyan.baidu.com/blog |
| 📄 Technical Report | https://yiyan.baidu.com/blog/publication |
| 🤗 Hugging Face | https://huggingface.co/baidu |
| 🔧 ERNIEKit | https://github.com/PaddlePaddle/ERNIE |
| ⚡ FastDeploy | https://github.com/PaddlePaddle/FastDeploy |
| 💡 Baidu AI Studio | https://aistudio.baidu.com/ |
| 🔅 ModelScope | https://www.modelscope.cn/studios/PaddlePaddle |
ERNIE 4.5 is available under the Apache 2.0 License. The open-source release includes 10 models across 3 series, along with code for pre-training, fine-tuning, and inference deployment.
| Series | Activated Parameters | Model Name Suffix | Description |
|---|---|---|---|
| 0.3B Series | ~300 million | -0.3B | Lightweight models suitable for local and on-device deployment. |
| A3B Series | ~3 billion | -A3B | Efficient models offering a balance of performance and resource usage. |
| A47B Series | ~47 billion | -A47B | State-of-the-art models for maximum performance on complex tasks. |
🏷️ Naming Conventions:
- `-Base`: The foundational pre-trained model.
- (no suffix): The instruction-tuned chat model.
- `-VL`: The Vision-Language multimodal model.
- Hybrid Thinking: The VL models feature a "thinking mode" (controlled by a parameter) that enhances reasoning, alongside a standard non-thinking mode for fast perception.
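As a minimal sketch of toggling the mode, assuming the chat template exposes an `enable_thinking` flag and using an assumed VL model ID (verify both against the model card on Hugging Face):

```python
from transformers import AutoTokenizer

# Assumed VL model ID; check https://huggingface.co/baidu for the exact name.
tokenizer = AutoTokenizer.from_pretrained("baidu/ERNIE-4.5-VL-28B-A3B-PT")

messages = [{"role": "user", "content": "What is in this image?"}]

# `enable_thinking` is an assumption here: a template kwarg that switches
# between the reasoning ("thinking") branch and the fast non-thinking branch.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
print(prompt)
```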
You can run the lightweight ERNIE 4.5 models on your local machine. Below are examples using llama.cpp for general CPU inference, MNN for optimized on-device deployment, and MLX LM for Apple silicon.
The llama.cpp project supports the ERNIE 4.5 0.3B models, allowing you to run them efficiently on a CPU.
Step 1️⃣: Clone and Build llama.cpp
First, get the latest version of llama.cpp, which includes support for ERNIE 4.5.
```bash
# Clone the repository
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build the project
mkdir build
cd build
cmake ..
make
```
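If you have a CUDA-capable GPU, recent llama.cpp versions can enable GPU offload at configure time; a sketch (flag per current llama.cpp build docs, adjust for your toolchain):

```bash
# Optional: configure with CUDA offload instead of a CPU-only build
cmake .. -DGGML_CUDA=ON
make
```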
Step 2️⃣: Download the ERNIE 4.5 GGUF Model
Download the .gguf model file from Hugging Face.

```bash
# Install huggingface_hub
pip install -U huggingface_hub

# Download the GGUF file
huggingface-cli download --resume-download unsloth/ERNIE-4.5-0.3B-PT-GGUF --local-dir path/to/dir

# If the download times out, switch to a mirror endpoint and retry
export HF_ENDPOINT=https://hf-mirror.com
```
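If you prefer scripting the download, here is a minimal huggingface_hub sketch; the exact .gguf file name inside the repo is an assumption, so list the repo files first:

```python
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "unsloth/ERNIE-4.5-0.3B-PT-GGUF"

# See which quantization variants the repo actually ships.
print(list_repo_files(repo_id))

# Assumed file name; substitute the variant you picked from the listing.
model_path = hf_hub_download(repo_id, filename="ERNIE-4.5-0.3B-PT.gguf")
print(model_path)
```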
Step 3️⃣: Run Inference
Use the `llama-cli` executable from llama.cpp to run the model.
```bash
# Run the model in interactive mode
cd llama.cpp/build/bin
./llama-cli -m /path/to/dir/ERNIE-4.5-0.3B-PT.gguf --jinja -p "Hello, who are you?" -n 128
```
- `-m`: Specifies the path to your GGUF model file.
- `--jinja`: Applies the model's built-in chat template to the prompt.
- `-p`: Provides an initial prompt.
- `-n`: Sets the number of tokens to generate.
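llama.cpp also ships a `llama-server` binary that exposes an OpenAI-compatible HTTP API, which is handy once the interactive test works; a minimal sketch (the port choice is arbitrary):

```bash
# Start an OpenAI-compatible server on port 8080
./llama-server -m /path/to/dir/ERNIE-4.5-0.3B-PT.gguf --jinja --port 8080

# From another terminal, query the chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, who are you?"}]}'
```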
Reference project: https://huggingface.co/taobao-mnn/ERNIE-4.5-0.3B-PT-MNN (credit to the original author).
MNN is a highly efficient deep learning inference engine, perfect for edge and mobile devices. A 4-bit quantized version of ERNIE 4.5 is available specifically for MNN.
Step 1️⃣: Download the MNN Model
You can download the model from Hugging Face or ModelScope.

```bash
# Install Hugging Face Hub
pip install -U huggingface_hub

# Shell download
huggingface-cli download --resume-download taobao-mnn/ERNIE-4.5-0.3B-PT-MNN --local-dir path/to/dir

# If the download times out, switch to a mirror endpoint and retry
export HF_ENDPOINT=https://hf-mirror.com
```

```python
# SDK download
from huggingface_hub import snapshot_download
model_dir = snapshot_download('taobao-mnn/ERNIE-4.5-0.3B-PT-MNN')
```

```bash
# Or clone directly from ModelScope
git clone https://www.modelscope.cn/MNN/ERNIE-4.5-0.3B-PT-MNN
```
Step 2️⃣: Clone and Compile MNN
You need to compile the MNN engine from source with the flags that enable LLM support.
```bash
# Clone the MNN repository
git clone https://github.com/alibaba/MNN.git
cd MNN

# Create build directory and compile with LLM support enabled
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true
make -j
```
Step 3️⃣: Run the Demo
Use the `llm_demo` application to run the model.
```bash
# Run the MNN demo with the model config and a prompt file
./llm_demo /path/to/ERNIE-4.5-0.3B-PT-MNN/config.json prompt.txt
```
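Here, `prompt.txt` is just a plain text file containing your input; for example (assuming, as in MNN's sample usage, one prompt per line):

```bash
# Create a simple prompt file for the demo
echo "Hello, who are you?" > prompt.txt
```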
Reference project: https://huggingface.co/mlx-community/ERNIE-4.5-0.3B-PT-bf16 (credit to the original author).
MLX LM is a Python package for generating text and fine-tuning large language models on Apple silicon with MLX.
The model mlx-community/ERNIE-4.5-0.3B-PT-bf16 was converted to MLX format from baidu/ERNIE-4.5-0.3B-PT using mlx-lm version 0.25.2.
Step 1️⃣: Download the MLX Model

```bash
# Install Hugging Face Hub
pip install -U huggingface_hub

# Download the model files
huggingface-cli download --resume-download mlx-community/ERNIE-4.5-0.3B-PT-bf16 --local-dir path/to/dir

# If the download times out, switch to a mirror endpoint and retry
export HF_ENDPOINT=https://hf-mirror.com
```
Step 2️⃣: Use with MLX LM

```python
from mlx_lm import load, generate

# Load the converted model and tokenizer from the Hub
model, tokenizer = load("mlx-community/ERNIE-4.5-0.3B-PT-bf16")

prompt = "hello"

# Apply the chat template if the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
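mlx-lm also installs a command-line entry point, so the same model can be tried without writing any Python:

```bash
# Install mlx-lm if needed, then generate from the command line
pip install mlx-lm
mlx_lm.generate --model mlx-community/ERNIE-4.5-0.3B-PT-bf16 --prompt "hello"
```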
- ERNIEKit: An industrial-grade toolkit for the full development lifecycle of ERNIE models. It supports high-performance pre-training, SFT, DPO, LoRA, and quantization (QAT/PTQ).
- FastDeploy: A production-ready inference and deployment toolkit. It features advanced acceleration (speculative decoding, MTP), comprehensive quantization support, and compatibility with numerous hardware backends (NVIDIA, Kunlunxin, Ascend, etc.).
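As a rough sketch of FastDeploy serving (the entrypoint, flags, and `-Paddle` model ID below are assumptions based on the ERNIE model cards; verify against the FastDeploy documentation for your version):

```bash
# Assumed entrypoint: launch an OpenAI-compatible API server with FastDeploy
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-0.3B-Paddle \
  --port 8180
```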
ERNIE 4.5 is being actively integrated into the wider open-source ecosystem. Here is the current status of support in popular projects:
| Project | Status |
|---|---|
| transformers | ✅ Merged 🎉! ERNIE 0.3B and MoE models are now integrated and directly usable. ⚙️ (Repo) (Merged PR #39228) ⏳ In Progress: ERNIE 4.5 VL models (Draft PR #39585) |
| vLLM | ✅ Merged 🎉! Native support for ERNIE 4.5 text models is now available in the main branch. (Merged PR #20220) ✅ Merged 🎉! ERNIE 4.5 VL model support added. (Merged PR #22514) ⏳ Open: Enable EPLB on ernie4.5-moe (PR #22100) |
| sglang | ✅ Merged 🎉! ERNIE 4.5 is now supported in sglang, enabling streamlined usage in structured generation and multi-agent orchestration scenarios. (Merged PR #7657) |
| llama.cpp/ollama | ✅ Merged 🎉! The 0.3B models and the ERNIE 4.5 MoE models are already supported in llama.cpp, enabling efficient local CPU inference. (PR #6926) (PR #14746) |
| ms-swift | ✅ Merged 🎉! Support for ERNIE 4.5 has been integrated, enabling streamlined fine-tuning and inference within the ModelScope ecosystem. (Merged PR #4757) ⏳ Open: Add ERNIE VL support (PR #4763) |
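With the transformers support merged, a minimal text-generation sketch (the 0.3B model ID comes from the MLX note above; the generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-0.3B-PT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate up to 128 new tokens and print the decoded result
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```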