A simple Python example to run LFM2 with llama.cpp using the llama-cpp-python bindings.
This script demonstrates how to create a chat loop with system prompts and persistent conversation history.
Clone this repo:

```bash
git clone https://github.com/Liquid4All/lfm-llamacpp-py.git
cd lfm-llamacpp-py
```

The dependencies are managed with uv and are installed automatically the first time you run the script.
Download a suitable GGUF model from the LFM2 1.2B GGUF HuggingFace repo and place it in the ./models/ directory:
⚠️ Important: If you clone the HuggingFace repository using git clone, you might only get Git LFS pointer files (small 134-byte files) instead of the actual model files. We recommend downloading the GGUF files directly from the HuggingFace web interface to avoid this issue.
Example:

```
models/
└── LFM2-1.2B-Q4_0.gguf
```
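If you prefer to script the download, the huggingface_hub library fetches the real file rather than an LFS pointer. A minimal sketch, assuming the model lives in the LiquidAI/LFM2-1.2B-GGUF repository (adjust the repo id and filename to the model you chose):

```python
from huggingface_hub import hf_hub_download

# Downloads the GGUF file into ./models/, resolving Git LFS correctly.
# The repo id below is an assumption based on this README.
path = hf_hub_download(
    repo_id="LiquidAI/LFM2-1.2B-GGUF",
    filename="LFM2-1.2B-Q4_0.gguf",
    local_dir="./models",
)
print(f"Model saved to {path}")
```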
Lastly, update the `MODEL_GGUF_FILE` variable in `main.py`:

```python
MODEL_GGUF_FILE = "./models/your-chosen-model.gguf"
```
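For reference, `main.py` loads the file this variable points at; with llama-cpp-python, opening a GGUF model looks roughly like this (a sketch, not necessarily the exact call in `main.py`; `n_ctx` and `verbose` are illustrative):

```python
from llama_cpp import Llama

MODEL_GGUF_FILE = "./models/your-chosen-model.gguf"

# Llama() loads the GGUF weights; n_ctx sets the context window size,
# and verbose=False keeps llama.cpp's startup logging quiet.
llm = Llama(model_path=MODEL_GGUF_FILE, n_ctx=4096, verbose=False)
```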
Run the chat script:

```bash
uv run main.py
```
Example interaction:
```
$ uv run main.py
Model loaded: ./models/LFM2-1.2B-Q4_0.gguf
Use Ctrl+C to exit.
System: You're a helpful assistant. Be concise and accurate.
User: How many planets are there in the solar system?
Assistant: There are eight planets in the solar system: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.
User: What about Pluto?
Assistant: Pluto was reclassified as a dwarf planet by the International Astronomical Union in 2006. However, some still consider it a planet.
User: What is the capital of France?
Assistant: The capital of France is Paris.
```
- Use Ctrl+C to exit
- Stderr logs from llama.cpp are saved to `llamacpp.log` (one way to do this is sketched below)
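The log redirection can be done in several ways; one simple approach is to point the process-wide stderr file descriptor at the log file before the model loads, so llama.cpp's native output lands there. A sketch of that approach (`main.py` may implement it differently):

```python
import os

# Redirect fd 2 (stderr) to llamacpp.log so llama.cpp's C-level logging
# is captured in the file instead of cluttering the terminal. Note that
# this also redirects Python's own stderr, including tracebacks.
log_fd = os.open("llamacpp.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND)
os.dup2(log_fd, 2)
```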
You can modify the default system prompt by editing this line in `main.py`:

```python
SYSTEM_PROMPT = "You're a helpful assistant. Be concise and accurate."
```
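Under the hood, a chat loop with a system prompt and persistent history amounts to keeping a growing messages list and sending the whole list to the model on every turn. A minimal sketch of that pattern with llama-cpp-python (illustrative only, not the literal contents of `main.py`):

```python
from llama_cpp import Llama

SYSTEM_PROMPT = "You're a helpful assistant. Be concise and accurate."

llm = Llama(model_path="./models/LFM2-1.2B-Q4_0.gguf", n_ctx=4096, verbose=False)

# The system prompt is the first message; everything after it is history.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

try:
    while True:
        messages.append({"role": "user", "content": input("User: ")})

        # The full message list is resent each turn, so the model always
        # sees the entire conversation so far.
        response = llm.create_chat_completion(messages=messages)
        reply = response["choices"][0]["message"]["content"]
        print(f"Assistant: {reply}")

        messages.append({"role": "assistant", "content": reply})
except KeyboardInterrupt:
    print("\nExiting.")
```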