Skip to content

Add Llama 3.2 to pkg #591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/basic-tests-linux-uv.yml
Original file line number Diff line number Diff line change
Expand Up @@ -71,4 +71,5 @@ jobs:
shell: bash
run: |
source .venv/bin/activate
uv pip install transformers
pytest pkg/llms_from_scratch/tests/
4 changes: 0 additions & 4 deletions .github/workflows/check-links.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,6 @@ jobs:
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
uv add pytest-ruff pytest-check-links
# Current version of retry doesn't work well if there are broken non-URL links
# pip install pytest pytest-check-links pytest-retry

- name: Check links
run: |
Expand All @@ -40,5 +38,3 @@ jobs:
--check-links-ignore "https://arxiv.org/*" \
--check-links-ignore "https://ai.stanford.edu/~amaas/data/sentiment/" \
--check-links-ignore "https://x.com/*"
# pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://arena.lmsys.org" --retries 2 --retry-delay 5

186 changes: 185 additions & 1 deletion ch05/07_gpt_to_llama/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,188 @@ This folder contains code for converting the GPT implementation from chapter 4 a
- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">


&nbsp;
### Using Llama 3.2 via the `llms-from-scratch` package

For an easy way to use the Llama 3.2 1B and 3B models, you can also use the `llms-from-scratch` PyPI package based on the source code in this repository at [pkg/llms_from_scratch](../../pkg/llms_from_scratch).

&nbsp;
##### 1) Installation

```bash
pip install llms_from_scratch blobfile
```
&nbsp;
##### 2) Model and text generation settings

Specify which model to use:

```python
MODEL_FILE = "llama3.2-1B-instruct.pth"
# MODEL_FILE = "llama3.2-1B-base.pth"
# MODEL_FILE = "llama3.2-3B-instruct.pth"
# MODEL_FILE = "llama3.2-3B-base.pth"
```

Basic text generation settings that can be defined by the user. Note that the recommended 8192-token context size requires approximately 3 GB of VRAM for the text generation example.

```python
MODEL_CONTEXT_LENGTH = 8192 # Supports up to 131_072

# Text generation settings
if "instruct" in MODEL_FILE:
PROMPT = "What do llamas eat?"
else:
PROMPT = "Llamas eat"

MAX_NEW_TOKENS = 150
TEMPERATURE = 0.
TOP_K = 1
```

&nbsp;
##### 3) Weight download and loading

This automatically downloads the weight file based on the model choice above:

```python
import os
import urllib.request

url = f"https://huggingface.co/rasbt/llama-3.2-from-scratch/resolve/main/{MODEL_FILE}"

if not os.path.exists(MODEL_FILE):
urllib.request.urlretrieve(url, MODEL_FILE)
print(f"Downloaded to {MODEL_FILE}")
```

The model weights are then loaded as follows:

```python
import torch
from llms_from_scratch.llama3 import Llama3Model

if "1B" in MODEL_FILE:
from llms_from_scratch.llama3 import LLAMA32_CONFIG_1B as LLAMA32_CONFIG
elif "3B" in MODEL_FILE:
from llms_from_scratch.llama3 import LLAMA32_CONFIG_3B as LLAMA32_CONFIG
else:
raise ValueError("Incorrect model file name")

LLAMA32_CONFIG["context_length"] = MODEL_CONTEXT_LENGTH

model = Llama3Model(LLAMA32_CONFIG)
model.load_state_dict(torch.load(MODEL_FILE, weights_only=True))

device = (
torch.device("cuda") if torch.cuda.is_available() else
torch.device("mps") if torch.backends.mps.is_available() else
torch.device("cpu")
)
model.to(device)
```

&nbsp;
##### 4) Initialize tokenizer

The following code downloads and initializes the tokenizer:

```python
from llms_from_scratch.llama3 import Llama3Tokenizer, ChatFormat, clean_text

TOKENIZER_FILE = "tokenizer.model"

url = f"https://huggingface.co/rasbt/llama-3.2-from-scratch/resolve/main/{TOKENIZER_FILE}"

if not os.path.exists(TOKENIZER_FILE):
urllib.request.urlretrieve(url, TOKENIZER_FILE)
print(f"Downloaded to {TOKENIZER_FILE}")

tokenizer = Llama3Tokenizer("tokenizer.model")

if "instruct" in MODEL_FILE:
tokenizer = ChatFormat(tokenizer)
```

&nbsp;
##### 5) Generating text

Lastly, we can generate text via the following code:

```python
import time

from llms_from_scratch.ch05 import (
generate,
text_to_token_ids,
token_ids_to_text
)

torch.manual_seed(123)

start = time.time()

token_ids = generate(
model=model,
idx=text_to_token_ids(PROMPT, tokenizer).to(device),
max_new_tokens=MAX_NEW_TOKENS,
context_size=LLAMA32_CONFIG["context_length"],
top_k=TOP_K,
temperature=TEMPERATURE
)

print(f"Time: {time.time() - start:.2f} sec")

if torch.cuda.is_available():
max_mem_bytes = torch.cuda.max_memory_allocated()
max_mem_gb = max_mem_bytes / (1024 ** 3)
print(f"Max memory allocated: {max_mem_gb:.2f} GB")

output_text = token_ids_to_text(token_ids, tokenizer)

if "instruct" in MODEL_FILE:
output_text = clean_text(output_text)

print("\n\nOutput text:\n\n", output_text)
```

When using the Llama 3.2 1B Instruct model, the output should look similar to the one shown below:

```
Time: 4.12 sec
Max memory allocated: 2.91 GB


Output text:

Llamas are herbivores, which means they primarily eat plants. Their diet consists mainly of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses and grassy meadows.
2. Hay: Llamas also eat hay, which is a dry, compressed form of grass or other plants.
3. Alfalfa: Alfalfa is a legume that is commonly used as a hay substitute in llama feed.
4. Other plants: Llamas will also eat other plants, such as clover, dandelions, and wild grasses.

It's worth noting that the specific diet of llamas can vary depending on factors such as the breed,
```

&nbsp;
**Pro tip**

For up to a 4× speed-up, replace

```python
model.to(device)
```

with

```python
model = torch.compile(model)
model.to(device)
```

Note: the speed-up takes effect after the first `generate` call.

8 changes: 8 additions & 0 deletions pkg/llms_from_scratch/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -109,5 +109,13 @@ from llms_from_scratch.ch07 import (
from llms_from_scratch.appendix_a import NeuralNetwork, ToyDataset

from llms_from_scratch.appendix_d import find_highest_gradient, train_model

from llms_from_scratch.llama3 import (
Llama3Model,
Llama3Tokenizer,
ChatFormat,
clean_text
)
```

(For the `llms_from_scratch.llama3` usage information, please see [this bonus section](../../ch05/07_gpt_to_llama/README.md).
Loading