Commit 9ee1b25

Merge branch 'master' into onnx_parser

2 parents: 3b50b6f + 6af4b02

Note: GitHub hides some content in large commits by default; the names of several new files below are not shown in this view.

60 files changed (+1924, -1045 lines)

`.github/workflows/benchmark.yml` (1 addition & 1 deletion)

```diff
@@ -617,7 +617,7 @@ jobs:
       run: |
         # generate quantized weights
         ln -s /data/home/tiny/tinygrad/extra/datasets/imagenet extra/datasets/imagenet
-        ln -s /data/home/tiny/tinygrad/testsig-0x858d6c15.so .
+        ln -s /data/home/tiny/tinygrad/testsig-*.so .
         PYTHONPATH=. CC=clang-19 CPU=1 QUANT=1 CNT=0 python3 examples/test_onnx_imagenet.py https://github.com/xamcat/mobcat-samples/raw/refs/heads/master/onnx_runtime/InferencingSample/InferencingSample/mobilenetv2-7.onnx /tmp/model.quant.onnx
         # benchmark on DSP with NOOPT=1, the devectorizer has issues
         PYTHONPATH=. CC=clang-19 DSP=1 DONT_REALIZE_EXPAND=1 NOOPT=1 CNT=2 DEBUG=2 python3 examples/test_onnx_imagenet.py /tmp/model.quant.onnx
```
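The new glob keeps this CI step working when the testsig hash changes. As a rough illustration, here is what the shell wildcard expansion does, sketched in Python (the directory path is the one hard-coded in the workflow):

```python
import os
from pathlib import Path

src = Path("/data/home/tiny/tinygrad")
for sig in src.glob("testsig-*.so"):   # matches any hash, not just 0x858d6c15
    link = Path(sig.name)
    if not link.exists():
        os.symlink(sig, link)          # equivalent of `ln -s .../testsig-*.so .`
```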

`.github/workflows/mlperf.yml` (1 addition & 1 deletion)

```diff
@@ -22,4 +22,4 @@ jobs:
         ln -s /raid/datasets/imagenet extra/datasets/imagenet
     - name: Run resnet
       run: |
-        BENCHMARK_LOG=mlpert_train_resnet LOGMLPERF=0 examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/resnet/implementations/tinybox_red/run_and_time.sh
+        BENCHMARK_LOG=mlpert_train_resnet LOGMLPERF=0 examples/mlperf/training_submission_v5.1/tinycorp/benchmarks/resnet/implementations/tinybox_red/run_and_time.sh
```

`autogen_stubs.sh` (1 addition & 1 deletion)

```diff
@@ -225,7 +225,7 @@ generate_libc() {

   sed -i "s\import ctypes\import ctypes, ctypes.util, os\g" $BASE/libc.py
   sed -i "s\FIXME_STUB\libc\g" $BASE/libc.py
-  sed -i "s\FunctionFactoryStub()\None if (libc_path := ctypes.util.find_library('c')) is None else ctypes.CDLL(libc_path)\g" $BASE/libc.py
+  sed -i "s\FunctionFactoryStub()\None if (libc_path := ctypes.util.find_library('c')) is None else ctypes.CDLL(libc_path, use_errno=True)\g" $BASE/libc.py

   fixup $BASE/libc.py
 }
```
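Passing `use_errno=True` is what makes `ctypes.get_errno()` meaningful after a failing libc call. A minimal standalone sketch of the behavior the generated `libc.py` now gets (illustrative, not code from this commit):

```python
import ctypes, ctypes.util, os

# Load libc the same way the generated stub does, with errno tracking enabled.
libc_path = ctypes.util.find_library("c")
libc = None if libc_path is None else ctypes.CDLL(libc_path, use_errno=True)

if libc is not None and libc.close(-1) == -1:   # close(-1) fails with EBADF
    err = ctypes.get_errno()                    # captured because use_errno=True
    print(os.strerror(err))                     # e.g. "Bad file descriptor"
```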

`docs/abstractions2.py` (3 additions & 3 deletions)

```diff
@@ -51,11 +51,11 @@
 # describe the computation
 buf_1 = UOp(Ops.DEFINE_GLOBAL, dtypes.int32.ptr(), (), 1)
 buf_2 = UOp(Ops.DEFINE_GLOBAL, dtypes.int32.ptr(), (), 2)
-ld_1 = UOp(Ops.LOAD, dtypes.int32, (buf_1, ShapeTracker.from_shape((1,)).to_uop()))
-ld_2 = UOp(Ops.LOAD, dtypes.int32, (buf_2, ShapeTracker.from_shape((1,)).to_uop()))
+ld_1 = UOp(Ops.LOAD, dtypes.int32, (buf_1.view(ShapeTracker.from_shape((1,))),))
+ld_2 = UOp(Ops.LOAD, dtypes.int32, (buf_2.view(ShapeTracker.from_shape((1,))),))
 alu = ld_1 + ld_2
 output_buf = UOp(Ops.DEFINE_GLOBAL, dtypes.int32.ptr(), (), 0)
-st_0 = UOp(Ops.STORE, dtypes.void, (output_buf, ShapeTracker.from_shape((1,)).to_uop(), alu))
+st_0 = UOp(Ops.STORE, dtypes.void, (output_buf.view(ShapeTracker.from_shape((1,))), alu))
 s = UOp(Ops.SINK, dtypes.void, (st_0,))

 # convert the computation to a "linearized" format (print the format)
```
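This doc change tracks a tinygrad-internal API shift: the ShapeTracker no longer travels as a separate `.to_uop()` source of LOAD/STORE; it is attached to the buffer itself with `.view(...)`. Side by side, using the names from `docs/abstractions2.py` (import paths vary by tinygrad version):

```python
# before: the ShapeTracker was serialized to its own UOp and passed as a second source
ld_1 = UOp(Ops.LOAD, dtypes.int32, (buf_1, ShapeTracker.from_shape((1,)).to_uop()))

# after: the view rides on the buffer UOp, so LOAD takes a single source
ld_1 = UOp(Ops.LOAD, dtypes.int32, (buf_1.view(ShapeTracker.from_shape((1,))),))
```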
New file, name hidden (15 additions): a BERT dev script for a single AMD GPU.

```bash
#!/bin/bash

export PYTHONPATH="." AMD=1
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=1 BS=128 EVAL_BS=128

export BEAM=3 BEAM_UOPS_MAX=4000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1
# export BEAM_LOG_SURPASS_MAX=1
# export BASEDIR="/raid/datasets/wiki"

export RESET_STEP=1
export BENCHMARK=10 BERT_LAYERS=2 DEBUG=2

python3 examples/mlperf/model_train.py
```
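The `BEAM*` and `DEBUG` knobs here are environment variables that tinygrad reads as ContextVars, so the same settings can also be scoped inside Python via `tinygrad.helpers.Context` instead of being exported process-wide. A sketch, assuming the standard ContextVar mechanism:

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

# Scope BEAM kernel search (and verbose scheduling) to one region,
# mirroring `export BEAM=3 ... DEBUG=2` in the script above.
with Context(BEAM=3, DEBUG=2):
    x = Tensor.rand(128, 128)
    y = (x @ x).realize()   # kernels compiled here go through BEAM search
```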
New file, name hidden (69 additions): the BERT benchmark README.

````markdown
# 1. Problem

This problem uses BERT for NLP.

## Requirements

Install tinygrad and mlperf-logging from the `mlperf_training_v5.0` branch (uncomment the mlperf extra in setup.py):
```
git clone https://github.com/tinygrad/tinygrad.git
python3 -m pip install -e ".[mlperf]"
```
Also install gdown (for the dataset), numpy, tqdm, and tensorflow:
```
pip install gdown numpy tqdm tensorflow
```

### tinybox_green
Install the p2p driver per the [README](https://github.com/tinygrad/open-gpu-kernel-modules/blob/550.54.15-p2p/README.md).
This is the default on production tinybox green.

# 2. Directions

## Steps to download and verify data

### 1. Download raw data

```
BASEDIR="/raid/datasets/wiki" WIKI_TRAIN=1 VERIFY_CHECKSUM=1 python3 extra/datasets/wikipedia_download.py
```

### 2. Preprocess train and validation data

Note: the number of threads used for preprocessing is limited by available memory. With 128GB of RAM, a maximum of 16 threads is recommended.

#### Training:
```
BASEDIR="/raid/datasets/wiki" NUM_WORKERS=16 python3 extra/datasets/wikipedia.py pre-train all
```

To generate a specific topic (between 0 and 499):
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-train 42
```

#### Validation:
```
BASEDIR="/raid/datasets/wiki" python3 extra/datasets/wikipedia.py pre-eval
```

## Running

### tinybox_green

#### Steps to run benchmark
```
examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/run_and_time.sh
```

### tinybox_red

#### Steps to run benchmark
```
examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_red/run_and_time.sh
```

### tinybox_8xMI300X

#### Steps to run benchmark
```
examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_8xMI300X/run_and_time.sh
```
````
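The per-topic preprocessing interface in the README also makes selective regeneration easy to script. A small driver-loop sketch (the path and CLI come from the README; the loop itself is illustrative):

```python
import os, subprocess

env = dict(os.environ, BASEDIR="/raid/datasets/wiki")
for topic in range(500):  # topics 0..499, per the README
    subprocess.run(
        ["python3", "extra/datasets/wikipedia.py", "pre-train", str(topic)],
        env=env, check=True,
    )
```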
New file, name hidden (14 additions): an 8-GPU AMD BERT dev script.

```bash
#!/bin/bash

export PYTHONPATH="." AMD=1
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=8 BS=1024 EVAL_BS=1024
export OPT_BASE_LEARNING_RATE=0.0011 OPT_LAMB_BETA_1=0.60466 OPT_LAMB_BETA_2=0.85437 DECAY=0.1

export BEAM=3 BEAM_UOPS_MAX=6000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1 FREE_INTERMEDIATE=0
export BASEDIR="/raid/datasets/wiki"

export BENCHMARK=10 BERT_LAYERS=2 DEBUG=2

python3 examples/mlperf/model_train.py
```
New file, name hidden (17 additions): an 8-GPU AMD BERT training run script.

```bash
#!/bin/bash

export PYTHONPATH="." AMD=1
export MODEL="bert"
export DEFAULT_FLOAT="HALF" GPUS=8 BS=1024 EVAL_BS=1024

# similar to https://github.com/mlcommons/training_results_v3.1/blob/d06288b2bd675a9d88e0e6181f5bb5626b71ec19/Quanta_Cloud_Technology/results/D54U-3U/bert/result_1.txt#L54
export OPT_BASE_LEARNING_RATE=0.0011 OPT_LAMB_BETA_1=0.60466 OPT_LAMB_BETA_2=0.85437 DECAY=0.1
export TRAIN_STEPS=3900

export BEAM=3 BEAM_UOPS_MAX=6000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1 FREE_INTERMEDIATE=0
export BASEDIR="/raid/datasets/wiki"

export WANDB=1 PARALLEL=0

RUNMLPERF=1 python3 examples/mlperf/model_train.py
```
New file, name hidden (29 additions): likely the tinybox_8xMI300X `run_and_time.sh`, given its SUBMISSION_PLATFORM.

```bash
#!/bin/bash
set -e          # Exit on any error
set -o pipefail # Make pipeline fail if any command fails

export PYTHONPATH="." AMD=1
export MODEL="bert"
export SUBMISSION_PLATFORM="tinybox_8xMI300X"
export DEFAULT_FLOAT="HALF" GPUS=8 BS=1024 EVAL_BS=1024

# similar to https://github.com/mlcommons/training_results_v3.1/blob/d06288b2bd675a9d88e0e6181f5bb5626b71ec19/Quanta_Cloud_Technology/results/D54U-3U/bert/result_1.txt#L54
export OPT_BASE_LEARNING_RATE=0.0011 OPT_LAMB_BETA_1=0.60466 OPT_LAMB_BETA_2=0.85437 DECAY=0.1
export TRAIN_STEPS=3900

export BEAM=3 BEAM_UOPS_MAX=6000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1 FREE_INTERMEDIATE=0
export BASEDIR="/raid/datasets/wiki"

# pip install -e ".[mlperf]"
export LOGMLPERF=1

export SEED=$RANDOM
DATETIME=$(date "+%m%d%H%M")
LOGFILE="bert_8xMI300x_${DATETIME}_${SEED}.log"

# init # TODO: without DEBUG=2 it hangs
BENCHMARK=10 INITMLPERF=1 BERT_LAYERS=2 DEBUG=2 python3 examples/mlperf/model_train.py | tee $LOGFILE

# run
PARALLEL=0 RUNMLPERF=1 python3 examples/mlperf/model_train.py | tee -a $LOGFILE
```
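The script runs two phases against one log file: an init pass (`BENCHMARK`/`INITMLPERF`, which the surrounding flags suggest is where BEAM compilation happens on a truncated `BERT_LAYERS=2` model) and then the timed `RUNMLPERF` pass appended to the same log. The same orchestration can be sketched from Python; a hedged equivalent that keeps the single-logfile behavior but drops the `tee` to stdout:

```python
import os, subprocess, datetime, random

seed = str(random.randrange(2**15))   # stands in for bash's $RANDOM (0..32767)
logfile = f"bert_8xMI300x_{datetime.datetime.now():%m%d%H%M}_{seed}.log"
base = dict(os.environ, SEED=seed)

with open(logfile, "ab") as log:
    # init pass: compile/benchmark only
    subprocess.run(["python3", "examples/mlperf/model_train.py"],
                   env=dict(base, BENCHMARK="10", INITMLPERF="1",
                            BERT_LAYERS="2", DEBUG="2"),
                   stdout=log, check=True)
    # run pass: the timed MLPerf training
    subprocess.run(["python3", "examples/mlperf/model_train.py"],
                   env=dict(base, PARALLEL="0", RUNMLPERF="1"),
                   stdout=log, check=True)
```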
New file, name hidden (69 additions): a second copy of the BERT benchmark README, identical in content to the one shown above.
New file, name hidden (16 additions): a 6-GPU NVIDIA BERT dev script.

```bash
#!/bin/bash

export PYTHONPATH="." NV=1
export MODEL="bert"
export DEFAULT_FLOAT="HALF" SUM_DTYPE="HALF" GPUS=6 BS=96 EVAL_BS=96

export FUSE_ARANGE=1 FUSE_ARANGE_UINT=0

export BEAM=8 BEAM_UOPS_MAX=10000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1
export BEAM_LOG_SURPASS_MAX=1
export BASEDIR="/raid/datasets/wiki"

export BENCHMARK=10 BERT_LAYERS=2 DEBUG=2

python3 examples/mlperf/model_train.py
```
New file, name hidden (15 additions): a 6-GPU NVIDIA BERT training run script.

```bash
#!/bin/bash

export PYTHONPATH="." NV=1
export MODEL="bert"
export DEFAULT_FLOAT="HALF" SUM_DTYPE="HALF" GPUS=6 BS=96 EVAL_BS=96

export FUSE_ARANGE=1 FUSE_ARANGE_UINT=0

export BEAM=8 BEAM_UOPS_MAX=10000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

export WANDB=1 PARALLEL=0

RUNMLPERF=1 python3 examples/mlperf/model_train.py
```
New file, name hidden (27 additions): likely the tinybox_green `run_and_time.sh`, given its SUBMISSION_PLATFORM.

```bash
#!/bin/bash
set -e          # Exit on any error
set -o pipefail # Make pipeline fail if any command fails

export PYTHONPATH="." NV=1
export MODEL="bert"
export SUBMISSION_PLATFORM="tinybox_green"
export DEFAULT_FLOAT="HALF" SUM_DTYPE="HALF" GPUS=6 BS=96 EVAL_BS=96

export FUSE_ARANGE=1 FUSE_ARANGE_UINT=0

export BEAM=8 BEAM_UOPS_MAX=10000 BEAM_UPCAST_MAX=256 BEAM_LOCAL_MAX=1024 BEAM_MIN_PROGRESS=5
export IGNORE_JIT_FIRST_BEAM=1
export BASEDIR="/raid/datasets/wiki"

# pip install -e ".[mlperf]"
export LOGMLPERF=1

export SEED=$RANDOM
DATETIME=$(date "+%m%d%H%M")
LOGFILE="bert_green_${DATETIME}_${SEED}.log"

# init
BENCHMARK=10 INITMLPERF=1 BERT_LAYERS=2 python3 examples/mlperf/model_train.py | tee $LOGFILE

# run
PARALLEL=0 RUNMLPERF=1 python3 examples/mlperf/model_train.py | tee -a $LOGFILE
```
New file, name hidden (69 additions & 0 deletions): a third copy of the BERT benchmark README, identical in content to the one shown above.
