love2code (3B) recipe #7
Open
revbucket wants to merge 59 commits into main from love2code
Commits (59)
e98a0b6 love2code recipe
156a51d Updated recipes to start love2code
5f56020 Turning off pre-empt because we want results yesterday
7f5f6fc Changed workspace
889b53b Oops, jupiter 2
a867ad9 Fix model identifier in 1b config
1916ebd love2code sample (undfined)
0281713 Updated sample
5979d95 Added split paths to 1b5xc
32dc2bc Temp solution don't merge this
de8a541 Added python only config (undfined)
c2f3025 Added olmo2 1b5xc config
09e2818 fix typo in 1b5xc olmo2 config
c548813 Added starcoder1 config
406a111 Merged tm's long config reader
d151faa fixed olmo2 mix config
3d9c35c Fixed weka path in starcoder1 config
0e39f9d Fixed weka paths in python config
c14b8d3 Try without MixtureBuilder
6ba21e7 Use main if not using MixtureBuilder (undfined)
7ddc48f Weka globs (undfined)
d224f2d Fixed up starcoder1 config to have 15pct codeprose (undfined)
bf2f085 Merge branch 'love2code' of github.com:allenai/olmo-cookbook into lov…
92029ae Try MixtureBuilder again
43b1f87 Try with main and revert a sampling change (undfined)
e4ebeaa Added dclm4pct yaml (undfined)
7ca87c6 Merged
3aa108c atually merged
fcd92c0 merged fr fr
e0beaa8 Changed model name for some reason
406f103 added olmo4pct
eb3b640 Actually pass the group_id override
85362f5 fixed merge 1 (undfined)
be6763a Merge branch 'main' into love2code
4b3375d bumped gpus here
5515893 downsizing nodes on 4pct
0ddd0e2 Trying santacoder-ish data
8b4c620 trying starcoder2-3b model
6134547 Some minor fixes + 3b attempt
89c070b oops
cd05c7d wow im dumb
3e615fd debugging starcoder2_3b
40bf29d Oookay, OOM error, bumping gpus
6d0394f Added smaller rmbsz
045088d bumped rmbsz
32fa132 Merge branch 'main' of github.com:allenai/olmo-cookbook
63572b2 Trying gs setup
0684fc3 Trying sample run
b76071a Added checks on the experiment group?
6272b8b Merged
f1b59a6 Added cookbook 3B-5xC
3272fd1 Fixed token files in 3b5xc
f8c2a36 Added support for sc2-3b
a1ae5b9 Bumped nodes
fd81fb9 Changed checkpointer to id leak
f0bd921 Pull back for ephemeral saves
13d5f95 Merge branch 'main' of github.com:allenai/olmo-cookbook
95193ba merged
c182ab2 fixed ref to dpconfig
@@ -176,6 +176,12 @@
     "bigcodebench_hard::none",
 ]
 
+ALL_1B_TASKS = [
+    "hellaswag",
+    "piqa",
+] + MMLU_CATEGORIES
+
+
 STARCODER_CODEX_TASKS = [
     "codex_humaneval::starcoder_pass@1",
     "codex_humaneval::starcoder_pass@10",

@@ -200,6 +206,7 @@
     "starcoder": STARCODER_CODEX_TASKS,
     "starcoder::pass@1": STARCODER_PASS_AT_1_TASKS,
     "code-no-bcb": [task for task in ALL_CODEX_TASKS if "bigcodebench" not in task],
+    "1b-evals": ALL_1B_TASKS,
 }
 
 OE_EVAL_GIT_URL = "git@github.com:allenai/oe-eval-internal.git"
@@ -0,0 +1,111 @@
from dataclasses import dataclass
from enum import Enum

from olmo_core.config import Config
from olmo_core.data import NumpyDataLoaderConfig, NumpyDatasetConfig, TokenizerConfig
from olmo_core.distributed.parallel import DataParallelType
from olmo_core.nn.transformer import TransformerBlockType, TransformerConfig
from olmo_core.optim import AdamWConfig
from olmo_core.train import TrainerConfig


@dataclass
class ModelTrainConfig(Config):
    model: TransformerConfig
    optim: AdamWConfig
    dataset: NumpyDatasetConfig
    data_loader: NumpyDataLoaderConfig
    trainer: TrainerConfig
    init_seed: int = 12536


@dataclass
class ModelConfig:
    compile: bool
    d_model: int
    n_heads: int
    n_layers: int
    rope_theta: int
    flash_attention: bool
    max_sequence_length: int
    layer_norm_eps: float = 1e-6
    save_interval: int = 1000
    eval_interval: int = 200
    device_batch_size: int = 8
    batch_divisor: int = 32
    eps: float = 1e-8
    betas: tuple = (0.9, 0.95)
    weight_decay: float = 0.1
    max_grad_norm: float = 1.0
    decay_embeddings: bool = False
    qk_norm: bool = True
    dp_type: DataParallelType = DataParallelType.fsdp
    block_type: TransformerBlockType = TransformerBlockType.reordered_norm

    @classmethod
    def olmo_30m(cls) -> "ModelConfig":
        return ModelConfig(
            compile=True,
            d_model=256,
            n_heads=8,
            n_layers=4,
            rope_theta=500_000,
            flash_attention=True,
            max_sequence_length=4096,
        )

    @classmethod
    def olmo_190m(cls) -> "ModelConfig":
        return ModelConfig(
            compile=True,
            d_model=768,
            n_heads=12,
            n_layers=12,
            rope_theta=500_000,
            flash_attention=True,
            max_sequence_length=4096,
        )

    @classmethod
    def olmo_1b(cls) -> "ModelConfig":
        """
        OLMo-1b (1_336_035_328 parameters)
        (1_131_841_536 non-embedding params)
        """
        return ModelConfig(
            compile=True,
            d_model=2048,
            n_heads=16,
            n_layers=18,
            rope_theta=500_000,
            flash_attention=True,
            max_sequence_length=4096,
        )

    @classmethod
    def love2code_3b(cls) -> "ModelConfig":
        """
        num params should be: 3607267840
        num non-embedding params should be: 3481438720
        """
        return ModelConfig(
            compile=True,
            d_model=2560,
            n_heads=32,
            n_layers=32,
            rope_theta=500_000,
            flash_attention=True,
            max_sequence_length=2048,
        )


class SupportedModels(Enum):
    olmo_190m = ModelConfig.olmo_190m()
    olmo_30m = ModelConfig.olmo_30m()
    olmo_1b = ModelConfig.olmo_1b()
    starcoder2_3b = ModelConfig.love2code_3b()


class SupportedTokenizers(Enum):
    dolma2 = TokenizerConfig.dolma2()
    gpt_neox = TokenizerConfig.gpt_neox_olmo_dolma_v1_5()
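A quick arithmetic check on the love2code_3b docstring: total minus non-embedding parameters is 3,607,267,840 − 3,481,438,720 = 125,829,120 = 2560 × 49,152, i.e. a single 49,152-row embedding table at d_model = 2560, consistent with a StarCoder-sized vocabulary rather than dolma2's. A minimal usage sketch under that assumption; the import path `model_config` is hypothetical, not a name the PR defines:

```python
# Minimal sketch exercising the dataclasses above; "model_config" is a
# hypothetical module name for the new file in this PR.
from model_config import ModelConfig, SupportedModels

cfg = SupportedModels.starcoder2_3b.value
assert isinstance(cfg, ModelConfig)

# Reproduce the docstring arithmetic: the embedding table implied by
# (total - non-embedding) params, divided by d_model, gives vocab rows.
embed_params = 3_607_267_840 - 3_481_438_720  # 125_829_120
print(embed_params // cfg.d_model)            # -> 49152 with d_model=2560
```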
Review comment: Technically, this is a generic OLMo-3B config, right? Or are there code-specific HPs here?