Is the embedding weight duplicated for "offloading" in models that use weight tying? #6451
-
Hello. I am not an expert in C++, but from what I understand of the code, the embedding weight is duplicated for models that use weight tying, such as Gemma 7B (actually 8.5B):

```cpp
model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab}); // same as tok_embd, duplicated to allow offloading
```

If this is really the case, I think it increases the model size significantly for no quality benefit. I was wondering why I always get OOM when trying to load Gemma 7B on my GPU, and this might be the reason: the shape of this tensor is (256000 x 3072)! I am not entirely sure, so I thought I would ask here before opening an issue.
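For a rough sense of scale, here is a back-of-the-envelope sketch (plain C++, not llama.cpp code; the per-element sizes are assumptions that depend on the quantization type actually used for this tensor):

```cpp
// Back-of-the-envelope sketch (not llama.cpp code): estimates the extra memory
// taken by a second copy of the (n_vocab x n_embd) embedding tensor.
#include <cstdio>

int main() {
    const long long n_vocab = 256000;  // Gemma vocabulary size (from the shape above)
    const long long n_embd  = 3072;    // Gemma 7B embedding width (from the shape above)
    const long long n_elem  = n_vocab * n_embd;

    // Bytes per element depend on the storage type: F16 is 2 bytes,
    // Q4_0 is roughly 4.5 bits (18 bytes per block of 32 weights).
    const double gib = 1024.0 * 1024.0 * 1024.0;
    std::printf("elements        : %lld\n", n_elem);
    std::printf("extra copy, F16 : %.2f GiB\n", n_elem * 2.0 / gib);
    std::printf("extra copy, Q4_0: %.2f GiB\n", n_elem * (18.0 / 32.0) / gib);
    return 0;
}
```

At F16 the duplicated tensor alone would account for roughly 1.5 GiB of extra memory.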
-
It is stored one time in CPU RAM for the input token embeddings (`ctx_input`) and one more time here in GPU RAM for the output (`ctx_output`). So the answer is that it is duplicated: one of the copies is in RAM and the other is in VRAM.
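To make the trade-off concrete, here is a minimal sketch (hypothetical structs, not llama.cpp's actual types) of tying the tensor versus duplicating it for offloading:

```cpp
// Minimal conceptual sketch (not the actual llama.cpp implementation): with
// weight tying the output projection simply reuses the embedding buffer, so
// both must live in the same memory. To run the output matmul on the GPU, a
// separate copy of the same values is kept in a device buffer instead.
#include <vector>

struct Tensor {
    std::vector<float> data;  // stand-in for a real backend buffer (RAM or VRAM)
};

struct Model {
    Tensor  tok_embd;         // input embeddings, kept in CPU RAM
    Tensor* output_tied;      // weight tying: just a pointer to tok_embd, no extra memory
    Tensor  output_offloaded; // offloading: a duplicate that would live in VRAM
};

int main() {
    Model m;
    // Real shape would be 256000 x 3072; kept tiny here so the sketch runs anywhere.
    m.tok_embd.data.assign(16 * 8, 0.5f);

    // Tied: zero extra memory, but the output matmul has to run where tok_embd lives.
    m.output_tied = &m.tok_embd;

    // Offloaded: copy the same values into a second buffer (here just another
    // host vector), trading extra memory for running the matmul on the device.
    m.output_offloaded.data = m.tok_embd.data;
    return 0;
}
```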