
Enable Checkpoint Conversion from Huggingface to Maxtext #1839


Open · wants to merge 5 commits into base: main

Conversation

@YixuanWang-99 (Collaborator) commented Jun 16, 2025

Description

Enable checkpoint conversion from Hugging Face to MaxText.

  • Add to_maxtext.py to perform the checkpoint conversion from HF to MaxText.
  • Add convert_gemma2_to_mt.sh to automate the conversion and verification.
  • Add mt_hf_mutual_conversion_check.py to compare the Hugging Face and MaxText checkpoints.
  • Official Gemma2 models are supported.

Tests

The converted checkpoint is tested with mt_hf_mutual_conversion_check.py. It compares:

  1. For given prompts, the top-k predicted tokens and scores for the next token;
  2. The KL divergence of the full logit distributions (a sketch of this check follows below).
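
A minimal sketch of the comparison, assuming plain numpy arrays of next-token logits (the real check in mt_hf_mutual_conversion_check.py extracts these from the two models' forward passes):

import numpy as np
from scipy.special import log_softmax

def compare_next_token(hf_logits, mt_logits, k=5):
  """Compare next-token predictions from the HF and MaxText models.

  Both inputs are 1-D numpy arrays of vocabulary logits taken from the
  last sequence position of each model's output.
  """
  hf_logp = log_softmax(hf_logits)
  mt_logp = log_softmax(mt_logits)
  # Top-k predicted token ids from each model; these should match
  # after a faithful conversion.
  hf_topk = np.argsort(hf_logits)[::-1][:k]
  mt_topk = np.argsort(mt_logits)[::-1][:k]
  # KL(HF || MaxText) over the full vocabulary distribution; a value
  # near zero indicates the conversion preserved the logits.
  kl = float(np.sum(np.exp(hf_logp) * (hf_logp - mt_logp)))
  return hf_topk, mt_topk, kl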

Tested on the Gemma2-2b model, with a successful conversion example.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@YixuanWang-99 changed the title from "Enable conversion from Huggingface to Maxtext" to "Enable Checkpoint Conversion from Huggingface to Maxtext" on Jun 16, 2025
@hengtaoguo (Collaborator) left a comment:


Excellent work!

Collaborator:

IIUC, mt_hf_mutual_conversion_check checks MaxText vs. HF, while hf_checkpoint_conversion_check checks converted HF vs. original HF? Would you suggest deprecating the latter one?

Collaborator (Author):

Yes, the latter one is no longer necessary.

# Tokenize for HF
inputs = tokenizer(input_text, return_tensors="pt", padding=True, max_length=config.max_target_length, truncation=True)
actual_seq_len = inputs["input_ids"].shape[1]
# actual_seq_len = 4
Collaborator:

nit: shall we remove this commented-out code?

Collaborator (Author):

Removed

mt_decoder_positions = mt_decoder_positions_full[:, :actual_seq_len]
# max_logging.log(f"MaxText input shapes: ids={mt_ids.shape}, "
# f"decoder_positions={mt_decoder_positions.shape}, "
# f"decoder_segment_ids={mt_decoder_segment_ids.shape}")
Collaborator:

nit: similar to above

Collaborator (Author):

Removed

Collaborator:

This is nice, thanks for adding such a feature!

# Get parameter mappings and hooks
model_key = config.model_name
param_map_mt_to_hf = PARAM_MAPPING[model_key](hf_config_obj.to_dict(), config.scan_layers)
hook_fn_map_mt = HOOK_FNS[model_key](hf_config_obj.to_dict(), config.scan_layers, saving_to_hf=False)
Collaborator:

hook_fn is the kind of function you customize to convert the parameter array, such as a reshape or transpose, right?

Collaborator (Author):

Yes, mt_to_hf and hf_to_mt use the same param_mapping and hook_fn mapping.
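
As a concrete illustration (a hypothetical sketch; the actual hooks live in MaxText.utils.ckpt_conversion.utils.param_mapping, and the parameter name below is made up):

import numpy as np

def make_hooks(hf_config: dict, scan_layers: bool, saving_to_hf: bool):
  """Sketch of a HOOK_FNS[model_key] factory: returns a map from
  parameter name to a function that reshapes/transposes the raw
  array into the target framework's layout."""
  del hf_config, scan_layers  # unused in this sketch
  def transpose_kernel(arr: np.ndarray) -> np.ndarray:
    # e.g. HF stores projections as (out, in); MaxText as (in, out).
    return arr.T
  # The same table can serve both directions because a pure transpose
  # is its own inverse; saving_to_hf would matter for asymmetric hooks.
  return {"self_attention.query.kernel": transpose_kernel}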

mesh = jax.sharding.Mesh(devices_array, config.mesh_axes)

# Load Hugging Face model, config, and state_dict
max_logging.log(f"Loading Hugging Face model: {model_id}...")
Collaborator:

Do we have any mechanism to check the supported model_id? If a user provides an unsupported model, we could raise an error like: "Model {model_id} is not currently supported in MaxText. Supported models are: {list of supported models from ckpt_conversion}."

Collaborator (Author):

Added. Also revised to_huggingface.py to make it consistent.
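
A minimal sketch of such a guard, using the HF_IDS mapping this script already imports (the error wording is illustrative):

from MaxText.utils.ckpt_conversion.utils.utils import HF_IDS

def resolve_hf_model_id(model_name: str) -> str:
  """Fail loudly when the requested model has no registered HF id."""
  if model_name not in HF_IDS:
    raise ValueError(
        f"Model {model_name} is not currently supported in MaxText. "
        f"Supported models are: {sorted(HF_IDS)}."
    )
  return HF_IDS[model_name]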

@hengtaoguo (Collaborator) commented:

Hi @gagika! I've heard this might be interesting to you for loading/saving HF checkpoints. Would you like to take a look when you get a chance? Thanks a lot for your time!

@shralex (Collaborator) left a comment:

Thanks Yixuan! Added a few comments

from MaxText.train import save_checkpoint
from MaxText.utils.ckpt_conversion.utils.param_mapping import HOOK_FNS, PARAM_MAPPING
from MaxText.utils.ckpt_conversion.utils.utils import apply_hook_fns, HF_IDS

Collaborator:

Could you please add comments explaining what this script does, which parameters are supported, which environment variables are needed for it to run, and example(s) of invoking it? (If you'd like, you could point to convert_gemma2_to_mt.sh or copy the relevant command here as well.)

model_id = HF_IDS[config.model_name]
max_utils.print_system_information()
if not config.base_output_directory:
  output_directory = os.path.expanduser("~/.mt_output/")
Collaborator:

How about output_directory = os.path.join(os.getcwd(), "mt_output"), so that the output is under the directory where the script is invoked?

max_logging.log("Starting weight transformation...")
final_mt_weights_numpy_list = []

for path_tuple, abstract_leaf_value in abstract_params_flat:
Collaborator:

Could you please add more comments here, perhaps an example of a parameter and how it's mapped.
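
For example, one iteration of that loop might look like this (the parameter names are hypothetical, not copied from PARAM_MAPPING):

# path_tuple identifies one MaxText parameter, e.g.
#   ("params", "token_embedder", "embedding")
# param_map_mt_to_hf maps its flattened name to the HF tensor name, e.g.
#   "model.embed_tokens.weight"
# and the matching hook_fn (if any) reshapes/transposes the HF array
# into the MaxText layout before it is collected into
# final_mt_weights_numpy_list.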

It loads the HF checkpoint and a MaxText checkpoint, and:
1. runs a forward pass of a MaxText model and an HF model
2. compares their output logits for a given input
3. compares the predicted token sequences
Collaborator:

Can you give an example of how to invoke it, what parameters to pass, etc.?
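
A hypothetical invocation, assembled from the flags visible elsewhere in this PR (the exact argument set may differ):

python3 -m "MaxText.tests.mt_hf_mutual_conversion_check" \
  hf_model_id="google/gemma-2-2b" \
  tokenizer_path="${TOKENIZER_PATH}" \
  load_parameters_path="${OUTPUT_BASE_DIR}/0/items" \
  per_device_batch_size="${PER_DEVICE_BATCH_SIZE}" \
  run_name="mt_gemma2_check"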

tokenizer_path="${TOKENIZER_PATH}" \
load_parameters_path="${OUTPUT_BASE_DIR}/0/items" \
per_device_batch_size="${PER_DEVICE_BATCH_SIZE}" \
run_name="mt_gemma2_check" \
Collaborator:

Could you use MODEL_NAME here instead of hardcoding gemma2 (also below)?
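
For example (a sketch, assuming a MODEL_NAME variable defined at the top of the script):

run_name="mt_${MODEL_NAME}_check"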


echo "--- Starting Comparing Logits and Predicted Tokens ---"
python3 -m "MaxText.tests.mt_hf_mutual_conversion_check" \
hf_model_id="google/gemma-2-2b" \
Collaborator:

Could you make this a parameter instead of hardcoding it (also below), so all model-specific parameters are at the top of this script?
