Use ExternalProject_Add to build tokenizers #11550

larryliu0820 · 2025-06-11T16:25:05Z

This avoids the CMake target name collision between XNNPACK's `memory` target and Abseil's `memory` target.

This pull request updates the build configuration for the extension_llm_runner target in CMakeLists.txt and bumps the tokenizers submodule reference. It switches the tokenizers dependency from add_subdirectory to ExternalProject_Add, so that tokenizers is built as a separate project, and points the submodule at a commit that supports installation and find_package.

Build Configuration Updates:

extension/llm/runner/CMakeLists.txt: Replaced add_subdirectory for the tokenizers dependency with ExternalProject_Add, which allows passing configuration options such as regex lookahead support and an installation directory. Added tokenizers_external_project as a dependency and used find_package to locate tokenizers. Adjusted target_include_directories to use TOKENIZERS_INCLUDE_DIRS and the new installation path.
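A rough sketch of what the switch described above might look like. This is not the PR's actual diff: the install directory variable `TOKENIZERS_INSTALL_DIR` and the option name `SUPPORT_REGEX_LOOKAHEAD` are assumptions for illustration, and the fragment presumes that `extension_llm_runner` and `EXECUTORCH_ROOT` are already defined earlier in the file.

```cmake
include(ExternalProject)

# Hypothetical install prefix for the separately built tokenizers project.
set(TOKENIZERS_INSTALL_DIR ${CMAKE_CURRENT_BINARY_DIR}/tokenizers-install)

# Configure, build and install tokenizers as its own CMake project, so its
# targets (and Abseil's) never enter this project's target namespace.
# SUPPORT_REGEX_LOOKAHEAD is an assumed option name, not taken from the PR.
ExternalProject_Add(
  tokenizers_external_project
  SOURCE_DIR ${EXECUTORCH_ROOT}/extension/llm/tokenizers
  CMAKE_ARGS "-DCMAKE_INSTALL_PREFIX=${TOKENIZERS_INSTALL_DIR}"
             "-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}"
             -DSUPPORT_REGEX_LOOKAHEAD=ON
)

# Make sure tokenizers is built and installed before the runner compiles.
add_dependencies(extension_llm_runner tokenizers_external_project)

# Headers come from the install tree rather than from the source tree.
target_include_directories(
  extension_llm_runner PUBLIC ${TOKENIZERS_INSTALL_DIR}/include
)
```

One thing to keep in mind with this layout: find_package runs at configure time, while an ExternalProject_Add target is only built and installed at build time, so on a fresh build tree the installed package config may not exist yet when find_package is evaluated.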

Submodule Update:

extension/llm/tokenizers: Updated the tokenizers submodule reference to commit a2bac3b7a54a3816d66b8fd3e705f8724d2d5f2f to pick up the changes from pytorch-labs/tokenizers#82 ("Enable install find package").


pytorch-bot · 2025-06-11T16:25:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11550


❌ 14 New Failures

As of commit e92ac85 with merge base 09a4b9d:

NEW FAILURES - The following jobs have failed:

trunk / test-arm-backend (test_run_ethosu_fvp) / linux-job (gh)
RuntimeError: Command docker exec -t 44be6e48a5c7e4e4708c530591ed3f79cce56ab7a6fea12e4851247e962274b0 /exec failed with exit code 1
trunk / test-llama-runner-linux (bf16, custom, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / linux-job (gh)
RuntimeError: Command docker exec -t b9c368b14b53d87f99cb205a9bf2b4bc445d18efa117496c56a6b9414cf8a3d1 /exec failed with exit code 1
trunk / test-llama-runner-linux (bf16, portable, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh)
RuntimeError: Command docker exec -t 3a16abe5a2d9abd76e86a714c387bdd7eb6b56cb653eaa91fe3cf05f8692d9fc /exec failed with exit code 1
trunk / test-llama-runner-linux (bf16, portable, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / linux-job (gh)
RuntimeError: Command docker exec -t 4baf83aee941d85dbc26f67cbaa56283148cbd538ccbb1e5a6426134d3c6c983 /exec failed with exit code 1
trunk / test-llama-runner-linux (fp32, portable, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh)
RuntimeError: Command docker exec -t 4b5f30481ed913c6369022924be605ece626ba27c6d2fcb3e74aca041d56401b /exec failed with exit code 1
trunk / test-llama-runner-linux (fp32, portable, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11-aarch64) / linux-job (gh)
RuntimeError: Command docker exec -t b7d3aa9cabdcdc19484e8b1eb5c3e6dc35027c4b48265ea09d2228be452f4753 /exec failed with exit code 1
trunk / test-llama-runner-linux (fp32, xnnpack+custom, linux.2xlarge, executorch-ubuntu-22.04-clang12) / linux-job (gh)
RuntimeError: Command docker exec -t 88bbe5a1a7de75506cd271059948847598a56257c1e4336fa62159b0d23ad416 /exec failed with exit code 1
trunk / test-llama-runner-linux (fp32, xnnpack+custom, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc11... / linux-job (gh)
RuntimeError: Command docker exec -t def1d20522b798dd84c6a50653736ae82f3026e93178a88b5867520d84e721ef /exec failed with exit code 1
trunk / test-llama-runner-mac (fp32, coreml) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-mac (fp32, mps) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-mac (fp32, xnnpack+custom+quantize_kv) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t ba5b86c5db347712f15297fac4ba2a943f9a7a82a29956d59b73761de3792d56 /exec failed with exit code 1
trunk / test-llama-runner-qnn-linux (fp32, qnn_8a8w, qnn) / linux-job (gh)
RuntimeError: Command docker exec -t 7041ef00e4ae7ab81c82516db3b54fad06cef389362538c6ee7053c7901545d5 /exec failed with exit code 1
trunk / test-llama-torchao-lowbit / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mergennachin · 2025-06-13T14:59:09Z

extension/llm/runner/CMakeLists.txt

-add_subdirectory(
-  ${EXECUTORCH_ROOT}/extension/llm/tokenizers
-  ${CMAKE_CURRENT_BINARY_DIR}/../../../extension/llm/tokenizers
+ExternalProject_Add(


So, if I understand correctly, ExternalProject_Add creates a separate build process, and as a result CMake variables aren't shared with it.

If you're cross-compiling, don't you want to propagate the toolchain settings and compiler flags?
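One way to address the cross-compilation concern with ExternalProject_Add is to forward the relevant settings explicitly through CMAKE_ARGS. A minimal sketch, assuming the `tokenizers_external_project` name from the PR description; the list variable `TOKENIZERS_FORWARDED_ARGS` is a hypothetical helper, and the set of variables worth forwarding depends on the toolchain:

```cmake
include(ExternalProject)

# Hypothetical helper list: settings from the parent build to pass along.
set(TOKENIZERS_FORWARDED_ARGS
    "-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}"
    "-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}"
    "-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}"
    "-DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}"
    "-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}"
)

# Only forward a toolchain file if the parent build was given one.
if(CMAKE_TOOLCHAIN_FILE)
  list(APPEND TOKENIZERS_FORWARDED_ARGS
       "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}"
  )
endif()

# The external project is configured in its own CMake run, so it only sees
# what is passed here on its command line.
ExternalProject_Add(
  tokenizers_external_project
  SOURCE_DIR ${EXECUTORCH_ROOT}/extension/llm/tokenizers
  CMAKE_ARGS ${TOKENIZERS_FORWARDED_ARGS}
)
```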

I think FetchContent propagates compiler flags and toolchain settings. It looks like FetchContent_Declare and FetchContent_MakeAvailable will do the job, and that happens at configure time, not build time.
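For comparison, the FetchContent route suggested here could look roughly like the following. This is a sketch, not something from the PR: it assumes CMake 3.14 or newer (for FetchContent_MakeAvailable) and points SOURCE_DIR at the existing submodule checkout instead of downloading anything.

```cmake
include(FetchContent)

# Declare tokenizers using the sources already present in the submodule.
# With no download method given, nothing is fetched from the network.
FetchContent_Declare(
  tokenizers
  SOURCE_DIR ${EXECUTORCH_ROOT}/extension/llm/tokenizers
)

# Populates the content at configure time and adds it to this build.
FetchContent_MakeAvailable(tokenizers)
```

Since FetchContent_MakeAvailable brings the dependency in through add_subdirectory, its targets share the parent build's settings and its target namespace, so whether this avoids the `memory` name collision would need to be checked.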

Also, FWIW, it looks like XNNPACK is fixing this issue (google/XNNPACK#8488) with this PR (google/XNNPACK#8552).

Use ExternalProject_Add to build tokenizers

e92ac85

This way it avoids the CMake target name collision between XNNPACK's `memory` and Abseil's `memory`

larryliu0820 requested review from jathu, kirklandsign, jackzhxng, iseeyuan and swolchok as code owners June 11, 2025 16:25

facebook-github-bot added the CLA Signed label Jun 11, 2025

larryliu0820 added the release notes: none label Jun 11, 2025

mergennachin reviewed Jun 13, 2025


mergennachin added the ciflow/trunk label Jun 13, 2025
