Maxtext unit tests with Pathways backend. #1211

Open: wants to merge 1 commit into base: main

RoshaniN (Collaborator) commented Jan 28, 2025

Description

This PR enables the TPU unit tests to also run with the Pathways backend. Essentially, we will have two sets of tests: one with McJAX and one with Pathways.

  • This change is being made to ensure feature parity between Pathways and McJAX.
  • The tests run as part of a docker compose script which sets up the Pathways containers along with Maxtext. (GitHub Actions didn't have enough support for deploying Pathways containers as "service containers".)
  • TPU integration tests may also be run with the Pathways backend in the future.

For more details, please read the doc on b/397475777. Note that extra self-hosted runners have been added so that tests can be executed in parallel and complete faster overall.

Tests

Please describe how you tested this change -

  1. Changes tested locally using the command: bash docker_run_pathways_containers.sh maxtext_image=us-docker.pkg.dev/cloud-tpu-v2-images-dev/pathways/maxtext_jax_stable:latest command="cd MaxText ; python3 -m pytest tests -m 'not gpu_only and not integration_test' -s"
  2. Pathways flow tested on the GitHub workflow -
    Example runs -
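
For readers who want to see how the local command in step 1 is put together, here is a minimal sketch that rebuilds it piece by piece. The script name, image tag, and `key=value` argument form are copied from that command. The helper `build_local_test_argv` and its parameter are hypothetical names used only for illustration and are not part of this PR. The sketch prints the command instead of running it.

```python
import shlex

# Image tag and script name are copied from the command in step 1.
MAXTEXT_IMAGE = "us-docker.pkg.dev/cloud-tpu-v2-images-dev/pathways/maxtext_jax_stable:latest"
RUNNER_SCRIPT = "docker_run_pathways_containers.sh"


def build_local_test_argv(excluded_markers=("gpu_only", "integration_test")):
    """Return the argv list for the local Pathways test run (hypothetical helper)."""
    # pytest's -m option takes a boolean expression over marker names;
    # "not X and not Y" deselects every test carrying either marker.
    marker_expr = " and ".join(f"not {m}" for m in excluded_markers)
    # The inner command is what runs inside the Maxtext container.
    inner = f"cd MaxText ; python3 -m pytest tests -m {shlex.quote(marker_expr)} -s"
    return [
        "bash",
        RUNNER_SCRIPT,
        f"maxtext_image={MAXTEXT_IMAGE}",
        f"command={inner}",
    ]


if __name__ == "__main__":
    # Print instead of executing, so the sketch is safe to run anywhere.
    print(shlex.join(build_local_test_argv()))
```

Keeping the excluded markers in a tuple makes it easy to see which test groups are skipped under Pathways. The actual test selection is still done by pytest via the `-m` expression.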

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@RoshaniN RoshaniN force-pushed the roshanin_unit branch 13 times, most recently from 3901ccb to 27fbae0 Compare January 31, 2025 02:22
@RoshaniN RoshaniN force-pushed the roshanin_unit branch 12 times, most recently from 0fdec67 to a688c60 Compare February 8, 2025 01:17
@RoshaniN RoshaniN force-pushed the roshanin_unit branch 6 times, most recently from c8d3e1c to 5cfb878 Compare February 22, 2025 00:13
@RoshaniN RoshaniN changed the title from "[DRAFT] Initial commit Maxtext unit tests with Pathways backend." to "Maxtext unit tests with Pathways backend." Feb 27, 2025
@RoshaniN RoshaniN requested a review from shauryagup February 27, 2025 17:51
@RoshaniN RoshaniN force-pushed the roshanin_unit branch 5 times, most recently from a8ed8d1 to 7393067 Compare March 4, 2025 21:28
Modified Pathways workflow to run specifically on Pathways runner.

Final tests on Pathways.

New changes to help run tests.

Trial to get Pathways working.

run Pathways containers and Maxtext as part of the same job.

Move installation to script

More changes.

Installing docker also as part of the script.

Simplified flow test.

Few more changes.

New way of installation

Docker compose with Maxtext and Pathways containers.

Other changes to use maxtext container in docker compose YAML.

Trying to merge all services together.

Adding everything as services again.

Directly running docker compose.

Elegant solution with Pathways containers as services.

Elegant solution with Pathways containers as services.

Reverting to Maxtext in docker compose.

Changes to run with latest JAX SS image.

Run tests with correct markers.

Move installation to Github runner step, simplify command.

Formalize the changes.

Formalize the changes.

Formalize the changes.

Notify on Pathways test failures, modify test script.

Error handling.

4 participants