marceloamaral

Summary

This PR introduces a new IBM AIU (Artificial Intelligence Unit) plugin for Kineto.
The AIU is a dedicated accelerator designed to efficiently run large-scale AI workloads. This integration extends Kineto’s profiling capabilities to support the AIU backend, giving developers detailed insight into AIU execution behavior.

Motivation

Kineto currently supports multiple backends (e.g., CUDA, ROCm, XPU) for profiling. With the growing use of IBM’s AIU in research and production AI systems, it is essential to provide native support for profiling this hardware. This allows PyTorch developers and researchers to:

  • Collect and analyze execution traces from AIU workloads.
  • Gain performance visibility at the operator and runtime levels.
  • Compare and optimize performance across heterogeneous hardware setups.

Features

  • Added an AIU plugin for profiling the IBM AIU backend.
  • Support for capturing and visualizing AIU execution traces in the same way as existing Kineto backends.
  • Integrated event schema for AIU operations.
  • Example workflow for enabling AIU profiling.

Changes

  • New plugin module for AIU profiling (libkineto/plugins/aiu).
  • Backend hooks for AIU runtime events.
  • Schema extensions for AIU trace events.
  • Tests and example configurations.

Tested

  • Verified on IBM AIU hardware with PyTorch models.
  • Ensured compatibility with Chrome trace viewer output.

This PR is inspired by the plugin design introduced in PR #961.

…torch#961)

Summary:
In this PR, the AIU profiler is introduced following the Kineto plugin design under libkineto/src/plugin/xpupti. The AIU profiler is built on the IBM AIU Profiling Toolkit. The plugin is disabled by default and is activated only if the system on which Kineto is built has AIUPti installed and LIBKINETO_NOAIUPTI is not set to OFF.

Signed-off-by: Marcelo Amaral <[email protected]>
Signed-off-by: Marcelo Amaral <[email protected]>
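
The "disabled by default, activated only when AIUPti is available" behavior described in the commit summary can be illustrated with compile-time gating. The following is a minimal standalone sketch, not the PR's actual code: it assumes the build system defines a macro when AIUPti is found, and the macro name `HAS_AIUPTI` as well as the `ActivityProfiler`, `AiuProfiler`, and `enabledProfilerFactories` names are hypothetical stand-ins rather than Kineto's real interfaces.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler interface; not Kineto's actual class.
struct ActivityProfiler {
  virtual ~ActivityProfiler() = default;
  virtual std::string name() const = 0;
};

using ProfilerFactory = std::function<std::unique_ptr<ActivityProfiler>()>;

#ifdef HAS_AIUPTI  // assumed macro, defined by the build only when AIUPti is found
// Compiled only when the AIU toolkit is available.
struct AiuProfiler : ActivityProfiler {
  std::string name() const override { return "aiu"; }
};
#endif

// Returns the profiler factories that this particular build enables.
// Without HAS_AIUPTI the AIU factory is simply never registered.
inline std::vector<ProfilerFactory> enabledProfilerFactories() {
  std::vector<ProfilerFactory> factories;
#ifdef HAS_AIUPTI
  factories.push_back([]() -> std::unique_ptr<ActivityProfiler> {
    return std::make_unique<AiuProfiler>();
  });
#endif
  return factories;
}
```

With this arrangement a default build carries no AIU code at all, so systems without the toolkit are unaffected; whether the PR gates registration in exactly this way is an assumption.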
@meta-cla meta-cla bot added the cla signed label Aug 22, 2025
@marceloamaral
Author

@zejun-chen since you created the PR to add support for XPU, could you please review this PR?

@marceloamaral
Author

@aaronenyeshi since you reviewed the PR that added XPU support, could you please also review this PR?

{
std::lock_guard<std::mutex> guard(mutex_);

// Differently other backends, aiuptiFlushAllActivities flushes all pending
Contributor

Suggested change: "Differently" → "Unlike"
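
For readers without access to the full diff, the reviewed snippet takes a `std::lock_guard` and then flushes all pending activities in one call. A minimal standalone sketch of that pattern follows. It is an illustration under assumptions: `ActivityCollector`, `record`, `flushAll`, and `pendingCount` are hypothetical names, and the sketch does not call `aiuptiFlushAllActivities` or any other AIUPti function.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector; names are illustrative, not the plugin's API.
class ActivityCollector {
 public:
  // Called when a (simulated) runtime produces an activity record.
  void record(std::string activity) {
    std::lock_guard<std::mutex> guard(mutex_);
    pending_.push_back(std::move(activity));
  }

  // Drains every pending record in a single call while holding the lock,
  // so concurrent record() calls cannot interleave with the flush.
  std::vector<std::string> flushAll() {
    std::lock_guard<std::mutex> guard(mutex_);
    std::vector<std::string> drained;
    drained.swap(pending_);
    return drained;
  }

  // Number of records waiting to be flushed.
  std::size_t pendingCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return pending_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> pending_;
};
```

Swapping the pending vector out under the lock keeps the critical section short; any processing of the drained records can then happen without holding the mutex.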

@sraikund16
Contributor

sraikund16 commented Aug 29, 2025

> @aaronenyeshi since you reviewed the PR that added XPU support, could you please also review this PR?

I can review. Can you fix the build issue? Can you rebase to main so tests will start?

@sraikund16 sraikund16 self-requested a review August 29, 2025 17:05