Skip to content

Add --all-ranks CLI infrastructure and core processing logic #115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed
140 changes: 139 additions & 1 deletion src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,10 +47,18 @@ pub struct Cli {
/// For inductor provenance tracking highlighter
#[arg(short, long)]
inductor_provenance: bool,
/// Parse all ranks and generate a single unified page
#[arg(long)]
all_ranks: bool,
}

fn main() -> anyhow::Result<()> {
let cli = Cli::parse();

if cli.all_ranks {
return handle_all_ranks(&cli);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cli is no longer needed after handle_all_ranks, which means you can take by value, which in rust defaults to move semantics unless the Cli type implements the Copy trait (it doesn't).

}

let path = if cli.latest {
let input_path = cli.path;
// Path should be a directory
Expand Down Expand Up @@ -92,11 +100,12 @@ fn main() -> anyhow::Result<()> {
strict: cli.strict,
strict_compile_id: cli.strict_compile_id,
custom_parsers: Vec::new(),
custom_header_html: cli.custom_header_html,
custom_header_html: cli.custom_header_html.clone(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clone now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this is necessary because ParseConfig needs to own the string and cli is passed by reference.

Copy link
Member

@xmfan xmfan Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The emphasis of the question is on "why now" rather than just "why", the code compiled and seemed to work already before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what happened is during your changes, cli was changed to a reference. It no longer is in this current PR version, so this .clone() shouldn't be necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, in my current cli.rs file, I would need the clone in handle_one_rank() because handle_one_rank takes in &Cli and can't move the String out of a borrowed struct. Is this right?

Copy link
Member

@xmfan xmfan Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clone happens because you want to create ParseConfig using a Cli ref, but do you need 1 ParseConfig per rank?

Hint: Are there differences between the ParseConfig created for each rank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm since ParseConfig is the same for all ranks, I could create ParseConfig once before processing ranks. And then pass that to handle_one_rank instead of passing cli. This would avoid repeated cloning of custom_header_html.

Is this the direction you're suggesting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds reasonable

verbose: cli.verbose,
plain_text: cli.plain_text,
export: cli.export,
inductor_provenance: cli.inductor_provenance,
all_ranks: cli.all_ranks,
};

let output = parse_path(&path, config)?;
Expand All @@ -114,3 +123,132 @@ fn main() -> anyhow::Result<()> {
}
Ok(())
}

// handle_all_ranks function with placeholder landing page
fn handle_all_ranks(cli: &Cli) -> anyhow::Result<()> {
let input_path = &cli.path;

if !input_path.is_dir() {
bail!(
"Input path {} must be a directory when using --all-ranks",
input_path.display()
);
}

let out_path = &cli.out;

if out_path.exists() {
if !cli.overwrite {
bail!(
"Directory {} already exists, use -o OUTDIR to write to another location or pass --overwrite to overwrite the old contents",
out_path.display()
);
}
fs::remove_dir_all(&out_path)?;
}
fs::create_dir(&out_path)?;

// Find all rank log files in the directory
let rank_files: Vec<_> = std::fs::read_dir(input_path)
.with_context(|| format!("Couldn't access directory {}", input_path.display()))?
.flatten()
.filter(|entry| {
let path = entry.path();
path.is_file()
&& path
.file_name()
.and_then(|name| name.to_str())
.map(|name| name.contains("rank_") && name.ends_with(".log"))
.unwrap_or(false)
})
.collect();

if rank_files.is_empty() {
bail!(
"No rank log files found in directory {}",
input_path.display()
);
}

let mut rank_links = Vec::new();

// Process each rank file
for rank_file in rank_files {
let rank_path = rank_file.path();
let rank_name = rank_path
.file_stem()
.and_then(|name| name.to_str())
.unwrap_or("unknown");

// Extract rank number from filename
let rank_num = if let Some(pos) = rank_name.find("rank_") {
let after_rank = &rank_name[pos + 5..];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take a look at String.strip_prefix

after_rank
.chars()
.take_while(|c| c.is_ascii_digit())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the log file name patterns you need to match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it supports rank_N.log where N is a numeric variable. The logic uses strip_prefix("rank_") then extracts consecutive ASCII digits as the rank number. Ex:

  • rank_0.log, rank_1.log, rank_10.log
  • rank_0_worker.log
  • something_rank_5.log

But rejects files without the "rank_" prefix or without digits after it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, does the compiler logs actually output log file names with that variance? You only need to handle the filenames logged via TORCH_TRACE right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'll take out the unnecessary complexity.

.collect::<String>()
} else {
"unknown".to_string()
};

println!(
"Processing rank {} from file: {}",
rank_num,
rank_path.display()
);

// Create subdirectory for this rank
let rank_out_dir = out_path.join(format!("rank_{rank_num}"));
fs::create_dir(&rank_out_dir)?;

let config = ParseConfig {
strict: cli.strict,
strict_compile_id: cli.strict_compile_id,
custom_parsers: Vec::new(),
custom_header_html: cli.custom_header_html.clone(),
verbose: cli.verbose,
plain_text: cli.plain_text,
export: cli.export,
inductor_provenance: cli.inductor_provenance,
all_ranks: false,
};

let output = parse_path(&rank_path, config)?;

// Write output files to rank subdirectory
for (filename, content) in output {
let out_file = rank_out_dir.join(filename);
if let Some(dir) = out_file.parent() {
fs::create_dir_all(dir)?;
}
fs::write(out_file, content)?;
}

// Add link to this rank's page
rank_links.push((rank_num.clone(), format!("rank_{rank_num}/index.html")));
Copy link
Member

@xmfan xmfan Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to hard code this here. Suppose someone changes the single rank codepath to change the directory where index.html is stored, then this line would break! Is there a way we could ensure that this line remains relevant even if the single rank logic changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! Now it uses the actual output path returned by handle_one_rank() instead of hardcoding "index.html".

I tested this by running both single rank and multi-rank processing with different output directories and verified that the generated links correctly point to the actual output files in their respective subdirectories, so that it will remain correct even if the single-rank output structure changes.

}

// Sort rank links by rank number
rank_links.sort_by(|a, b| {
let a_num: i32 = a.0.parse().unwrap_or(999);
let b_num: i32 = b.0.parse().unwrap_or(999);
a_num.cmp(&b_num)
});

// Core logic complete - no HTML generation yet
// TODO - Add landing page HTML generation using template system

println!(
"Generated multi-rank report with {} ranks",
rank_links.len()
);
println!("Individual rank reports available in:");
for (rank_num, _) in &rank_links {
println!(" - rank_{}/index.html", rank_num);
}

// No browser opening since no landing page yet
// TODO - Generate landing page and open browser

Ok(())
}
2 changes: 2 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ pub struct ParseConfig {
pub plain_text: bool,
pub export: bool,
pub inductor_provenance: bool,
pub all_ranks: bool,
}

impl Default for ParseConfig {
Expand All @@ -50,6 +51,7 @@ impl Default for ParseConfig {
plain_text: false,
export: false,
inductor_provenance: false,
all_ranks: false,
}
}
}
Expand Down
Loading