GOODBOY008 (Member) commented Sep 14, 2025

Overview

This PR introduces a comprehensive benchmark performance testing module for FastExcel, implementing the proposal outlined in #572.

Benchmark Results Available

CI Benchmark Run Completed Successfully: https://github.com/GOODBOY008/fastexcel/actions/runs/17709908635

Benchmark Artifacts: The workflow generated comprehensive benchmark reports available in the artifacts:

  • HTML Reports: Interactive benchmark comparison reports
  • Raw Results: JMH benchmark data in JSON format
  • Analysis: Performance analysis and comparison metrics

To view the benchmark reports:

  1. Download the benchmark-results artifact from: https://github.com/GOODBOY008/fastexcel/actions/runs/17709908635
  2. Unzip the downloaded file
  3. Open benchmark-reports/benchmark-comparison.html in your browser

What's Changed

New Module: fastexcel-benchmark

• JMH Integration: Complete Maven configuration for JMH (Java Microbenchmark Harness), the standard Java microbenchmarking framework
• Comprehensive Test Suites:
◦ Comparison benchmarks (FastExcel vs Apache POI)
◦ Memory efficiency specialized tests
◦ Streaming operation performance tests
◦ Microbenchmarks for core components
• Automated Execution: Multi-profile support with configurable dataset sizes and memory settings
• Advanced Features:
◦ Interactive CLI with scenario management
◦ Real-time memory profiling with GC tracking
◦ HTML visualization reports and JSON data export
◦ Performance trend analysis and regression detection
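
As a rough illustration of the GC tracking idea listed above, the following sketch reads heap usage and collector statistics from the JDK's standard `java.lang.management` beans. It is not the profiler shipped in this PR: `GcSnapshot` and its methods are hypothetical names chosen for the example, and it assumes that diffing two snapshots around a workload is an acceptable approximation of GC pressure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not taken from this PR): records heap usage and
// cumulative GC activity so two snapshots can be compared around a workload.
final class GcSnapshot {
    final long usedHeapBytes;
    final long gcCount;
    final long gcTimeMillis;

    private GcSnapshot(long usedHeapBytes, long gcCount, long gcTimeMillis) {
        this.usedHeapBytes = usedHeapBytes;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
    }

    // Reads the current heap usage and sums the counters of all collectors.
    static GcSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long count = 0;
        long timeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0L, gc.getCollectionCount());
            timeMillis += Math.max(0L, gc.getCollectionTime());
        }
        return new GcSnapshot(heap.getUsed(), count, timeMillis);
    }

    // Collections that ran between an earlier snapshot and this one.
    long collectionsSince(GcSnapshot earlier) {
        return gcCount - earlier.gcCount;
    }

    // Approximate time spent in GC between an earlier snapshot and this one.
    long gcMillisSince(GcSnapshot earlier) {
        return gcTimeMillis - earlier.gcTimeMillis;
    }
}
```

Calling `GcSnapshot.capture()` before and after a read or write and diffing the two gives a coarse figure. For JMH runs, the built-in `-prof gc` profiler is the more usual way to obtain allocation and GC data.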

Key Components

  1. Core Framework (cn.idev.excel.benchmark.core)
    ◦ Abstract benchmark base classes
    ◦ Configuration management
    ◦ Memory profiler integration

  2. Test Scenarios (cn.idev.excel.benchmark.*)
    ◦ Read/Write operation benchmarks
    ◦ Fill operation performance tests
    ◦ Streaming benchmarks for large datasets
    ◦ Memory efficiency analysis

  3. Comparison Benchmarks (cn.idev.excel.benchmark.comparison)
    ◦ Direct FastExcel vs Apache POI performance comparison
    ◦ Multi-dimensional analysis (throughput, latency, memory)

  4. Utilities (cn.idev.excel.benchmark.utils)
    ◦ Test data generation
    ◦ File management utilities
    ◦ Reporting and visualization

  5. Automated Scripts (scripts/benchmark-runner.sh)
    ◦ Profile-based execution (quick/standard/comprehensive)
    ◦ Configurable parameters and output formats
    ◦ Regression analysis automation
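
For the test data generation listed under Utilities, one property that matters for benchmarks is determinism: every run should see the same rows. The sketch below shows one way to get that with a fixed seed. `SampleRowGenerator` and the three-column row layout are assumptions made for illustration, not the module's actual generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator (not the module's real utility): builds rows of
// (id, name, amount) from a fixed seed so repeated runs see identical data.
final class SampleRowGenerator {

    static List<Object[]> generate(int rowCount, long seed) {
        Random random = new Random(seed);
        List<Object[]> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            rows.add(new Object[] {
                (long) i,                          // sequential id
                "name-" + random.nextInt(10_000),  // pseudo-random label
                random.nextDouble() * 1_000.0      // amount in [0, 1000)
            });
        }
        return rows;
    }
}
```

Generating the data once in a setup phase, outside the measured method, keeps the cost of building rows out of the reported numbers.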

GitHub Actions Integration

Workflow (.github/workflows/benchmark.yml)
◦ Manual trigger with workflow_dispatch for on-demand benchmarking
◦ Java 11 setup with proper classpath resolution
◦ Automated artifact upload for benchmark results
◦ Fixed JMH forking issues for reliable results

Test Scenarios Coverage

• Data Scales: SMALL(1K) → MEDIUM(10K) → LARGE(100K) → EXTRA_LARGE(1M+)
• File Formats: XLSX
• Operation Types: Read, Write, Fill, Streaming
• Memory Analysis: Real-time monitoring, GC pressure analysis, allocation patterns
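
The data scales above could be modelled as a small enum so that benchmarks and reports share one definition. This is only a sketch: the constant names come from the list above, but treating the numbers as row counts, along with the `rowCount()` and `forRows()` helpers, is an assumption.

```java
// Hypothetical enum mirroring the data scales listed above. The constant names
// come from the PR description; reading the numbers as row counts is an assumption.
enum DatasetSize {
    SMALL(1_000),
    MEDIUM(10_000),
    LARGE(100_000),
    EXTRA_LARGE(1_000_000);

    private final int rowCount;

    DatasetSize(int rowCount) {
        this.rowCount = rowCount;
    }

    int rowCount() {
        return rowCount;
    }

    // Smallest scale whose row count is at least the requested number of rows.
    static DatasetSize forRows(int rows) {
        for (DatasetSize size : values()) {
            if (rows <= size.rowCount) {
                return size;
            }
        }
        return EXTRA_LARGE; // "1M+" is open-ended, so anything larger lands here
    }
}
```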

Benefits

  1. Validates Performance Claims: Provides empirical evidence for FastExcel's performance advantages
  2. Quality Assurance: Enables systematic performance analysis and regression detection
  3. User Confidence: Transparent performance reports for informed decision-making
  4. Development Guidance: Data-driven optimization insights

Closes #572

GOODBOY008 changed the title from "feat: Add comprehensive benchmark comparison workflow for FastExcel vs Apache POI" to "feat: Introduce FastExcel Benchmark Performance Testing Module" on Sep 14, 2025

GOODBOY008 (Member, Author) commented Sep 14, 2025

@delei @alaahong

CI Benchmark Run Completed Successfully: https://github.com/GOODBOY008/fastexcel/actions/runs/17709908635

To view the benchmark reports:

  1. Download the benchmark-results artifact from: https://github.com/GOODBOY008/fastexcel/actions/runs/17709908635/artifacts/4006114037
  2. Unzip the downloaded file
  3. Open benchmark-reports/benchmark-comparison.html in your browser

There are a few issues to address:

  1. In the Performance Comparisons section of the HTML report, the content is incomplete. A dataset and a format column need to be added.
  2. For the 1M dataset scenario, the POI run failed, so no benchmark results were generated.

psxjoy (Member) commented Sep 14, 2025

I'm really excited about this PR. However, it's quite large, so the code review will take some time.

Also, no offense intended, but I'd like to ask: Did you use AI-generated code in this PR?

GOODBOY008 (Member, Author) commented

> I'm really excited about this PR. However, it's quite large, so the code review will take some time.
>
> Also, no offense intended, but I'd like to ask: Did you use AI-generated code in this PR?

@psxjoy Yes, some parts (like the comparison report, memory profiler logic, and quickstart scripts) were AI-assisted. AI is quite effective in these scenarios, and I've verified them to make sure they work correctly.

I noticed the artifact wasn’t accessible, so I’ve uploaded the results for your review.
benchmark-results.zip

psxjoy added the labels "developing", "discussion welcome", and "enhancement" on Sep 14, 2025

delei (Member) commented Sep 18, 2025

Hi, @GOODBOY008
Thank you for submitting the PR.

Regarding this PR, I still have some questions:

  • It seems that the file ./fastexcel-benchmark/scripts/benchmark-runner.sh does not exist?
  • Introducing JMH benchmark testing is highly necessary, but currently we don't need to run it through CI.
  • If possible, I suggest deleting the code for generating reports and analyzing results, and only keeping the JMH classes.

Please refer to the above suggestions and make appropriate modifications to the PR content. After that, we will vote on this PR together with other reviewers ASAP.

GOODBOY008 (Member, Author) commented

Hi @delei
Thanks for your feedback.

For the first point, I understand the concern about the PR size. My intention was to split the work into stages, so this submission might look a bit large.

Regarding the second and third points:
• Running benchmarks in CI helps produce relatively stable and reproducible results. Running them locally is often influenced by background tasks and can take a long time.
• As for report generation and analysis, they make it easier to compare multiple runs, especially when evaluating different scenarios. Doing this entirely by hand would be quite time-consuming.

I’m fine with keeping only the JMH core classes for now, but I’d like to highlight the above considerations.

GOODBOY008 force-pushed the feat/benchmark-comparison-workflow branch from 5a95c2d to 1be72cb on September 23, 2025 07:50

GOODBOY008 (Member, Author) commented

@delei PTAL

- Create new GitHub Actions workflow for running benchmark
- Add JavaScript code for sorting table columns in HTML report
- Update table headers to include sort indicators and make them clickable
- Remove generation
- Group results by operation, dataset size, and file format
- Increase measurement iterations for more reliable results
- Update JVM arguments to improve performance
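
The grouping step in the list above (by operation, dataset size, and file format) can be sketched with the standard `Collectors.groupingBy` collector. `BenchmarkResult`, its fields, and `ResultGrouping` are hypothetical names used only for this example; the report code in the PR may model results differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical result row; the field names are illustrative, not the PR's model.
final class BenchmarkResult {
    final String operation;
    final String datasetSize;
    final String fileFormat;
    final String library;
    final double score;

    BenchmarkResult(String operation, String datasetSize, String fileFormat,
                    String library, double score) {
        this.operation = operation;
        this.datasetSize = datasetSize;
        this.fileFormat = fileFormat;
        this.library = library;
        this.score = score;
    }

    // Results sharing this key belong in the same comparison group.
    String groupKey() {
        return operation + "/" + datasetSize + "/" + fileFormat;
    }
}

final class ResultGrouping {

    // Groups results by operation, dataset size and file format.
    // A TreeMap keeps the groups in a stable, sorted order for reporting.
    static Map<String, List<BenchmarkResult>> group(List<BenchmarkResult> results) {
        return results.stream().collect(
            Collectors.groupingBy(BenchmarkResult::groupKey, TreeMap::new, Collectors.toList()));
    }
}
```

Each group then holds the FastExcel and POI entries for one scenario, which is the shape a side-by-side comparison table needs.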
GOODBOY008 force-pushed the feat/benchmark-comparison-workflow branch from 1be72cb to 1824158 on September 26, 2025 08:49
delei added the label "PR: require-multiple-approvals" and removed the labels "enhancement", "developing", and "discussion welcome" on Oct 1, 2025
Labels

PR: require-multiple-approvals This pull request requires multiple approvals.

Development

Successfully merging this pull request may close these issues.

Proposal: Introducing FastExcel Benchmark Performance Testing Module

3 participants