Add consistent synthetic data flag #241

Harshith-umesh · 2025-07-25T22:59:09Z

This PR introduces a new CLI flag, --consistent-synthetic-data, which makes synthetic datasets generate the same prompts at every concurrency level so that benchmark results can be compared fairly.

When benchmarking with synthetic data across multiple concurrency rates, a different set of prompts was generated for each rate. Because the workload varied between tests, performance at different rates could not be compared fairly.
Related Issue: #222

Solution

Added a new boolean flag that, when enabled, resets the synthetic data iterator for each concurrency level to ensure consistent prompt generation.

Implementation

The flag defaults to False to maintain backward compatibility
It only affects synthetic datasets; regular datasets are unaffected
The iterator is reset only when iter_type is infinite and the data is synthetic
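
The reset behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: SyntheticPrompts and prompts_per_rate are hypothetical names, and the seeded random generator merely stands in for whatever the synthetic dataset uses internally.

```python
import random
from typing import Dict, Iterator, List


class SyntheticPrompts:
    """Hypothetical stand-in for a synthetic dataset: an infinite, seeded prompt stream."""

    def __init__(self, prompt_tokens: int, seed: int = 42) -> None:
        self.prompt_tokens = prompt_tokens
        self.seed = seed

    def __iter__(self) -> Iterator[str]:
        # Every fresh iterator restarts from the same seed, so it replays
        # the same sequence of prompts from the beginning.
        rng = random.Random(self.seed)
        while True:
            yield " ".join(str(rng.randrange(1000)) for _ in range(self.prompt_tokens))


def prompts_per_rate(
    dataset: SyntheticPrompts,
    rates: List[int],
    requests_per_rate: int,
    consistent: bool,
) -> Dict[int, List[str]]:
    """Collect the prompts each concurrency level would receive (hypothetical helper)."""
    data_iter = iter(dataset)
    seen: Dict[int, List[str]] = {}
    for rate in rates:
        if consistent:
            # Flag enabled: reset the infinite iterator before each level.
            data_iter = iter(dataset)
        # Flag disabled: keep consuming the shared iterator, so each level
        # sees whatever prompts come next in the stream.
        seen[rate] = [next(data_iter) for _ in range(requests_per_rate)]
    return seen


dataset = SyntheticPrompts(prompt_tokens=5)
with_flag = prompts_per_rate(dataset, [1, 2, 4, 8], requests_per_rate=3, consistent=True)
without_flag = prompts_per_rate(dataset, [1, 2, 4, 8], requests_per_rate=3, consistent=False)
```

In this sketch, with_flag holds an identical prompt list for every rate, while without_flag holds a different slice of the stream for each rate, which is the workload drift the flag is meant to remove.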

Testing

Added tests that verify the same prompts are generated across iterations when the flag is enabled, and that existing functionality remains unchanged.

Usage

# Enable consistent synthetic data generation
guidellm run --target http://localhost:8000 \
  --data '{"prompt_tokens": 50, "output_tokens": 25}' \
  --rate-type concurrent \
  --rate 1,2,4,8 \
  --consistent-synthetic-data

Add consistent synthetic data flag

be27cda
