Add consistent synthetic data flag #241

Open: wants to merge 1 commit into main

Harshith-umesh (Collaborator)

This PR introduces a new CLI flag, --consistent-synthetic-data, that makes synthetic datasets generate the same prompts at every concurrency level, enabling fair performance comparisons in benchmarks.

Previously, benchmarking with synthetic data across multiple concurrency rates generated different prompts for each rate. Because the workload varied between tests, performance could not be compared fairly.
Related Issue: #222

Solution

Added a new boolean flag that, when enabled, resets the synthetic data iterator for each concurrency level to ensure consistent prompt generation.

Implementation

  • The flag defaults to False, preserving backward compatibility
  • Only synthetic datasets are affected; regular datasets behave as before
  • The iterator is reset only when iter_type is infinite and the data is synthetic
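
The reset behavior listed above might be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SyntheticPromptSource`, `reset`, and `prompts_for_rates` are hypothetical names, and the seeded `random.Random` stands in for whatever generator the synthetic dataset actually uses.

```python
import random
from typing import Dict, Iterator, List


class SyntheticPromptSource:
    """Hypothetical stand-in for a seeded, infinite synthetic dataset."""

    def __init__(self, prompt_tokens: int, seed: int = 42) -> None:
        self.prompt_tokens = prompt_tokens
        self.seed = seed
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Re-seed so the next prompts repeat the original sequence.
        self._rng = random.Random(self.seed)

    def __iter__(self) -> Iterator[str]:
        while True:  # infinite iterator: never raises StopIteration
            yield " ".join(
                f"tok{self._rng.randrange(1000)}"
                for _ in range(self.prompt_tokens)
            )


def prompts_for_rates(
    source: SyntheticPromptSource,
    rates: List[int],
    per_rate: int,
    consistent: bool,
) -> Dict[int, List[str]]:
    """Collect `per_rate` prompts for each concurrency level."""
    results: Dict[int, List[str]] = {}
    stream = iter(source)
    for rate in rates:
        if consistent:
            # Assumed behavior of the flag: restart the data at every level.
            source.reset()
            stream = iter(source)
        results[rate] = [next(stream) for _ in range(per_rate)]
    return results
```

With `consistent=True` every concurrency level receives the same prompt list; with `consistent=False` the stream simply continues, so each level sees new prompts, which matches the default behavior described above.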

Testing

Added tests verifying that consistent prompts are generated across iterations when the flag is enabled, and that existing behavior is unchanged when it is not.
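
As a rough idea of what such tests could look like, here is a self-contained pytest-style sketch. It does not reproduce the tests in this PR: `run_levels` is a hypothetical helper that models the flag with a restartable seeded stream, and the test names are made up for illustration.

```python
import itertools
import random


def run_levels(levels, per_level, consistent, seed=0):
    """Hypothetical helper: return the prompts served at each concurrency level."""

    def stream():
        rng = random.Random(seed)
        while True:
            yield f"prompt-{rng.getrandbits(64):016x}"

    prompts = stream()
    served = {}
    for level in levels:
        if consistent:
            prompts = stream()  # restart the sequence for this level
        served[level] = list(itertools.islice(prompts, per_level))
    return served


def test_flag_enabled_repeats_prompts():
    served = run_levels([1, 2, 4, 8], per_level=5, consistent=True)
    # Every level must have received the identical prompt list.
    assert len({tuple(batch) for batch in served.values()}) == 1


def test_flag_disabled_keeps_streaming():
    served = run_levels([1, 2, 4, 8], per_level=5, consistent=False)
    all_prompts = [p for batch in served.values() for p in batch]
    # Default behavior: each level still gets its full batch, but the
    # batches differ because the stream is never restarted.
    assert len(all_prompts) == 20
    assert len({tuple(batch) for batch in served.values()}) > 1
```

The first test covers the enabled path and the second guards the default path, mirroring the two properties the description says the tests check.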

Usage

# Enable consistent synthetic data generation
guidellm run --target http://localhost:8000 \
  --data '{"prompt_tokens": 50, "output_tokens": 25}' \
  --rate-type concurrent \
  --rate 1,2,4,8 \
  --consistent-synthetic-data
