Add consistent synthetic data flag #241
Open
This PR introduces a new CLI flag, --consistent-synthetic-data, that ensures synthetic datasets generate the same prompts across different concurrency levels, enabling fair performance comparisons in benchmarks.
Previously, when benchmarking with synthetic data across multiple concurrency rates, a different set of prompts was generated for each rate. Because the workload varied between tests, the results could not be compared fairly.
Related Issue: #222
Solution
Added a new boolean flag that, when enabled, resets the synthetic data iterator for each concurrency level to ensure consistent prompt generation.
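The reset behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the SyntheticPrompts class, its reset() method, and the run_sweep function are hypothetical names, and the fixed seed is an assumption about how the synthetic data is generated.

```python
import random
from typing import Dict, Iterator, List


class SyntheticPrompts:
    """Hypothetical synthetic dataset that yields pseudo-random prompts."""

    def __init__(self, seed: int = 42) -> None:
        self._seed = seed
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Rewind the generator so the next prompt is the first one again.
        self._rng = random.Random(self._seed)

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        return f"prompt-{self._rng.randrange(1_000_000)}"


def run_sweep(
    dataset: SyntheticPrompts,
    concurrency_levels: List[int],
    requests_per_level: int,
    consistent_synthetic_data: bool = False,  # assumed default: off
) -> Dict[int, List[str]]:
    """Collect the prompts that each concurrency level would send."""
    prompts_by_level: Dict[int, List[str]] = {}
    for level in concurrency_levels:
        if consistent_synthetic_data:
            # Reset before each level so every level sees the same prompts.
            dataset.reset()
        prompts_by_level[level] = [
            next(dataset) for _ in range(requests_per_level)
        ]
    return prompts_by_level
```

With the flag enabled, every concurrency level in the sweep receives an identical prompt list. With it disabled, the iterator keeps advancing, so each level sees different prompts, which matches the behavior before this change.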
Implementation
The flag defaults to False to maintain backward compatibility.
Testing
Added tests that verify the same prompts are generated across iterations when the flag is enabled, and that existing behavior is unchanged when it is not.
Usage