Skip to content

Add benchmark_single_table_aws function #417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Jul 17, 2025

Conversation

R-Palazzo
Copy link
Contributor

@R-Palazzo R-Palazzo commented Jul 2, 2025

Resolve #414
CU-86b5ewpu4

@amontanez24

  • what is the expected behavior for the additional_datasets_folder parameter in benchmark_single_table_aws()? Should it be an S3 bucket, or could it be a local dataset also?
  • Just checking, let me know if the parameter order makes sense, in benchmark_single_table() output_folder comes after timeout while it's the first parameter in benchmark_single_table_aws()

@R-Palazzo R-Palazzo changed the base branch from main to issue-410-save-synthesizers July 2, 2025 13:56
Copy link

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 75.88933% with 61 lines in your changes missing coverage. Please review.

Project coverage is 68.74%. Comparing base (cfbd119) to head (917d2e3).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sdgym/benchmark.py 72.34% 39 Missing ⚠️
sdgym/s3.py 63.41% 15 Missing ⚠️
sdgym/result_writer.py 90.14% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #417      +/-   ##
==========================================
+ Coverage   68.56%   68.74%   +0.17%     
==========================================
  Files          20       21       +1     
  Lines        1422     1622     +200     
==========================================
+ Hits          975     1115     +140     
- Misses        447      507      +60     
Flag Coverage Δ
integration 55.11% <42.29%> (-3.33%) ⬇️
unit 57.02% <71.93%> (+2.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@R-Palazzo R-Palazzo changed the title Issue 414 benchmark single table aws Add benchmark_single_table_aws function Jul 3, 2025
@sdv-team
Copy link
Contributor

sdv-team commented Jul 3, 2025

@R-Palazzo R-Palazzo requested a review from amontanez24 July 3, 2025 16:39
@R-Palazzo R-Palazzo marked this pull request as ready for review July 3, 2025 16:43
@R-Palazzo R-Palazzo requested a review from a team as a code owner July 3, 2025 16:43
@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from ccc4c05 to 647f17a Compare July 7, 2025 13:37
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok that output_folder is in a different place in terms of order for this function than benchmark_single_table

@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from 647f17a to 8ebc6b7 Compare July 8, 2025 12:53
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the changes! I left one small comment but I think this is great

@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from d4e3d1d to 6f142aa Compare July 9, 2025 15:29
@R-Palazzo R-Palazzo requested review from amontanez24 and fealho July 9, 2025 16:18
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment. This looks great! Thank you for addressing the comments

@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from 6f142aa to 6364280 Compare July 10, 2025 14:35
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments then :shipit:

@R-Palazzo R-Palazzo force-pushed the issue-410-save-synthesizers branch from eab6ecb to 662786d Compare July 14, 2025 07:27
Base automatically changed from issue-410-save-synthesizers to main July 14, 2025 07:53
@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from caacb63 to 36b8dea Compare July 14, 2025 12:00
@R-Palazzo R-Palazzo requested review from fealho and amontanez24 July 14, 2025 15:28
Copy link
Member

@fealho fealho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one final comment, it should be good to go after that

sdgym/s3.py Outdated
@@ -144,3 +150,52 @@ def write_csv(data, path, aws_key, aws_secret):
"""
data_contents = data.to_csv(index=False).encode('utf-8')
write_file(data_contents, path, aws_key, aws_secret)


def _parse_s3_uri(s3_uri):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this method the same as parse_s3_path which already exists?

@R-Palazzo R-Palazzo requested a review from fealho July 15, 2025 09:01
@R-Palazzo R-Palazzo force-pushed the issue-414-benchmark_single_table_aws branch from 7636acf to 917d2e3 Compare July 15, 2025 16:47
@R-Palazzo R-Palazzo merged commit f73dc02 into main Jul 17, 2025
55 checks passed
@R-Palazzo R-Palazzo deleted the issue-414-benchmark_single_table_aws branch July 17, 2025 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add benchmark_single_table_aws function
4 participants