Collaborator

@imp imp commented Jun 10, 2024

No description provided.

Contributor

@AnIrishDuck AnIrishDuck left a comment

Ok, so some thoughts on testing:

  • We should have unit tests in the arrow segment. These should be fairly detailed (e.g. test that the focus works, including with struct arrays)
  • We should have a unit test for the cache-only case.
  • We should add a simple high-level integration test (see server/tests for examples). All we need to verify here, IMO, is that the data focus properly filters the data when it is passed to the server. This is tricky because the filter is currently also applied at the HTTP handler layer. We can't remove that (yet) because of the column size filtering, but it may be worth temporarily disabling that code somehow to verify that the filter makes it all the way from HTTP to the segment layer. There may be a better way to get at what we really want to test, such as cfg directives.
  • As far as manual testing goes, the best way is probably to use the CV benchmark as a dataset generator. See the cv binary in server/bench. This runs for a configurable period of time and generates a CV-like load. You could re-use the resulting data directory with the main plateau server, and then run a query on that (e.g. focus just the time column).

    .fields
    .iter()
    .enumerate()
    .filter(|(_, field)| !exclude.contains(&field.name))
Contributor

I don't think this works with nested datasets (e.g. StructArray)? Do we know how those are represented in these files?
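To illustrate the concern: the filter above only compares top-level field names, so a child of a struct column can never be excluded on its own. Below is a minimal sketch of a recursive variant. It assumes a hypothetical `Field` type with a `children` list and a dotted-path convention such as `outputs.score`; neither is taken from this PR or from the Arrow schema types, and how nested fields are actually represented in the segment files is still the open question here.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a schema field; NOT the real Arrow type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    // Non-empty only for struct-like fields.
    children: Vec<Field>,
}

impl Field {
    fn leaf(name: &str) -> Self {
        Field { name: name.to_string(), children: Vec::new() }
    }

    fn nested(name: &str, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), children }
    }
}

/// Drop every field whose dotted path (assumed convention, e.g.
/// "outputs.score") appears in `exclude`, descending into struct-like
/// fields instead of checking top-level names only.
fn prune(fields: &[Field], exclude: &HashSet<String>, prefix: &str) -> Vec<Field> {
    fields
        .iter()
        .filter_map(|field| {
            let path = if prefix.is_empty() {
                field.name.clone()
            } else {
                format!("{}.{}", prefix, field.name)
            };
            if exclude.contains(&path) {
                return None;
            }
            Some(Field {
                name: field.name.clone(),
                children: prune(&field.children, exclude, &path),
            })
        })
        .collect()
}
```

A unit test along these lines could build a schema with one struct column, exclude a single child, and assert that the sibling and the top-level columns survive.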

Collaborator Author

I'm almost sure you're right; this is something I need to figure out. I plan to start building tests of increasing complexity and see where it fails.

.get_records(next_index, RowLimit::records(2), Ordering::Forward)
.get_records(
    next_index,
    RowLimit::records(2),
Contributor

Starting to think we need to package up these three (limits, ordering, focus) in some kind of higher-level struct (maybe Query? Don't love that name...)

That way we could default / use the builder pattern to avoid the repetition throughout all these test modifications.

Collaborator Author

💯 How about DataSelector?
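
For reference, a rough sketch of what packaging the three parameters could look like with a default plus builder-style setters. Everything here is an assumption for illustration: the `RowLimit` and `Ordering` definitions are simplified stand-ins for the types seen at the call site, the default limit of 1000 is arbitrary, and the `DataSelector` fields and method names are hypothetical rather than what the PR ends up with.

```rust
// Simplified stand-ins for the types used at the call site (assumed shapes).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ordering {
    Forward,
    Reverse,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct RowLimit {
    max_records: usize,
}

impl RowLimit {
    fn records(n: usize) -> Self {
        RowLimit { max_records: n }
    }
}

// Hypothetical bundle of limits, ordering, and focus.
#[derive(Debug, Clone, PartialEq)]
struct DataSelector {
    limit: RowLimit,
    ordering: Ordering,
    // `None` means "all columns"; `Some` lists the focused column names.
    focus: Option<Vec<String>>,
}

impl Default for DataSelector {
    fn default() -> Self {
        DataSelector {
            limit: RowLimit::records(1000), // arbitrary placeholder default
            ordering: Ordering::Forward,
            focus: None,
        }
    }
}

impl DataSelector {
    // Each setter consumes and returns the selector so calls can be chained.
    fn limit(mut self, limit: RowLimit) -> Self {
        self.limit = limit;
        self
    }

    fn ordering(mut self, ordering: Ordering) -> Self {
        self.ordering = ordering;
        self
    }

    fn focus<I, S>(mut self, columns: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.focus = Some(columns.into_iter().map(Into::into).collect());
        self
    }
}
```

With something like this, a test that only cares about the limit could pass `DataSelector::default().limit(RowLimit::records(2))` and leave ordering and focus at their defaults, which is where most of the repetition in the test modifications would go away.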

@imp imp force-pushed the imp/datasets branch 3 times, most recently from 97f9e87 to b64cf57 Compare June 12, 2024 15:23
Collaborator Author

imp commented Jun 26, 2024

@AnIrishDuck Please review

Contributor

@AnIrishDuck AnIrishDuck left a comment

Especially with the recent regression involving focus and nested datasets (see #23), I'd strongly suggest having some tests in src/segment/test.rs to verify that works properly. You could probably even re-use the nested schema (inferences_large_extension) from that work.

@imp imp force-pushed the imp/datasets branch 3 times, most recently from 6cba685 to f82f7d5 Compare July 1, 2024 12:43
Contributor

AnIrishDuck commented Jul 11, 2024

Aside from my comment above, I've looked at this a couple times at this point and I think it's good to go once that is addressed.
