Imp/datasets #16
base: main
Conversation
Ok, so some thoughts on testing:
- We should have unit tests in the arrow segment. These should be fairly detailed (e.g. test that the focus works, including with struct arrays; see the sketch after this comment).
- We should have a unit test for the cache-only case.
- We should add a simple high-level integration test (see server/tests for examples). All we need to verify here IMO is that the data focus, when passed to the server, properly filters the data. This is tricky because the filter is currently additionally applied at the HTTP handler layer. We can't remove that (yet) because of the column size filtering. But it may be worth temporarily disabling that code somehow to verify that the filter makes it all the way from HTTP to the segment layer? There may be a better way to get at what we really want to test, like `cfg` directives.
- As far as manual testing goes, the best way is probably to use the CV benchmark as a dataset generator. See the `cv` binary in server/bench. This runs for a configurable period of time and generates a CV-like load. You could re-use the resulting `data` directory with the main plateau server, and then run a query on that (e.g. focus just the `time` column).
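As a rough illustration of the first point, a focus unit test for the arrow segment might look like the sketch below. The helper `test_segment_with_columns`, the `get_records_focused` method, and `column_names()` are hypothetical names used for illustration only; just `get_records`, `RowLimit`, and `Ordering` appear in the actual code under review.

```rust
// Hypothetical sketch of a focus unit test for the arrow segment. The helper
// `test_segment_with_columns`, the `get_records_focused` method, and
// `column_names()` are assumed names, not the real API.
#[cfg(test)]
mod focus_tests {
    use super::*;

    #[test]
    fn focus_restricts_returned_columns() {
        // Build a small segment with three flat columns (hypothetical helper).
        let segment = test_segment_with_columns(&["time", "label", "score"]);

        // Request two records, focused on the `time` column only.
        let records = segment
            .get_records_focused(
                0,
                RowLimit::records(2),
                Ordering::Forward,
                Some(vec!["time".to_string()]),
            )
            .expect("query should succeed");

        // Only the focused column should survive; the row count is unchanged.
        assert_eq!(records.column_names(), vec!["time"]);
        assert_eq!(records.len(), 2);
    }
}
```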
server/src/segment/arrow.rs
Outdated
.fields
.iter()
.enumerate()
.filter(|(_, field)| !exclude.contains(&field.name))
I don't think this works with nested datasets (e.g. StructArray)? Do we know how those are represented in these files?
Almost sure you're right; this is something I need to figure out. I plan to start building tests of various complexity and see where it fails.
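For context on the question above: in an Arrow schema, a struct column is a single top-level field whose children are nested inside its data type, so a name-based filter over the top-level fields never reaches the nested names. A minimal sketch, assuming the arrow2 crate (suggested by the `field.name` field access above, though the exact crate and version are an assumption):

```rust
// Minimal sketch, assuming the arrow2 crate; the crate choice is an assumption.
use arrow2::datatypes::{DataType, Field};

fn main() {
    // A nested "inference" column is one top-level struct field whose
    // children live inside its DataType, not alongside the other fields.
    let fields = vec![
        Field::new("time", DataType::Int64, false),
        Field::new(
            "inference",
            DataType::Struct(vec![
                Field::new("label", DataType::Utf8, false),
                Field::new("score", DataType::Float64, false),
            ]),
            false,
        ),
    ];

    // A name filter over the top level only ever sees "time" and "inference";
    // "label" and "score" are invisible to it.
    for field in &fields {
        println!("{}", field.name);
    }
}
```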
.get_records(next_index, RowLimit::records(2), Ordering::Forward)
.get_records(
    next_index,
    RowLimit::records(2),
Starting to think we need to package up these three (limits, ordering, focus) in some kind of higher-level struct (maybe `Query`? Don't love that name...). That way we could default / use the builder pattern to avoid the repetition throughout all these test modifications.
💯 How about `DataSelector`?
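For what that could look like, here is a minimal builder-style sketch. The field types, defaults, and import path are assumptions for illustration; only `RowLimit` and `Ordering` come from the existing code.

```rust
// Minimal sketch of the proposed selector; field types, defaults, and the
// import path are assumptions. `focus: None` stands in for "all columns".
use crate::segment::{Ordering, RowLimit}; // assumed module path

pub struct DataSelector {
    pub limit: RowLimit,
    pub ordering: Ordering,
    pub focus: Option<Vec<String>>,
}

impl Default for DataSelector {
    fn default() -> Self {
        Self {
            limit: RowLimit::records(usize::MAX), // assumed "no limit" default
            ordering: Ordering::Forward,
            focus: None,
        }
    }
}

impl DataSelector {
    pub fn limit(mut self, limit: RowLimit) -> Self {
        self.limit = limit;
        self
    }

    pub fn ordering(mut self, ordering: Ordering) -> Self {
        self.ordering = ordering;
        self
    }

    pub fn focus(mut self, columns: Vec<String>) -> Self {
        self.focus = Some(columns);
        self
    }
}
```

Tests could then write something like `segment.get_records(next_index, DataSelector::default().limit(RowLimit::records(2)))` instead of repeating all three arguments every time.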
97f9e87 to b64cf57
@AnIrishDuck Please review
Especially with the recent regression involving focus and nested datasets (see #23), I'd strongly suggest having some tests in src/segment/test.rs to verify that this works properly. You could probably even re-use the nested schema (`inferences_large_extension`) from that work.
6cba685 to f82f7d5
Aside from my comment above, I've looked at this a couple of times at this point and I think it's good to go once that is addressed.
No description provided.