
[ENH]: Parallelize fetching sparse indexes of a segment #5146

Open

tanujnay112 wants to merge 1 commit into main

Conversation

@tanujnay112 (Contributor) commented Jul 25, 2025

Description of changes

  • Improvements & Bug fixes
    • Parallelize sparse index fetches within a segment.
  • New functionality
    • N/A

Test plan

TBD

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality? (Readability, Modularity, Intuitiveness)

@tanujnay112 marked this pull request as ready for review July 25, 2025 08:42

propel-code-bot bot commented Jul 25, 2025

Parallelize Segment Sparse Index Fetching

This PR refactors the record and metadata segment index loading logic to fetch all sparse segment blockfiles in parallel using tokio::join, significantly improving segment load performance. To achieve this, new helper functions for asynchronous index loading were introduced, common loading logic was extracted, and BlockfileProvider's trait bounds were clarified using new ReadKey/ReadValue traits for improved type safety across async boundaries.

Key Changes

• Refactored RecordSegmentReader::from_segment and MetadataSegmentReader::from_segment to load all blockfile readers in parallel with tokio::join! (see the sketch after this list).
• Introduced async load_index_reader helpers for DRY/fail-fast index fetching with improved error propagation.
• Added ReadKey/ReadValue traits to blockstore::provider to correct trait bounds for safe async reads.
• Updated rust/index/src/spann/types.rs to parallelize loading of posting_lists and versions_map in SpannIndexReader::from_id.
• Refactored call sites in segment and index modules to use the new parallel index fetch pattern.
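
For orientation, here is a minimal, self-contained sketch of the fan-out pattern described above: build each reader future first, then await them together so the blockfile fetches overlap instead of running back-to-back. All names here (fetch_reader, Reader) are illustrative stand-ins, not the PR's actual API.

use tokio::time::{sleep, Duration};

#[derive(Debug)]
struct Reader(String);

// Stand-in for fetching one sparse-index blockfile reader (e.g. from object storage).
async fn fetch_reader(name: &str) -> Result<Reader, String> {
    sleep(Duration::from_millis(10)).await;
    Ok(Reader(name.to_string()))
}

#[tokio::main]
async fn main() -> Result<(), String> {
    // Create the futures without awaiting them yet.
    let pls = fetch_reader("posting_lists");
    let string_md = fetch_reader("string_metadata");
    let f32_md = fetch_reader("f32_metadata");

    // Await all three concurrently, then check each result.
    let (pls, string_md, f32_md) = tokio::join!(pls, string_md, f32_md);
    let (_pls, _string_md, _f32_md) = (pls?, string_md?, f32_md?);
    Ok(())
}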

Affected Areas

• rust/segment/src/blockfile_record.rs
• rust/segment/src/blockfile_metadata.rs
• rust/blockstore/src/provider.rs
• rust/index/src/spann/types.rs

This summary was automatically generated by @propel-code-bot

@@ -42,6 +42,29 @@ impl Debug for BlockfileProvider {
}
}
}
pub trait ReadKey<'a>:
Collaborator

ty


blacksmith-sh bot commented Jul 28, 2025

2 Jobs Failed:

PR checks / Rust tests / Integration test ci_k8s_integration_slow 1
Step "Run tests" from job "Rust tests / Integration test ci_k8s_integration_slow 1" is failing. The last 20 log lines are:

[...]
                collection_id: CollectionUuid(
                    34d6d6fc-39d0-45e3-92c3-ddf1d5866e75,
                ),
                min_versions_to_keep: 1,
            },
        ],
        None,
    )
    	successes: 39
    	local rejects: 204
    		204 times at duplicate sparse index sampled
    	global rejects: 0
    

  Cancelling due to test failure
────────────
     Summary [ 131.973s] 1 test run: 0 passed, 1 failed, 583 skipped
        FAIL [ 131.953s] garbage_collector::prop_test_garbage_collection tests::test_k8s_integration_garbage_collection
error: test run failed
Error: Process completed with exit code 100.
PR checks / all-required-pr-checks-passed
Step "Decide whether the needed jobs succeeded or failed" from job "all-required-pr-checks-passed" is failing. The last 20 log lines are:

[...]
}
EOM
)"
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
env:
  GITHUB_REPO_NAME: chroma-core/chroma
  PYTHONPATH: /home/runner/_work/_actions/re-actors/alls-green/release/v1/src
# ❌ Some of the required to succeed jobs failed 😢😢😢

📝 Job statuses:
📝 python-tests → ✓ success [required to succeed or be skipped]
📝 python-vulnerability-scan → ✓ success [required to succeed or be skipped]
📝 javascript-client-tests → ✓ success [required to succeed or be skipped]
📝 rust-tests → ❌ failure [required to succeed or be skipped]
📝 go-tests → ✓ success [required to succeed or be skipped]
📝 lint → ✓ success [required to succeed]
📝 check-helm-version-bump → ⬜ skipped [required to succeed or be skipped]
📝 delete-helm-comment → ✓ success [required to succeed or be skipped]
Error: Process completed with exit code 1.

Summary: 1 successful workflow, 1 failed workflow

Last updated: 2025-07-28 18:06:41 UTC

bool_metadata_reader,
f32_metadata_reader,
u32_metadata_reader,
) = tokio::join!(
Contributor

[BestPractice]

Consider adding error handling for the case where one of the parallel operations fails but others succeed. The current implementation fails fast, but you might want to log which specific index failed to load for better debugging.

Suggested change
-) = tokio::join!(
+let (
+    pls_reader,
+    string_metadata_reader,
+    bool_metadata_reader,
+    f32_metadata_reader,
+    u32_metadata_reader,
+) = tokio::join!(
+    pls_future,
+    string_metadata_future,
+    bool_metadata_future,
+    f32_metadata_future,
+    u32_metadata_future
+);
+// Handle results and create index readers with better error context
+let pls_reader = pls_reader.map_err(|e| {
+    tracing::error!("Failed to load PLS reader: {:?}", e);
+    e
+})?;

Comment on lines +2654 to +2662
let (pl_result, vm_result) = tokio::join!(
    Self::posting_list_reader_from_id(pl_id, blockfile_provider, prefix_path),
    Self::versions_map_reader_from_id(
        versions_id,
        blockfile_provider,
        prefix_path
    )
);
(pl_result?, vm_result?)
Contributor

[BestPractice]

Consider using futures::try_join! here. It can make the code more concise and has the benefit of failing fast: if one of the futures fails, the others are dropped, which can save resources. tokio::join! runs all futures to completion regardless of failures.

You'll need to add use futures::try_join; at the top of the file.
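
For illustration only (this is not code from the PR), a tiny example of the behavioral difference: try_join! returns the first Err and drops the still-pending futures, while join! drives every branch to completion and hands back all the results.

use futures::try_join;

async fn ok_branch() -> Result<u32, String> {
    Ok(1)
}

async fn err_branch() -> Result<u32, String> {
    Err("boom".to_string())
}

#[tokio::main]
async fn main() {
    // join! always yields both results; each one must be checked afterwards.
    let (a, b) = tokio::join!(ok_branch(), err_branch());
    assert!(a.is_ok() && b.is_err());

    // try_join! short-circuits on the first error and drops the other future.
    let joined: Result<(u32, u32), String> = try_join!(ok_branch(), err_branch());
    assert!(joined.is_err());
}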

Collaborator

Yeah +1, should we use try_join so we can early bail?

Contributor

That's a great point! Using futures::try_join! is indeed a best practice here. It allows for early exit if any of the parallel operations fail, which can save resources and provide clearer error handling by immediately propagating the first error. tokio::join! would continue to run all futures even if one has already failed.

I've prepared suggestions for the changes in rust/index/src/spann/types.rs, rust/segment/src/blockfile_metadata.rs, and rust/segment/src/blockfile_record.rs to incorporate try_join!. You'll also need to add use futures::try_join; to the top of each of these files.

Suggested change (rust/index/src/spann/types.rs)
-let (pl_result, vm_result) = tokio::join!(
-    Self::posting_list_reader_from_id(pl_id, blockfile_provider, prefix_path),
-    Self::versions_map_reader_from_id(
-        versions_id,
-        blockfile_provider,
-        prefix_path
-    )
-);
-(pl_result?, vm_result?)
+let (pl_result, vm_result) = futures::try_join!(
+    Self::posting_list_reader_from_id(pl_id, blockfile_provider, prefix_path),
+    Self::versions_map_reader_from_id(
+        versions_id,
+        blockfile_provider,
+        prefix_path
+    )
+)?;
+(pl_result, vm_result)

Suggested change (rust/segment/src/blockfile_metadata.rs)
-let (pl_result, vm_result) = tokio::join!(
-    Self::posting_list_reader_from_id(pl_id, blockfile_provider, prefix_path),
-    Self::versions_map_reader_from_id(
-        versions_id,
-        blockfile_provider,
-        prefix_path
-    )
-);
-(pl_result?, vm_result?)
+let (
+    pls_reader,
+    string_metadata_reader,
+    bool_metadata_reader,
+    f32_metadata_reader,
+    u32_metadata_reader,
+) = futures::try_join!(
+    pls_future,
+    string_metadata_future,
+    bool_metadata_future,
+    f32_metadata_future,
+    u32_metadata_future
+)?;

Suggested change (rust/segment/src/blockfile_record.rs)
-let (pl_result, vm_result) = tokio::join!(
-    Self::posting_list_reader_from_id(pl_id, blockfile_provider, prefix_path),
-    Self::versions_map_reader_from_id(
-        versions_id,
-        blockfile_provider,
-        prefix_path
-    )
-);
-(pl_result?, vm_result?)
+let (
+    max_offset_id_result,
+    user_id_to_id_result,
+    id_to_user_id_result,
+    id_to_data_result,
+) = futures::try_join!(
+    max_offset_id_future,
+    user_id_to_id_future,
+    id_to_user_id_future,
+    id_to_data_future
+)?;

Comment on lines +765 to +780
let (
    max_offset_id_result,
    user_id_to_id_result,
    id_to_user_id_result,
    id_to_data_result,
) = tokio::join!(
    max_offset_id_future,
    user_id_to_id_future,
    id_to_user_id_future,
    id_to_data_future
);

if id_to_user_id_segment_path != max_offset_segment_path {
    let max_offset_id_bf_reader = max_offset_id_result?;
    let user_id_to_id = user_id_to_id_result?;
    let id_to_user_id = id_to_user_id_result?;
    let id_to_data = id_to_data_result?;
Contributor

[BestPractice]

Consider using futures::try_join! here. It can make the code more concise and has the benefit of failing fast: if one of the futures fails, the others are dropped, which can save resources. tokio::join! runs all futures to completion regardless of failures.

You'll need to add use futures::try_join; at the top of the file.

Labels: none yet
Projects: none yet
3 participants