
[ENH]: Load HNSW index without disk intermediary #5159


Open
wants to merge 1 commit into
base: main

tanujnay112 (Contributor) commented Jul 29, 2025

Description of changes

(WIP)

  • Improvements & Bug fixes
    • Loads the HNSW index from S3 without a disk intermediary by directly passing in the memory buffer given by S3.
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@tanujnay112 tanujnay112 marked this pull request as ready for review July 29, 2025 14:20

Enable In-Memory Loading of HNSW Index from S3 (No Disk Intermediary)

This PR refactors the HNSW index loading pipeline to support direct in-memory loading of index data from S3, rather than writing and reading intermediate files to/from disk. It updates interface contracts, adds a new code path for loading buffers directly, introduces relevant changes throughout the index provider and index types, and bumps the hnswlib dependency to a specific commit supporting this feature.

Key Changes

• Added PersistentIndex::load_from_hnsw_data API (with trait and implementations) to allow direct in-memory index loading.
• Modified hnsw_provider.rs to use memory buffers from S3 instead of temporary disk files for index loading in both open() and fork() paths.
• Refactored the segment loading logic to assemble HNSW file buffers into a new hnswlib::HnswData in memory.
• Updated hnsw.rs for new index loading API support.
• Bumped 'hnswlib' dependency to a commit supporting in-memory loading (rev: fb6c1f7) and updated Cargo.toml/Cargo.lock accordingly.
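
To make the new loading path concrete, here is a minimal sketch of what a "load from in-memory buffers" trait method could look like. Everything in it is an assumption made for illustration: `IndexData` is a hypothetical stand-in for `hnswlib::HnswData` (only the four buffer names come from this PR), and `LoadFromData` / `SketchIndex` are invented names, not the real `PersistentIndex` trait or `HnswIndex` type.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `hnswlib::HnswData`: the four buffers named in this PR.
#[derive(Clone, Debug)]
pub struct IndexData {
    pub header: Arc<Vec<u8>>,
    pub data_level0: Arc<Vec<u8>>,
    pub length: Arc<Vec<u8>>,
    pub link_list: Arc<Vec<u8>>,
}

/// Hypothetical index that only records its id and how many bytes it was built from.
#[derive(Debug, PartialEq)]
pub struct SketchIndex {
    pub id: u64,
    pub total_bytes: usize,
}

/// Illustrative trait; the real trait in rust/index/src/types.rs may look different.
pub trait LoadFromData: Sized {
    /// Build an index directly from buffers already in memory (no temp files).
    fn load_from_data(data: &IndexData, id: u64) -> Result<Self, String>;
}

impl LoadFromData for SketchIndex {
    fn load_from_data(data: &IndexData, id: u64) -> Result<Self, String> {
        // Assumption: an empty header means the fetch returned nothing usable.
        if data.header.is_empty() {
            return Err("empty header buffer".to_string());
        }
        let total_bytes = data.header.len()
            + data.data_level0.len()
            + data.length.len()
            + data.link_list.len();
        Ok(SketchIndex { id, total_bytes })
    }
}
```

The point of the shape is that the caller hands over the bytes it received from storage and never names a filesystem path.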

Affected Areas

• rust/index/src/hnsw_provider.rs
• rust/index/src/hnsw.rs
• rust/index/src/types.rs
• Cargo.toml
• Cargo.lock

This summary was automatically generated by @propel-code-bot


blacksmith-sh bot commented Jul 29, 2025

8 Jobs Failed:

PR checks / Lint
Step "Clippy" from job "Lint" is failing. The last 20 log lines are:

[...]
   Compiling quote v0.6.13
   Compiling matrixmultiply v0.3.9
   Compiling tracing-test-macro v0.2.5
    Checking rawpointer v0.2.1
    Checking tracing-test v0.2.5
    Checking tokio-test v0.4.4
    Checking approx v0.5.1
    Checking ndarray v0.16.1
   Compiling convert_case v0.6.0
   Compiling proptest-derive v0.3.0
   Compiling napi-build v2.1.6
   Compiling chromadb-js-bindings v0.1.0 (/home/runner/_work/chroma/chroma/rust/js_bindings)
   Compiling napi-derive-backend v1.0.75
   Compiling ctor v0.2.9
    Checking napi-sys v2.4.0
    Checking napi v2.16.17
   Compiling napi-derive v2.16.13
error: could not compile `chroma-index` (lib test) due to 3 previous errors
    Checking chroma v0.1.0 (/home/runner/_work/chroma/chroma/rust/chroma)
Error: Process completed with exit code 101.
PR checks / Rust tests / test-benches (blacksmith-16vcpu-ubuntu-2204, --bench get)
Step "Run benchmark" from job "Rust tests / test-benches (blacksmith-16vcpu-ubuntu-2204, --bench get)" is failing. The last 20 log lines are:

[...]
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/mod.rs:87:22
  13: tokio::runtime::context::runtime::enter_runtime
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/context/runtime.rs:65:16
  14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
  15: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/runtime.rs:370:50
  16: tokio::runtime::runtime::Runtime::block_on
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.41.1/src/runtime/runtime.rs:340:13
  17: get::bench_get
             at ./benches/get.rs:125:25
  18: get::benches
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/criterion-0.5.1/src/macros.rs:71:17
  19: get::main
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/criterion-0.5.1/src/macros.rs:124:17
  20: core::ops::function::FnOnce::call_once
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: bench failed, to rerun pass `-p worker --bench get`
Error: Process completed with exit code 101.
PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_embeddings.py)
Step "Test" from job "Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_embeddings.py)" is failing. The last 20 log lines are:

[...]
        /home/runner/_work/chroma/chroma/chromadb/test/property/test_embeddings.py:384
        /home/runner/_work/chroma/chroma/chromadb/test/property/test_embeddings.py:387
        /home/runner/_work/chroma/chroma/chromadb/test/utils/wait_for_version_increase.py:10
        /home/runner/_work/chroma/chroma/chromadb/test/utils/wait_for_version_increase.py:23
        /home/runner/_work/chroma/chroma/chromadb/test/utils/wait_for_version_increase.py:27
        /home/runner/_work/chroma/chroma/chromadb/test/utils/wait_for_version_increase.py:9


You can reproduce this example by temporarily adding @reproduce_failure('6.112.2', b'AXicXZhZiKtnGcffJ8lkMpklk0xmMpNZMlkmy0wmsyWTmcwkqUu9EFwOHkFpsaBolVbaYxVcOUfwqsUFWsEjvfDGG6UiiHdFeqxSvRFEhKKoIBRviqgXFr2w/v7Pl7M1Icn7vcuz/J/1TQghZaFvwSzkLOQtLDMO4SiEB+xZjULIJVitBr6GtmF5O7ZTO7E9O2RnyYq2H+xLFm4E7Uia2Yp1LWsNRlneBauEOONiiIXJK/P2B8zg4/OpYJ8Tm3//zaDc4sSuxWwAsZq1mTm0KYvbDKNlKGc4cWoL8BmZxDqIaFvFmpxYD+I+sEvGB7ZleZ4H7EzaKvIGy/nuA6sHe9wlQchesGnxf+YrZkeWsg4SFyzt0t5K/fPxa8Xf/kvouHr6TRgypYOz2kWYWZviYYhoWVuyHVuzMWtZJ/DV0deuv8nLmG+yp2rriJqHzj6qntqlALAdzm9KEGCtgTQn7e7JBSim7dzazAr0LOrKCBecyXFGKpUQeYxi81DtsafninbgemFlIGj78xr0c5xeYpwBOlcn3HwaSKHd53kVPknGAGRzcBtAbQsAL+G6B+DxIHmMuSygTvPOsFtUD909crYV7EOiur0uoDM2x3oHYAesNdhZxpQr4LBm28hWxahn/BY43UKuNehI+1tPRp4i3DY4N412sQiX4eqzN/+e+GxwN43dtghsAbaLCADov3E3iXyp6dAW+cxDqgiYRZTosbrGyGDftXhwZ5LNbkzd1Os7BtfF4AtjBF8BEPl14NyJwLoeAhb6n2HStAMzHxQlIcx9mAe8InK0d2qqcqDlAGuJ0gSRTZ52hA8kp5jrgU/X0VzGN7oosWMJd+h92A1d9AyY7zKeRUl81rqyGv4jv6lLzkuONDhQA8qODd8ScRe8Y4ENTPz5p4ZZdzmU5VSd8Tqi8WJDwhJxpBfBmtuv5z5XwjOMxTG8T/ns88m5ZySRjlBLhdB98fffEqOxPAKfLHN2HlMsYV0ZpQ3kBdjJejJHGipxO4HHLHtO+O6zc9nxuITCOZ5Ww0PW8J9ZTLmC9yzDvY8WJfYsTHx4+DuzPai0me3DdRbvOjBF9i5jyZl3WsvEjWImi0cqThYi++Zt5AGy7FmjbCVH7qXZb7zWmb/+pvJLEfdfQtgz2/adAQ/KuNX3EemEUFRm2uH0IQpVwTPD04Ecuy8BP01inIVVEsZLIQq4AcJgZp5iTmkLYLbw5MDpkgdEFyEP5F9+rhXsUzLRj8gfyomitgmNOQCMAlOA4XIX2nW1q4Bc8ZWGh87EE8bSuOvhrCxZgI+CsQF4grOO+RpyvPdZqP7yh+/ljBLtNFsl6hkgjRGwCfN9zyFDZkbKkIC0DLl5ZhYgoqyGqebZWhY3JrsR3AW+Ku71eyhSAKpNWA6wyoYLeuON154Iz/9k3jfM4qVeLMLbHiQSHf0ms1XoqXaE8ODXn3tEMbruafDId1Qgc2HnjmIGjGqIE/fEJQfbIjYiLBe3pXGF9YCsg2DvcpQSUd1IwSPmKd7lxmqycRl/GYP/yBP6qSNc9Siq8j4BmBg+kWHHOu4Hnnfj0FNafOJB51DZgPqQ3RXk6ngNy3mK6EyicsFWA+Jy7um+F7MqqBdZEmJN8nUDIiVYpRGrghmzHGliLdyTzxKsJgZsTwCXzIfCFHxqEFzABH3wUlCsoeklpC/Q8AgmuxAqg1cKERrs67FzhRN5TwsEhAyR4sjQFlk6QbZ9xiM21PEU4hN1ag7Qun+3WS97yLS0guxKwRl3wY37Sq6CsgGzCpo1J+7dVFj0pNAfPioAVQtPokoXZPEQXvmAoKuwGL8HcqmfQc4ZQlDpegvEthnlkP7QVa0iyT7PG15PS0gWB6oCUJwx32D2HPUvvPHJ4cp5oGm46apAqFwyh9bnntl6nD0UPGu8jxkcR95
Tg/Oe6/jy4W1XyPo5wF/T80cOxH0TqdrI2pTdkdhTUfEH/hIqZc6gx+ctXP1kbUW5f8YrQYcomcEWssSFkgUcFXjToFv0iFsEQ9Xigq8V4JWKug1v3qIGZ80px379s78otRVU1J4KofWLL/xRgYvKZSK6Jnn57nhKKALJDK5XjqJ7gy8e2JqF9x4B6Sf10/DCeKn+CmwSQKdqegSUVRTbAKd7C/Ad2yVUHTxBbuNN5t7f4kiJQEuD3yFP0j/tHcw08ys8tdgl1w48TWPHgEyqSXnm9+iI5FEjmK94Gt6cdCApbxS2sZXqxRhM6dw8aFWvqvBro/DYi+0OGkq+RY+6pEuo/qvs6F94pTnGPk1SgCzb9risg8wckKkjklabNuPUssjdAcgFgFCdu/Dquu45N+H92CErfU9qkqMDPcXQNusj7796WNPcx3fQMKbUQQGQJFNoso0uLegWXeITPkKyrdN+Ysr7kTOwpiDBwY25qZzmuXQFpYLXEjW/X7z682fUBx97Fesq6wZHMI61FuVhRTfff9RUU6CYHTmvPtsvkaXizYdKzbai0MPlCPySFECvAa8/fO3V0q/iqjw7nNzi1BI7Mh7v69Dc8PSVdkvLnxNQKvjFIWaigDJqlDa5n8hXp5TzdeeQDog44N3yIFjwuRJCVjy/JZktOvkeOyimohQ8ezXZNGa6Dr4Nyk/craOnY0e35XlHebqAbegAPaFn5V/EqOYr6Ck6PF1xhL6seiidcn51GsKBbg4hyKcTFL/9nHh3PV8qdHL3xWyQHY7Ulr0llJVCdH8auph1XCH4vW3slasMQQIq+PFZpk/JaRVifkxzq8agDtbyi6KC09PY4mSu6DV+VbniIQujN54YqvleAbJ99g69FapNpLzbRA1Vi8zbt3KU6u80+fuwSYp03ZuU/0Z13pcSUN7AB7glwVFlV4171T0tTnq9P59mvFD1QX7EPrUDM4yJMPxljI7LeEvDr63K5GdApkZuXjnB42eE3Avs3MVS0+xMMNfVPcdx7nn6WsaCe4rLEwtPffO7jwXdeinOH3MZ3u/RuahI/IxPpBRNOVScA9MEva1886U7zYAcbxnPLkAzSn+v/On5j0drS0qkAuph5aqabkJuykX2K+7Pb1fGgAI6csVlKapkeTjJNVUk5xklqf/aPEWwyd1UKGnf3q1zyatB7tKG8hl+6bSa5lFXYi4GBrtg251cHv76qmRIQDyNXSp3bq81Ux5reze/4xW27Fem4yii79yrmuQwd9XkJ/R61Ny/TtXOqTpxatfvRwm/3ZA1gsdn1a+/S56YLvy/gl3PUKosee78pkT9Qcn+nhdc0SndUaJ7eP0f33vyN9+/pZ5lD036fr84gM+O3xoG3resKmg8Hlqu+SHPGHZdOr/jMY+hltR6VBMvvuAdxLmLpYSyKYcIdk2Lx6/LEFW//0YXi+jaUJn0Kxnv0XaQYxYzdJSrGTV49xh1lEUDbhhqibMrfsFhf99v5fNki/uDZ8CpShRbUw70XQfCKA9p9HJeoirtLOIdM7C+QMFTlNaFU3kBzNMolwaRsiqMU3/kx8E7n3rks3ciOQf4gr/j/0ik0EbfDeTOQHTsl/Ez/F24jhl1vJfZxx6Xjuum/rbx7kf14Cyifk+HozvumM8AJ15x72sjVckb8by6MnU77itRXE6DbxnaPf/DZxm/GcKxivr3x1opQkmZ/oBQnxNs0dr/AYtGbyw=') as a decorator on your test case
====== 3 failed, 20 passed, 1 xpassed, 116 warnings in 2727.78s (0:45:27) ======
Error: Process completed with exit code 1.
PR checks / Rust tests / Integration test ci_k8s_integration 2
Step "Run tests" from job "Rust tests / Integration test ci_k8s_integration 2" is failing. The last 20 log lines are:

[...]

    failures:

    failures:
        garbage_collector_component::tests::test_k8s_integration_ignores_forked_collections

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 37 filtered out; finished in 24.20s
    
  stderr ───
    thread 'garbage_collector_component::tests::test_k8s_integration_ignores_forked_collections' panicked at rust/garbage_collector/src/garbage_collector_component.rs:684:10:
    called `Result::unwrap()` on an `Err` value: "Timeout waiting for new version to be created"
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

  Cancelling due to test failure
────────────
     Summary [ 312.575s] 16/40 tests run: 15 passed (1 slow), 1 failed, 545 skipped
        FAIL [  24.228s] garbage_collector garbage_collector_component::tests::test_k8s_integration_ignores_forked_collections
warning: 24/40 tests were not run due to test failure (run with --no-fail-fast to run all tests, or run with --max-fail)
error: test run failed
Error: Process completed with exit code 100.
PR checks / Rust tests / test (blacksmith-8vcpu-ubuntu-2204)
Step "Test" from job "Rust tests / test (blacksmith-8vcpu-ubuntu-2204)" is failing. The last 20 log lines are:

[...]
      28: core::ops::function::FnOnce::call_once
                 at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
      29: core::ops::function::FnOnce::call_once
                 at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

  Cancelling due to test failure: 7 tests still running
        PASS [   6.223s] chroma-segment blockfile_metadata::test::test_simple_regex
        PASS [  10.005s] chroma-load tests::end_to_end
        PASS [   9.040s] chroma-segment blockfile_metadata::test::test_composite_regex
        PASS [  14.028s] chroma-log sqlite_log::tests::test_push_pull_logs
        PASS [  18.691s] chroma-blockstore::blockfile_writer_test tests::blockfile_writer_test
        PASS [  29.491s] chroma-frontend::test_collection test_collection_sqlite
        PASS [  40.020s] chroma-blockstore arrow::concurrency_test::tests::test_blockfile_shuttle
────────────
     Summary [  40.575s] 289/508 tests run: 288 passed, 1 failed, 77 skipped
        FAIL [   0.250s] chroma-segment distributed_spann::test::test_spann_segment_writer
warning: 219/508 tests were not run due to test failure (run with --no-fail-fast to run all tests, or run with --max-fail)
error: test run failed
Error: Process completed with exit code 100.
PR checks / Rust tests / test-long
Step "Test" from job "Rust tests / test-long" is failing. The last 20 log lines are:

[...]
      18: core::ops::function::FnOnce::call_once
                 at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
      19: core::ops::function::FnOnce::call_once
                 at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

        SLOW [> 60.000s] chroma-index spann::types::tests::test_long_running_data_integrity_parallel
        SLOW [> 60.000s] chroma-index spann::types::tests::test_long_running_data_integrity
        PASS [  60.467s] chroma-index spann::types::tests::test_long_running_data_integrity_parallel
        SLOW [>120.000s] chroma-index spann::types::tests::test_long_running_data_integrity
        SLOW [>180.000s] chroma-index spann::types::tests::test_long_running_data_integrity
        SLOW [>240.000s] chroma-index spann::types::tests::test_long_running_data_integrity
        PASS [ 240.114s] chroma-index spann::types::tests::test_long_running_data_integrity
────────────
     Summary [ 240.115s] 5 tests run: 2 passed (2 slow), 3 failed, 580 skipped
        FAIL [   3.636s] chroma-index spann::types::tests::test_long_running_data_integrity_multiple_parallel_runs
        FAIL [   3.321s] chroma-index spann::types::tests::test_long_running_data_integrity_multiple_parallel_runs_with_updates_deletes
        FAIL [   3.195s] chroma-index spann::types::tests::test_long_running_integrity_multiple_runs
error: test run failed
Error: Process completed with exit code 100.
PR checks / all-required-pr-checks-passed
Step "Decide whether the needed jobs succeeded or failed" from job "all-required-pr-checks-passed" is failing. The last 20 log lines are:

[...]
}
EOM
)"
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
env:
  GITHUB_REPO_NAME: chroma-core/chroma
  PYTHONPATH: /home/runner/_work/_actions/re-actors/alls-green/release/v1/src
# ❌ Some of the required to succeed jobs failed 😢😢😢

📝 Job statuses:
📝 python-tests → ❌ failure [required to succeed or be skipped]
📝 python-vulnerability-scan → ✓ success [required to succeed or be skipped]
📝 javascript-client-tests → ✓ success [required to succeed or be skipped]
📝 rust-tests → ❌ failure [required to succeed or be skipped]
📝 go-tests → ✓ success [required to succeed or be skipped]
📝 lint → ❌ failure [required to succeed]
📝 check-helm-version-bump → ⬜ skipped [required to succeed or be skipped]
📝 delete-helm-comment → ✓ success [required to succeed or be skipped]
Error: Process completed with exit code 1.

1 job failed running on non-Blacksmith runners.


Summary: 1 successful workflow, 1 failed workflow

Last updated: 2025-07-29 15:10:29 UTC

Comment on lines +223 to +230
None => match HnswIndex::load_from_hnsw_data(
    self.fetch_hnsw_segment(&new_id, prefix_path)
        .await
        .map_err(|e| Box::new(HnswIndexProviderForkError::FileError(*e)))?,
    &index_config,
    ef_search,
    new_id,
) {

[CriticalError]

The logic for loading the index within the fork method appears to be incorrect. It attempts to fetch the segment from remote storage using new_id, but the index for new_id doesn't exist in storage yet. The files for source_id have just been copied to a local directory.

The previous implementation using HnswIndex::load(storage_path_str, ...) correctly loaded the index from this new local directory. Since fork is intended to create a mutable, file-backed copy of an index, it seems the original approach of loading from the local path should be restored.
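
The failure mode described here can be reduced to a small sketch. It assumes a hypothetical `RemoteStore` keyed by index id (not the real storage client) and a simplified `fork` that returns raw bytes rather than an index; the only point is that a fork has to read from the id that already exists in storage.

```rust
use std::collections::HashMap;

/// Hypothetical remote store keyed by index id (stand-in for S3-backed storage).
pub struct RemoteStore {
    pub objects: HashMap<u64, Vec<u8>>,
}

impl RemoteStore {
    /// Returns the stored bytes for `id`, or an error if nothing is stored under it.
    pub fn fetch(&self, id: u64) -> Result<Vec<u8>, String> {
        self.objects
            .get(&id)
            .cloned()
            .ok_or_else(|| format!("index {} not found in remote storage", id))
    }
}

/// Simplified fork: read the bytes stored under `source_id`, hand them back under `new_id`.
/// Fetching with `new_id` instead would fail, because nothing has been written under it yet.
pub fn fork(store: &RemoteStore, source_id: u64, new_id: u64) -> Result<(u64, Vec<u8>), String> {
    let bytes = store.fetch(source_id)?;
    Ok((new_id, bytes))
}
```

In this sketch, `store.fetch(new_id)` returns an error until something has been flushed under the new id, which mirrors the concern raised above.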

Comment on lines +287 to +301
    let hnsw_data = self.fetch_hnsw_segment(source_id, prefix_path).await?;
    let getters = [
        |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.header_buffer())),
        |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.data_level0_buffer())),
        |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.length_buffer())),
        |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.link_list_buffer())),
    ];

    for (file, getter) in FILES.iter().zip(getters) {
        let file_path = index_storage_path.join(file);
        self.copy_bytes_to_local_file(&file_path, getter(&hnsw_data))
            .await?;
    }
    Ok(())
}

[PerformanceOptimization]

This function now fetches the entire HNSW segment into an in-memory HnswData object before writing the individual files to disk. The previous implementation streamed each file directly. For large indexes, this change could significantly increase memory usage during the fork operation. Was this change intentional? If fork still needs to write to disk, perhaps restoring the previous file-by-file download logic for this function would be more memory-efficient.
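
A rough way to reason about the memory concern, as a sketch under stated assumptions: if all files of a segment are held in memory at once, peak usage is roughly the sum of their sizes, while a file-by-file approach that holds one whole file at a time peaks at roughly the largest file. The function names below are hypothetical, and `copy_one_file` only illustrates copying a single source to a sink with the standard library's `io::copy`; it is not the provider's actual download code.

```rust
use std::io::{self, Read, Write};

/// Approximate peak bytes resident when every file is buffered before writing.
pub fn peak_bytes_buffered(file_sizes: &[usize]) -> usize {
    file_sizes.iter().sum()
}

/// Approximate peak bytes resident when files are handled one at a time
/// (assumes each file is fully held while it is being written).
pub fn peak_bytes_file_by_file(file_sizes: &[usize]) -> usize {
    file_sizes.iter().copied().max().unwrap_or(0)
}

/// Copies one source to one sink and returns the number of bytes written.
pub fn copy_one_file<R: Read, W: Write>(mut source: R, mut sink: W) -> io::Result<u64> {
    io::copy(&mut source, &mut sink)
}
```

For a segment dominated by one large file the two numbers are close, so the difference matters most when several of the files are large.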

Comment on lines +408 to +415
None => match HnswIndex::load_from_hnsw_data(
    self.fetch_hnsw_segment(id, prefix_path)
        .await
        .map_err(|e| Box::new(HnswIndexProviderOpenError::FileError(*e)))?,
    &index_config,
    ef_search,
    *id,
) {

[PerformanceOptimization]

This change successfully loads the index from memory, which aligns with the PR's goal. However, the open function still contains calls that write the index to a temporary directory on disk (create_dir_all, load_hnsw_segment_into_directory) before this memory-based loading occurs. These disk operations now seem redundant.

To fully load without a disk intermediary and improve efficiency, you could remove the calls to create_dir_all, load_hnsw_segment_into_directory, and purge_one_id from this function.
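
As a sketch of what an `open` path with no disk step might reduce to, assuming a hypothetical `Provider` with an in-memory cache and a map standing in for remote storage (none of these names come from the PR): check the cache, otherwise fetch the bytes and build the index from them directly, with no directory creation or cleanup.

```rust
use std::collections::HashMap;

/// Hypothetical provider: `cache` maps id -> loaded index (here just a byte count),
/// `remote` stands in for object storage, `fetches` counts remote reads.
pub struct Provider {
    cache: HashMap<u64, usize>,
    remote: HashMap<u64, Vec<u8>>,
    pub fetches: usize,
}

impl Provider {
    pub fn new(remote: HashMap<u64, Vec<u8>>) -> Self {
        Provider { cache: HashMap::new(), remote, fetches: 0 }
    }

    /// Opens an index: cache hit returns immediately; a miss fetches bytes and
    /// "loads" from memory. No temporary directory is created or purged.
    pub fn open(&mut self, id: u64) -> Result<usize, String> {
        if let Some(index) = self.cache.get(&id) {
            return Ok(*index);
        }
        let loaded = self
            .remote
            .get(&id)
            .map(|bytes| bytes.len())
            .ok_or_else(|| format!("index {} not found", id))?;
        self.fetches += 1;
        self.cache.insert(id, loaded);
        Ok(loaded)
    }
}
```

Whether the real `open` can drop its directory handling entirely depends on whether anything else still expects the files on local disk, which this sketch does not model.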
