feat: Add vector_db_id to chunk metadata #3255

are-ces · 2025-08-26T11:27:33Z

What does this PR do?

When running RAG against multiple vector DBs, it can be difficult to trace where retrieved chunks come from. This PR adds the vector_db_id to each chunk’s metadata, making it easy to see which database a given chunk came from. This helps when debugging and when analyzing retrieval behavior across multiple DBs.

Relevant code:

for vector_db_id, result in zip(vector_db_ids, results):
    for chunk, score in zip(result.chunks, result.scores):
        if not hasattr(chunk, "metadata") or chunk.metadata is None:
            chunk.metadata = {}
        chunk.metadata["vector_db_id"] = vector_db_id

        chunks.append(chunk)
        scores.append(score)
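
To try the loop above in isolation, a minimal sketch follows. `Chunk` and `QueryResult` are hypothetical stand-ins for the Llama Stack types, and `merge_with_origin` is an illustrative name that does not appear in the PR; only the loop body mirrors the snippet above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for the real chunk type.
    content: str
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class QueryResult:
    # Hypothetical stand-in for QueryChunksResponse.
    chunks: List[Chunk] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)


def merge_with_origin(vector_db_ids, results):
    """Flatten per-DB results, tagging each chunk with the DB it came from."""
    chunks, scores = [], []
    for vector_db_id, result in zip(vector_db_ids, results):
        for chunk, score in zip(result.chunks, result.scores):
            # Create the metadata dict if it is missing or None.
            if getattr(chunk, "metadata", None) is None:
                chunk.metadata = {}
            chunk.metadata["vector_db_id"] = vector_db_id
            chunks.append(chunk)
            scores.append(score)
    return chunks, scores


results = [
    QueryResult([Chunk("chunk from db1")], [0.9]),
    QueryResult([Chunk("chunk from db2", {"source": "test_source2"})], [0.7]),
]
chunks, scores = merge_with_origin(["db1", "db2"], results)
print([c.metadata for c in chunks])
```

Existing metadata keys are kept; the loop only adds or overwrites the `vector_db_id` key.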

Test Plan

Ran Llama Stack in debug mode.
Verified that vector_db_id was added to each chunk’s metadata.
Confirmed that the metadata was printed in the console when using the RAG tool.

meta-cla · 2025-08-26T11:27:39Z

Hi @are-ces!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

franciscojavierarceo · 2025-08-26T12:43:44Z

llama_stack/providers/inline/tool_runtime/rag/memory.py

@@ -131,8 +131,15 @@ async def query(
            for vector_db_id in vector_db_ids
        ]
        results: list[QueryChunksResponse] = await asyncio.gather(*tasks)
-        chunks = [c for r in results for c in r.chunks]
-        scores = [s for r in results for s in r.scores]
+


can you add a unit test for this with multiple vector DBs to confirm this behavior will work?

Unit test added, thank you!
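
One property a multi-DB test depends on is that `asyncio.gather` returns results in the order the awaitables were passed, not the order they finish, so pairing them with `vector_db_ids` via `zip` is safe. The sketch below illustrates this with plain dicts; `fake_query` and `query_all` are hypothetical helpers, not the test added in this PR.

```python
import asyncio


async def fake_query(vector_db_id: str, delay: float) -> dict:
    # Hypothetical stand-in for a per-DB query; the delay simulates latency.
    await asyncio.sleep(delay)
    return {
        "chunks": [{"content": f"chunk from {vector_db_id}", "metadata": {}}],
        "scores": [1.0],
    }


async def query_all(vector_db_ids, delays):
    tasks = [fake_query(db_id, d) for db_id, d in zip(vector_db_ids, delays)]
    # gather preserves the order of its arguments, regardless of which
    # task completes first, so each result lines up with its DB id.
    results = await asyncio.gather(*tasks)
    tagged = []
    for db_id, result in zip(vector_db_ids, results):
        for chunk in result["chunks"]:
            chunk["metadata"]["vector_db_id"] = db_id
            tagged.append(chunk)
    return tagged


# db1 is deliberately slower than db2: completion order differs from input order.
tagged = asyncio.run(query_all(["db1", "db2"], [0.02, 0.0]))
print([c["metadata"]["vector_db_id"] for c in tagged])
```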

franciscojavierarceo

thanks for this, requesting an added test.

franciscojavierarceo · 2025-08-27T13:42:22Z

tests/unit/rag/test_rag_query.py

+        # Parse metadata from query result
+        def parse_metadata(s):
+            import ast, re
+            match = re.search(r"Metadata:\s*(\{.*\})", s)


why use this and the if statement below? you should just be able to do the eval and evaluate the keys, no?

Because the TextContentItem returned looks like this:
TextContentItem(type='text', text="Result 2\nContent: chunk from db2\nMetadata: {'chunk_id': 'chunk2', 'document_id': 'doc2', 'source': 'test_source2', 'vector_db_id': 'db2'}\n")
the Metadata block needs to be explicitly parsed. That’s why I added the regex match and guard.

That said, I might not be fully understanding your point, did you mean this?

Ah, i see the text contains the Metadata json as a string! Cool, this makes sense.
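
A self-contained sketch of the parsing discussed in this thread, using the example text quoted above. The full helper from the PR is not shown here, so returning an empty dict when no match is found is an assumption about what the guard does. Note that the block is a Python dict repr with single quotes, which `json.loads` would reject; `ast.literal_eval` accepts it and only evaluates literals.

```python
import ast
import re


def parse_metadata(text: str) -> dict:
    """Extract the dict printed after 'Metadata:' in a result's text."""
    match = re.search(r"Metadata:\s*(\{.*\})", text)
    if match is None:
        # Assumed guard behavior: no Metadata block means no metadata.
        return {}
    # literal_eval parses the dict repr without executing arbitrary code.
    return ast.literal_eval(match.group(1))


text = (
    "Result 2\nContent: chunk from db2\n"
    "Metadata: {'chunk_id': 'chunk2', 'document_id': 'doc2', "
    "'source': 'test_source2', 'vector_db_id': 'db2'}\n"
)
metadata = parse_metadata(text)
print(metadata["vector_db_id"])
```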

franciscojavierarceo

lgtm

franciscojavierarceo

please resolve pre-commit.

llama_stack/providers/inline/tool_runtime/rag/memory.py

are-ces · 2025-09-02T11:12:21Z

This PR has been closed because the branch got contaminated and/or overwritten.
I’ve created a new PR from a clean branch that contains only the relevant commits: here.
Please review the new PR instead.

are-ces requested review from ashwinb, yanxi0830, hardikjshah, raghotham, ehhuang, terrytangyuan, leseb, bbrowning, reluctantfuturist, mattf and slekkala1 as code owners August 26, 2025 11:27

are-ces changed the title ~~Add vector_db_id to chunk metadata~~ feat: Add vector_db_id to chunk metadata Aug 26, 2025

franciscojavierarceo reviewed Aug 26, 2025

franciscojavierarceo requested changes Aug 26, 2025

meta-cla bot added the CLA Signed label Aug 27, 2025

are-ces force-pushed the main branch from ea38847 to 4644c3a August 27, 2025 09:05

franciscojavierarceo reviewed Aug 27, 2025

tests/unit/rag/test_rag_query.py (outdated, resolved)

are-ces force-pushed the main branch from f4e17f4 to f913467 August 27, 2025 13:29

franciscojavierarceo reviewed Aug 27, 2025

franciscojavierarceo approved these changes Aug 28, 2025

are-ces force-pushed the main branch from 65ab8d3 to 09b40cb August 28, 2025 14:30

franciscojavierarceo requested changes Aug 29, 2025

are-ces force-pushed the main branch from 09b40cb to b50cb25 August 29, 2025 06:45

franciscojavierarceo reviewed Sep 2, 2025

llama_stack/providers/inline/tool_runtime/rag/memory.py (outdated, resolved)

are-ces closed this Sep 2, 2025

are-ces force-pushed the main branch from 030de4b to 5c873d5 September 2, 2025 11:05
