Add script to generate content_catalog.json (similar to llms-full.txt) #1299
📚 Context
Add script to generate `content_catalog.json` (similar to `llms-full.txt`), which is used in the RAG pipeline for st-assistant.streamlit.app.

As a cleanup, this PR also removes the old `apply_image_effects.py` and the accompanying `blurmask.png`.

You might be asking why we need this, if I also have a PR for `llms-full.txt` over here. The answer is that `llms-full.txt` would probably work fine for st-assistant, but this format has one nice thing going for it: it allows us to easily include the URL of each section in every chunk of that section inside the RAG database. So I'm OK keeping them separate, at least for now.

Your next question might be whether we can merge a lot of the logic in these scripts, and yes, we totally can! But we can do that as a next step.
💥 Impact
Size:
🌐 References
n/a
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.