Implement intelligent chunking for large repository wiki generation and add XML structure prompt #361
base: main
…nd add XML structure prompt
Summary of Changes
Hello @adriandarian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the system's capability to generate wiki documentation for large repositories. It introduces an intelligent file chunking mechanism that allows the API to process vast codebases more efficiently by breaking them down into smaller, manageable parts. Concurrently, it integrates a new XML-formatted prompt to ensure structured and consistent wiki output. These changes collectively enable a more robust and scalable approach to repository analysis and wiki generation, particularly for extensive projects.
Code Review
This pull request introduces an intelligent chunking system for processing large repositories to generate wikis. It adds new logic for file collection, filtering, and chunking, and updates the `/local_repo/structure` endpoint to leverage this system. A new XML-based prompt for wiki structure generation is also included, with corresponding updates to the WebSocket handler to process these requests. My review focuses on improving the robustness of the chunking logic, increasing efficiency by removing redundant operations, and enhancing code quality by addressing debug artifacts, local imports, and duplicated code. While the chunking infrastructure is off to a good start, the functions to process these chunks are currently placeholders and will need implementation.
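The chunking idea discussed in this review can be illustrated with a minimal sketch. It is not the PR's implementation: the function name `chunk_files`, the `file_sizes` mapping, and the `max_chunk_size` budget are all assumptions made for illustration. It greedily packs files into chunks so that each chunk stays under a size budget, and places any single file that exceeds the budget in a chunk of its own, which is one of the edge cases a real implementation would need to decide on.

```python
from typing import Dict, List


def chunk_files(file_sizes: Dict[str, int], max_chunk_size: int) -> List[List[str]]:
    """Greedily pack file paths into chunks whose total size fits a budget.

    `file_sizes` maps a path to its size (bytes or tokens; the unit is an
    assumption of this sketch). A file larger than the budget ends up alone
    in its own chunk rather than being dropped.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    # Sort paths so the result is deterministic and neighbours stay together.
    for path in sorted(file_sizes):
        size = file_sizes[path]
        # Close the current chunk if this file would push it over the budget.
        if current and current_size + size > max_chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size

    if current:
        chunks.append(current)
    return chunks
```

Sorting by path keeps files from the same directory adjacent, which tends to group related code; a real implementation might group by directory or file type instead.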
- Refactor `collect_all_files` to return README content alongside file paths.
- Introduce `handle_response_stream` to streamline response processing for different providers.
- Update WebSocket handling to utilize the new response handling function, reducing code duplication.
- Improve logging for better traceability during file collection and response streaming.
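To illustrate how a single helper can remove per-provider duplication when streaming responses, here is a minimal sketch. It does not reproduce the PR's `handle_response_stream`; the name `forward_stream` and its parameters (`chunks`, `extract_text`, `send`) are hypothetical. The provider-specific part is reduced to a small `extract_text` callable, while the loop that forwards text is shared.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Optional


async def forward_stream(
    chunks: AsyncIterator[Any],
    extract_text: Callable[[Any], Optional[str]],
    send: Callable[[str], Awaitable[None]],
) -> int:
    """Forward text from a provider stream to a sender (e.g. a WebSocket).

    `extract_text` hides the provider-specific chunk shape; empty or missing
    text is skipped. Returns the number of messages sent.
    """
    sent = 0
    async for chunk in chunks:
        text = extract_text(chunk)
        if text:
            await send(text)
            sent += 1
    return sent
```

With this shape, supporting another provider only requires a new `extract_text` function rather than another copy of the streaming loop.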
Good job!
Have you considered introducing AST chunking, like https://developers.llamaindex.ai/python/framework-api-reference/node_parsers/code/ ?
- Added ASTChunker class for semantic chunking of code files.
- Integrated AST chunking with existing adalflow pipeline via ASTTextSplitter.
- Created configuration for AST chunking in embedder.ast.json.
- Updated data pipeline to support AST chunking based on configuration.
- Developed enable_ast.py script to toggle AST chunking on and off.
- Enhanced logging for chunking statistics and errors.
- Added support for various programming languages in AST chunking.
- Updated docker-compose to allow enabling AST chunking during build.
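As a rough illustration of what AST-based (semantic) chunking means, the sketch below splits Python source into one chunk per top-level function or class using the standard-library `ast` module. It is not the PR's `ASTChunker`, whose implementation is not shown here and which is described as supporting several languages; the function name `ast_chunks` is hypothetical and this version handles Python source only.

```python
import ast
from typing import List, Tuple


def ast_chunks(source: str) -> List[Tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class.

    Chunk boundaries follow syntax rather than a fixed character count, so a
    definition is never cut in half. Requires Python 3.8+ for `end_lineno`.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Tuple[str, str]] = []

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # `lineno` is 1-based; start at the first decorator if there is one.
            start = node.lineno
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list)
            # `end_lineno` is inclusive, so it works directly as a slice end.
            text = "\n".join(lines[start - 1 : node.end_lineno])
            chunks.append((node.name, text))
    return chunks
```

Module-level statements such as imports are ignored in this sketch; a fuller chunker would likely collect them into a separate chunk and fall back to plain text splitting when parsing fails.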
… docker-compose for config mounting
I had not considered it before, but I like the idea, so here is an update with a docker-compose flag to toggle AST chunking on/off.
This PR introduces two major improvements to the deepwiki-open project:
- Intelligent Chunking for Large Repository Wiki Generation
- Add XML Structure Prompt
Details
Motivation
Impact
How to Test
Closes Issues:
Reviewer Notes:
Please pay particular attention to chunking edge cases and XML schema compliance.
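For the XML compliance point, a reviewer could start from a small check like the one below. The actual schema used by the prompt is not given in this conversation, so the element names (`wiki_structure`, `page`) and the required `id` attribute are assumptions passed in as parameters; the function name `check_wiki_xml` is also hypothetical. The sketch only verifies well-formedness and a couple of structural expectations using the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_wiki_xml(
    xml_text: str,
    root_tag: str = "wiki_structure",  # assumed root element name
    page_tag: str = "page",            # assumed per-page element name
) -> List[str]:
    """Return a list of problems found in the model's XML output.

    An empty list means the text parsed and met the assumed expectations.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems: List[str] = []
    if root.tag != root_tag:
        problems.append(f"unexpected root element <{root.tag}>")

    pages = list(root.iter(page_tag))
    if not pages:
        problems.append(f"no <{page_tag}> elements found")
    for page in pages:
        if not page.get("id"):
            problems.append(f"<{page_tag}> without an id attribute")
    return problems
```

Model output often wraps the XML in extra prose or code fences, so a real check would probably need to extract the XML span before parsing.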