✨ feat(additional-formats): use docling-project/docling to convert pdf and csv to md for indexing #13

@KemingHe

Description

🎯 Problem

Indexing currently reads only .md and .rst files, for simplicity, so valuable information in .pdf, .csv, and other file types is discarded.

💡 Proposed Solution

To give the indexer a single internal interface, convert every supported file to .md with docling-project/docling before indexing, retaining the source metadata. This keeps the pipeline open to further file types later, including code files.
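
A minimal sketch of what that conversion step could look like, not a settled design. The names `prepare_for_indexing`, `docling_to_markdown`, `NATIVE`, and `CONVERTIBLE`, and the metadata keys, are hypothetical. It assumes docling's `DocumentConverter` accepts both .pdf and .csv inputs in the installed version, and that .md and .rst keep going through the existing path unchanged.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple, Union

# Hypothetical groupings: formats the indexer already reads, and
# formats this issue proposes to convert to Markdown first.
NATIVE = {".md", ".rst"}
CONVERTIBLE = {".pdf", ".csv"}


def docling_to_markdown(path: Path) -> str:
    """Convert one file to Markdown with docling."""
    # Imported lazily so the module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(str(path))
    return result.document.export_to_markdown()


def prepare_for_indexing(
    path: Union[str, Path],
    convert: Callable[[Path], str] = docling_to_markdown,
) -> Tuple[str, Dict[str, str]]:
    """Return (text, metadata) for one file, converting it if needed."""
    path = Path(path)
    suffix = path.suffix.lower()
    # Assumed metadata shape: record where the text came from so the
    # original file type survives the conversion to Markdown.
    metadata = {"source_path": str(path), "source_format": suffix.lstrip(".")}

    if suffix in NATIVE:
        return path.read_text(encoding="utf-8"), metadata
    if suffix in CONVERTIBLE:
        return convert(path), metadata
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Passing the converter in as `convert` keeps the indexer independent of docling itself, so a different backend (or a stub in tests) can be swapped in without touching the dispatch logic.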

🤔 Alternatives Considered

Lower-level approaches such as pdfplumber were considered but judged too complex, given that integrated solutions such as docling already exist.

Metadata

Assignees

Labels

enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests