A hybrid system combining Knowledge Graphs and Retrieval-Augmented Generation (RAG) for intelligent querying of the Indian Income Tax Act.
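Traditional RAG systems struggle with legal documents because they miss the interconnected nature of legal provisions. This system addresses that by pairing a knowledge graph of cross-references with RAG over the section text, so it can answer questions such as: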
- "What sections reference Section 80C?"
- "Show me all exemptions available for senior citizens"
- "What penalties apply if I violate Section 44AD?"
- "How does Section 10 relate to Section 80?"

It captures:
- Sections that reference other sections
- Definitions used across multiple places
- Conditions and thresholds (income slabs, age limits)
- Exemptions with eligibility criteria
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Tax Act Text   │───▶│  Parser Module   │───▶│ Knowledge Graph │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       │
                       ┌──────────────────┐             │
                       │   Vector Store   │             │
                       │      (RAG)       │             │
                       └──────────────────┘             │
                                │                       │
                                ▼                       ▼
                       ┌──────────────────────────────────────┐
                       │         Hybrid Query System          │
                       │  • KG Queries (relationships)        │
                       │  • RAG Queries (content)             │
                       │  • Hybrid Queries (both)             │
                       └──────────────────────────────────────┘
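
To make the "Hybrid Queries (both)" path concrete, here is a minimal sketch of how a structural filter and a content ranking could be combined. Everything in it is an assumption for illustration: the `SECTIONS` records, the function names, and the word-overlap score (a crude stand-in for vector similarity) are invented and are not taken from this project's code.

```python
# Hypothetical records standing in for graph nodes plus their text.
# The ids and texts are invented placeholders, not real provisions.
SECTIONS = [
    {"id": "S1", "type": "exemption", "text": "relief for senior citizens on interest income"},
    {"id": "S2", "type": "exemption", "text": "relief for agricultural income"},
    {"id": "S3", "type": "penalty", "text": "late filing by senior citizens"},
]


def kg_filter(sections, section_type):
    """Structural step: keep sections of one type, as a graph query would."""
    return [s for s in sections if s["type"] == section_type]


def content_score(query, text):
    """Content step: word overlap, a crude stand-in for vector similarity."""
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split()))


def hybrid_query(sections, section_type, query):
    """Filter by structure first, then rank the survivors by content relevance."""
    candidates = kg_filter(sections, section_type)
    scored = [(content_score(query, s["text"]), s["id"]) for s in candidates]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [section_id for score, section_id in ranked if score > 0]


print(hybrid_query(SECTIONS, "exemption", "senior citizens"))
```

Note that `S3` also mentions senior citizens but is excluded because the structural filter runs first; that ordering is the point of combining the two sources.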
- Python 3.8+
- Neo4j Database (local or cloud)
- OpenAI API Key (optional, for enhanced responses)
git clone <repository>
cd ita-kg
# Setup virtual environment and install dependencies
make setup
# Configure environment
cp .env.example .env
# Edit .env with your credentials
# Run with virtual environment
make run
# Or run demo
make demo

git clone <repository>
cd ita-kg
# Configure environment
cp .env.example .env
# Edit .env with your credentials
# Start services (Neo4j + app)
make docker-up
# View logs
make docker-logs

git clone <repository>
cd ita-kg
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Setup Neo4j manually
docker run --name neo4j -p 7474:7474 -p 7687:7687 -d \
    -e NEO4J_AUTH=neo4j/your_password \
    neo4j:latest
python main.py

make help # Show all available commands
make setup # Create venv and install dependencies
make run # Run main.py using virtual environment
make demo # Run demo.py using virtual environment
make docker-up # Start Docker services
make docker-down # Stop Docker services
make clean # Remove venv and Docker volumes

# What sections reference Section 80C?
response = query_system.query("What sections reference Section 80C?")
# Returns: Sections that reference Section 80C:
# • Section 80TTB: Deduction in respect of interest on deposits...
# Find related sections
response = query_system.query("What sections are related to Section 139?")

# Eligibility questions
response = query_system.query("What exemptions are available for senior citizens?")
# Combines: KG to find exemption sections + RAG for senior citizen content
# Category queries
response = query_system.query("What deductions are available?")

# Detailed explanations
response = query_system.query("Explain Section 44AD for presumptive taxation")
# Specific information
response = query_system.query("What is the penalty for not filing returns?")

ita-kg/
├── tax_parser.py # Income Tax Act text parser
├── knowledge_graph.py # Neo4j Knowledge Graph builder
├── hybrid_query_system.py # Query routing and processing
├── main.py # Interactive system
├── demo.py # Capabilities demonstration
├── sample_income_tax_act.txt # Sample tax act data
├── requirements.txt # Python dependencies
├── .env.example # Environment template
└── README.md # This file
tax_parser.py:
- Extracts sections, titles, and content
- Identifies cross-references between sections
- Classifies section types (exemption, deduction, penalty)
- Extracts key concepts and definitions

knowledge_graph.py:
- Creates Neo4j nodes for sections
- Builds REFERENCES relationships
- Adds concept categorization
- Provides graph analytics

hybrid_query_system.py:
- Routes queries based on type:
  - KG: Reference/relationship queries
  - RAG: Content/explanation queries
  - Hybrid: Complex eligibility queries
- Combines results for comprehensive answers
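
The routing step can be sketched with a few keyword patterns. This is only an illustration of the idea under stated assumptions: the pattern lists and the `route_query` function are hypothetical, and the actual rules in `hybrid_query_system.py` are not shown in this README and may differ.

```python
import re

# Hypothetical keyword patterns; the project's real routing rules may differ.
KG_PATTERNS = [r"\breferences?\b", r"\brelated to\b", r"\brelate to\b"]
HYBRID_PATTERNS = [r"\bexemptions?\b", r"\bdeductions?\b", r"\beligib", r"\bavailable for\b"]


def route_query(question: str) -> str:
    """Return 'kg', 'hybrid', or 'rag' for a natural-language question."""
    text = question.lower()
    # Relationship wording wins first: these questions need graph traversal.
    if any(re.search(pattern, text) for pattern in KG_PATTERNS):
        return "kg"
    # Category or eligibility wording needs both structure and content.
    if any(re.search(pattern, text) for pattern in HYBRID_PATTERNS):
        return "hybrid"
    # Everything else is treated as a content/explanation question.
    return "rag"


for q in [
    "What sections reference Section 80C?",
    "What exemptions are available for senior citizens?",
    "Explain Section 44AD for presumptive taxation",
]:
    print(route_query(q), "<-", q)
```

Checking the relationship patterns before the eligibility patterns is a design choice in this sketch: a question that mentions both a reference and a category is sent to the graph first.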
- Reference Tracking: "What references Section X?"
- Relationship Discovery: "What sections are related to X?"
- Category Queries: "Show all deduction sections"
- Eligibility Analysis: "What exemptions for senior citizens?"
- Content Explanation: "Explain presumptive taxation"
- Impact Analysis: "If Section X changes, what's affected?"
- Section count by type
- Cross-reference statistics
- Concept distribution
- Reference network analysis
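
Impact analysis and the reference statistics above amount to simple graph operations. The sketch below runs them on a small in-memory dictionary rather than Neo4j, purely to show the logic; the `SECTIONS` data uses invented placeholder ids and the function names are hypothetical, not part of this project.

```python
from collections import Counter, deque

# Hypothetical stand-in for the graph: id -> (type, ids it references).
# Placeholder ids only; these are not real provisions or real references.
SECTIONS = {
    "S1": ("deduction", []),
    "S2": ("deduction", ["S1"]),
    "S3": ("procedure", ["S2"]),
    "S4": ("penalty", ["S3"]),
    "S5": ("exemption", []),
}


def count_by_type(sections):
    """Section count by type."""
    return Counter(kind for kind, _ in sections.values())


def referenced_by(sections):
    """Invert the edges: id -> set of ids that cite it."""
    incoming = {section_id: set() for section_id in sections}
    for section_id, (_, refs) in sections.items():
        for target in refs:
            incoming.setdefault(target, set()).add(section_id)
    return incoming


def impacted_by_change(sections, changed):
    """Breadth-first walk over incoming references, starting at the changed id."""
    incoming = referenced_by(sections)
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for citing in incoming.get(current, ()):
            if citing not in seen and citing != changed:
                seen.add(citing)
                queue.append(citing)
    return seen


print(count_by_type(SECTIONS))
print(sorted(impacted_by_change(SECTIONS, "S1")))
```

The walk follows references transitively, so a change to `S1` reaches `S4` through `S2` and `S3` even though `S4` never cites `S1` directly.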
The system comes with sample data covering key sections:
- Exemptions: Sections 10, 10A
- Deductions: Sections 80C, 80D, 80TTB
- Penalties: Sections 271F, 271B
- Procedures: Sections 139, 44AD
- Definitions: Section 2
Try these queries:
• "What sections reference Section 80C?"
• "What exemptions are available for senior citizens?"
• "Show me all penalty sections"
• "What is Section 44AD about?"
• "Which sections mention agricultural income?"
- Cross-Reference Navigation: Navigate the web of legal references
- Structured Categorization: Find all exemptions/deductions instantly
- Impact Analysis: See what's affected when sections change
- Context-Aware Responses: Combine structure with content
- Scalable: Add more legal documents to the same graph
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
OPENAI_API_KEY=your_api_key # Optional

- Add sections to sample_income_tax_act.txt
- Follow the format: Section X - Title
- The parser will automatically extract references and relationships
- Run the system to rebuild the knowledge graph
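
As a rough illustration of what "extract references" means for text in the `Section X - Title` format, here is a minimal regex-based sketch. It is not the code in `tax_parser.py`: the two patterns, the `parse_sections` function, and the `SAMPLE` text are assumptions made for this example, and the real parser may recognise more forms.

```python
import re

# Assumed header shape "Section <id> - <title>"; the real parser may differ.
HEADER_RE = re.compile(r"^Section\s+(\d+[A-Z]*)\s+-\s+(.+)$")
# Assumed mention shape inside body text, e.g. "section 80C".
REFERENCE_RE = re.compile(r"\b[Ss]ection\s+(\d+[A-Z]*)\b")

# Invented placeholder text, not taken from the Act or the sample file.
SAMPLE = """\
Section 1 - Example deduction
Placeholder text for the first provision.

Section 2 - Example limit
Amounts claimed under section 1 are capped as described in Section 1.
"""


def parse_sections(text):
    """Split text into sections and collect the section ids each body mentions."""
    sections = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = HEADER_RE.match(stripped)
        if header:
            current = {"id": header.group(1), "title": header.group(2).strip(), "body": []}
            sections.append(current)
        elif current is not None and stripped:
            current["body"].append(stripped)
    for section in sections:
        body = " ".join(section.pop("body"))
        section["content"] = body
        # De-duplicate mentions and drop self-references.
        refs = {ref for ref in REFERENCE_RE.findall(body) if ref != section["id"]}
        section["references"] = sorted(refs)
    return sections


for parsed in parse_sections(SAMPLE):
    print(parsed["id"], parsed["title"], parsed["references"])
```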
# Check if Neo4j is running
curl http://localhost:7474
# Verify credentials in .env file
# Make sure bolt port (7687) is accessible

# Reinstall dependencies
pip install -r requirements.txt
# Check Python version (3.8+ required)
python --version

- Check Neo4j database has data: MATCH (n) RETURN count(n)
- Verify section format in source text
- Check logs for parsing errors
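
The node-count check can also be run from Python with the official `neo4j` driver. This is a hedged sketch, not part of the project: the helper names are hypothetical, and it assumes the variables from `.env` are already present in the process environment (for example exported by the shell or loaded by a dotenv helper).

```python
import os


def load_neo4j_settings(env=os.environ):
    """Read the connection settings named in .env, falling back to the README's values."""
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def count_nodes(settings):
    """Run MATCH (n) RETURN count(n) and return the number of nodes."""
    # Imported here so load_neo4j_settings works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        settings["uri"], auth=(settings["user"], settings["password"])
    )
    try:
        with driver.session() as session:
            record = session.run("MATCH (n) RETURN count(n) AS total").single()
            return record["total"]
    finally:
        driver.close()


# Usage (requires a running Neo4j instance):
#   print(count_nodes(load_neo4j_settings()))
```

A count of zero suggests the graph was never built, which points back to the section format or parsing errors rather than to the query system.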
- Graph Build: ~1-2 seconds for sample data
- Query Response: ~100-500ms average
- Memory Usage: ~50MB for sample dataset
- Scalability: Tested up to 1000+ sections
- Add more legal documents (Companies Act, GST Act)
- Enhanced NLP for better reference extraction
- Web interface with graph visualization
- Multi-language support
- Advanced analytics and insights
MIT License - Feel free to use for educational and commercial purposes.