| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +http-stream-xml is a Python library that parses XML from HTTP responses chunk by chunk as the data arrives, without loading the entire document. Its main use case is working with NCBI (PubMed) Entrez API responses. |
| 8 | + |
| 9 | +## Core Architecture |
| 10 | + |
| 11 | +### Main Components |
| 12 | + |
| 13 | +- `src/http_stream_xml/xml_stream.py` - Core XML streaming parser using SAX |
| 14 | + - `XmlStreamExtractor` - Main class that feeds XML chunks to parser |
| 15 | + - `StreamHandler` - SAX content handler that collects specified tags |
| 16 | + - `ExtractionCompleted` - Exception raised when all required tags are found |
| 17 | + |
| 18 | +- `src/http_stream_xml/entrez.py` - NCBI Entrez API integration |
| 19 | + - `Genes` - Main class for fetching gene information from NCBI |
| 20 | + - `GeneFields` - Constants for gene field names in XML responses |
| 21 | + - Implements caching, retry logic, and partial download optimization |
| 22 | + |
| 23 | +- `src/http_stream_xml/socket_stream.py` - Socket-based streaming functionality |
| 24 | +- `src/http_stream_xml/examples/` - Usage examples for different scenarios |
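The interplay of the three `xml_stream.py` pieces (an extractor that feeds chunks, a SAX handler that collects tags, and an exception that ends parsing early) can be sketched with the standard library alone. This is a minimal illustration, not the library's actual code: `TagCollector`, `AllTagsFound`, and `extract_tags` are hypothetical names chosen for this example, and the real `XmlStreamExtractor` API may differ.

```python
import xml.sax


class AllTagsFound(Exception):
    """Hypothetical signal: every wanted tag has been collected."""


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the text of the first occurrence of each wanted tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._current = None  # wanted tag currently being read, if any
        self._buffer = []

    def startElement(self, name, attrs):
        if name in self.wanted and name not in self.found:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self.found[name] = "".join(self._buffer).strip()
            self._current = None
            if self.wanted.issubset(self.found):
                raise AllTagsFound  # stop parsing as soon as nothing is missing


def extract_tags(chunks, wanted):
    """Feed chunks to an incremental SAX parser until all wanted tags are found."""
    handler = TagCollector(wanted)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    try:
        for chunk in chunks:
            parser.feed(chunk)
    except AllTagsFound:
        pass  # early termination: remaining chunks are never requested
    return handler.found
```

Because `chunks` can be any iterable, a lazily evaluated HTTP response body would stop being read once the exception fires. For instance, `extract_tags(iter([b"<r><a>1</a><b>", b"2</b>", b"<rest"]), ["a", "b"])` returns the values of `a` and `b` without ever touching the third chunk.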
| 25 | + |
| 26 | +### Key Design Patterns |
| 27 | + |
| 28 | +- Uses SAX parser for memory-efficient XML processing |
| 29 | +- Implements early termination when all required tags are found |
| 30 | +- Built-in retry mechanisms for unreliable network connections |
| 31 | +- Caching layer for API responses |
| 32 | +- Stream processing with configurable timeouts and byte limits |
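The retry and caching patterns listed above can be combined in a few lines. The sketch below is an assumption about the general shape, not the code in `entrez.py`: `fetch_with_retry`, its parameters, and the plain `dict` cache are hypothetical, and the library's own retry policy and cache storage may differ.

```python
import time


def fetch_with_retry(fetch, key, cache, retries=3, delay=0.5, sleep=time.sleep):
    """Return cache[key] if present; otherwise call fetch(key), retrying on OSError.

    All names here are illustrative. `fetch` is any callable that performs the
    network request, and `cache` is any mutable mapping.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    if key in cache:
        return cache[key]  # cache hit: no network traffic at all
    last_error = None
    for attempt in range(retries):
        try:
            cache[key] = fetch(key)
            return cache[key]
        except OSError as error:
            # ConnectionError and TimeoutError are both subclasses of OSError.
            last_error = error
            if attempt < retries - 1:
                sleep(delay * 2 ** attempt)  # exponential backoff between attempts
    raise last_error
```

Passing `sleep` as a parameter keeps the helper testable without real delays, and only failed lookups are retried, so a successful response is stored once and reused afterwards.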
| 33 | + |
| 34 | +## Development Commands |
| 35 | + |
| 36 | +### Testing |
| 37 | +```bash |
| 38 | +# Run fast tests (exclude slow tests) |
| 39 | +inv test |
| 40 | +# Or directly: ./scripts/test.sh -m 'not slow' |
| 41 | + |
| 42 | +# Run all tests including slow ones |
| 43 | +inv test-full |
| 44 | +# Or directly: ./scripts/test.sh |
| 45 | + |
| 46 | +# Run specific test pattern |
| 47 | +./scripts/test.sh -k "pattern_or_substring" |
| 48 | + |
| 49 | +# Run tests with coverage manually (the test scripts already do this) |
| 50 | +coverage run -m pytest |
| 51 | +coverage report --omit='tests/*' |
| 52 | +``` |
| 53 | + |
| 54 | +### Code Quality |
| 55 | +```bash |
| 56 | +# Run pre-commit checks (linting, formatting, type checking) |
| 57 | +inv pre |
| 58 | +# Or directly: pre-commit run --verbose --all-files |
| 59 | + |
| 60 | +# Individual quality checks are handled by pre-commit: |
| 61 | +# - ruff (linting and formatting) |
| 62 | +# - mypy (type checking) |
| 63 | +# - Various pre-commit hooks |
| 64 | +``` |
| 65 | + |
| 66 | +### Building and Dependencies |
| 67 | +```bash |
| 68 | +# Build package |
| 69 | +./scripts/build.sh |
| 70 | +# Or directly: python setup.py bdist_wheel |
| 71 | + |
| 72 | +# Compile requirements |
| 73 | +inv compile-requirements |
| 74 | +# Or directly: uv pip compile requirements.in --output-file=requirements.txt --upgrade |
| 75 | + |
| 76 | +# Install/upgrade dependencies |
| 77 | +inv reqs |
| 78 | +``` |
| 79 | + |
| 80 | +### Documentation |
| 81 | +```bash |
| 82 | +# Build documentation |
| 83 | +inv docs |
| 84 | +# Or directly: sphinx-build docs docs_build |
| 85 | + |
| 86 | +# Check documentation links |
| 87 | +inv docs-check |
| 88 | +``` |
| 89 | + |
| 90 | +## Project Configuration |
| 91 | + |
| 92 | +- **Python version**: Requires Python 3.11+ |
| 93 | +- **Dependencies**: Managed via requirements.in/requirements.txt with uv |
| 94 | +- **Linting**: ruff with strict configuration (line length 100, extensive rule set) |
| 95 | +- **Type checking**: mypy with strict settings |
| 96 | +- **Testing**: pytest with coverage reporting |
| 97 | +- **Build system**: Traditional setup.py (not pyproject.toml) |
| 98 | + |
| 99 | +## Key Files and Structure |
| 100 | + |
| 101 | +- `src/http_stream_xml/` - Main package source |
| 102 | +- `tests/` - Test suite with pytest |
| 103 | +- `scripts/` - Shell scripts for common development tasks |
| 104 | +- `tasks.py` - invoke task definitions for development commands |
| 105 | +- `requirements.in` / `requirements.dev.in` - Dependency specifications |
| 106 | +- `.pre-commit-config.yaml` - Code quality automation |
| 107 | + |
| 108 | +## Testing Strategy |
| 109 | + |
| 110 | +Tests are organized to support both fast and comprehensive testing: |
| 111 | +- Fast tests run by default (exclude tests marked as 'slow') |
| 112 | +- Full test suite includes integration tests with external APIs |
| 113 | +- Coverage reporting is integrated into test runs |
| 114 | +- The `tests/` directory is excluded from linting to allow more flexible test code |