Commit 3801ebf

agents files
1 parent 189d7fe commit 3801ebf

2 files changed, 152 additions and 0 deletions

AGENTS.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Repository Guidelines

These guidelines help contributors work consistently on http-stream-xml. They reflect the current layout, tooling, and CI in this repo.

## Project Structure & Module Organization
- `src/http_stream_xml/`: Library code (`xml_stream.py`, `socket_stream.py`, `entrez.py`, `version.py`).
- `tests/`: Pytest suite (`test_*.py`).
- `docs/`: Sphinx documentation.
- `scripts/`: Helper scripts (build, test, upload, etc.).
- Root files: `tasks.py` (Invoke tasks), `setup.py`, `requirements*.txt`, `.pre-commit-config.yaml`.

## Build, Test, and Development Commands
- `pipx install invoke` then `inv --list`: Discover tasks.
- `inv test`: Run tests, excluding those marked `slow`.
- `inv test-full`: Run the full test suite.
- `./scripts/test.sh -k substring`: Run tests via coverage with a filter.
- `inv pre`: Run lint, format, and type checks (pre-commit).
- `inv docs`: Build Sphinx docs to `docs_build`.
- Packaging: `./scripts/build.sh` (wheel) and `./scripts/upload.sh` (twine upload).

## Coding Style & Naming Conventions
- Python 3.11+; use type hints where practical (mypy runs in pre-commit).
- Indentation: 4 spaces; keep lines ≤100 chars.
- Lint/format: Ruff (including ruff-format) enforces style; run `inv pre` locally.
- Naming: `snake_case` for modules/functions, `PascalCase` for classes, `UPPER_SNAKE_CASE` for constants.
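
As a rough illustration of the conventions above, the following sketch shows the three naming styles together with type hints. `MAX_CHUNK_BYTES`, `ChunkCounter`, and `count_chunks` are hypothetical names invented for this example; they are not part of the library.

```python
MAX_CHUNK_BYTES: int = 1024  # constant: UPPER_SNAKE_CASE (hypothetical name)


class ChunkCounter:  # class: PascalCase (hypothetical name)
    """Count how many chunks are needed to cover a payload."""

    def __init__(self, chunk_size: int = MAX_CHUNK_BYTES) -> None:
        self.chunk_size = chunk_size

    def count_chunks(self, payload: bytes) -> int:  # method: snake_case
        # Ceiling division without importing math.
        return -(-len(payload) // self.chunk_size)
```

The annotations are what mypy checks during `inv pre`; Ruff handles the layout and line length.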

## Testing Guidelines
- Framework: Pytest with coverage; tests live in `tests/` as `test_*.py`.
- Markers: Use `@pytest.mark.slow` for long-running tests (excluded by `inv test`).
- Run: `inv test` (fast), `inv test-full` (all). View coverage in the CLI report from `scripts/test.sh`.

## Commit & Pull Request Guidelines
- Commits: Short, imperative subject; explain “what/why”. Reference issues (e.g., `#6`) when applicable.
- PRs: Include a clear description, linked issues, before/after notes for behavior changes, and test updates. Ensure CI is green and `inv pre` passes locally.

## Security & Release Notes
- Do not commit secrets. Publishing uses local credentials (see `.pypirc`); verify with `./scripts/install_local.sh` before uploading.
- Versioning is automated via scripts/tasks; coordinate version bumps with maintainers when preparing a release.

CLAUDE.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

http-stream-xml is a Python library for parsing XML from HTTP responses on the fly, chunk by chunk, without loading the entire document. The main use case is working with NCBI (PubMed) Entrez API responses.

## Core Architecture

### Main Components

- `src/http_stream_xml/xml_stream.py` - Core XML streaming parser using SAX
  - `XmlStreamExtractor` - Main class that feeds XML chunks to parser
  - `StreamHandler` - SAX content handler that collects specified tags
  - `ExtractionCompleted` - Exception raised when all required tags are found

- `src/http_stream_xml/entrez.py` - NCBI Entrez API integration
  - `Genes` - Main class for fetching gene information from NCBI
  - `GeneFields` - Constants for gene field names in XML responses
  - Implements caching, retry logic, and partial download optimization

- `src/http_stream_xml/socket_stream.py` - Socket-based streaming functionality
- `src/http_stream_xml/examples/` - Usage examples for different scenarios
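
The way an extractor, a SAX handler, and a completion exception fit together can be illustrated with a minimal sketch that uses only the standard library. This is not the library's implementation: `TagCollector`, `Done`, and `extract` are hypothetical names standing in for the roles listed above, and the sketch assumes each requested tag holds plain text.

```python
import xml.sax


class Done(Exception):
    """Hypothetical signal: every requested tag has been collected."""


class TagCollector(xml.sax.ContentHandler):
    """Collect the text of the first occurrence of each requested tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._current = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name in self.wanted and name not in self.found:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self.found[name] = "".join(self._buffer).strip()
            self._current = None
            if self.wanted.issubset(self.found):
                raise Done  # stop parsing: nothing left to look for


def extract(chunks, wanted):
    """Feed chunks to an incremental SAX parser until all tags are found."""
    handler = TagCollector(wanted)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    consumed = 0
    try:
        for chunk in chunks:
            consumed += 1
            parser.feed(chunk)
    except Done:
        pass  # remaining chunks are never read
    return handler.found, consumed
```

With an HTTP response, `chunks` would be the iterator of body chunks, so breaking out of the loop early means the rest of the document is never downloaded.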

### Key Design Patterns

- Uses SAX parser for memory-efficient XML processing
- Implements early termination when all required tags are found
- Built-in retry mechanisms for unreliable network connections
- Caching layer for API responses
- Stream processing with configurable timeouts and byte limits
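
The retry and caching patterns can be sketched as follows. This is an assumption-laden illustration, not the code in `entrez.py`: `fetch_with_retry` and its parameters are hypothetical, the cache is a plain dictionary, and network failures are assumed to surface as `OSError` subclasses (as `ConnectionError` and `TimeoutError` do).

```python
import time


def fetch_with_retry(fetch, key, cache, attempts=3, delay=0.0):
    """Return cache[key] if present; otherwise call fetch(key) with retries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if key in cache:
        return cache[key]  # cache hit: no network call at all
    for attempt in range(1, attempts + 1):
        try:
            cache[key] = fetch(key)
            return cache[key]
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay * attempt)  # simple linear backoff
```

Only successful responses are stored, so a failed fetch is retried on the next call rather than being served from the cache.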

## Development Commands

### Testing
```bash
# Run fast tests (exclude slow tests)
inv test
# Or directly: ./scripts/test.sh -m 'not slow'

# Run all tests including slow ones
inv test-full
# Or directly: ./scripts/test.sh

# Run specific test pattern
./scripts/test.sh -k "pattern_or_substring"

# Test with coverage (built into test scripts)
coverage run -m pytest
coverage report --omit='tests/*'
```

### Code Quality
```bash
# Run pre-commit checks (linting, formatting, type checking)
inv pre
# Or directly: pre-commit run --verbose --all-files

# Individual quality checks are handled by pre-commit:
# - ruff (linting and formatting)
# - mypy (type checking)
# - Various pre-commit hooks
```

### Building and Dependencies
```bash
# Build package
./scripts/build.sh
# Or directly: python setup.py bdist_wheel

# Compile requirements
inv compile-requirements
# Or directly: uv pip compile requirements.in --output-file=requirements.txt --upgrade

# Install/upgrade dependencies
inv reqs
```

### Documentation
```bash
# Build documentation
inv docs
# Or directly: sphinx-build docs docs_build

# Check documentation links
inv docs-check
```

## Project Configuration

- **Python version**: Requires Python 3.11+
- **Dependencies**: Managed via requirements.in/requirements.txt with uv
- **Linting**: ruff with strict configuration (line length 100, extensive rule set)
- **Type checking**: mypy with strict settings
- **Testing**: pytest with coverage reporting
- **Build system**: Traditional setup.py (not pyproject.toml)

## Key Files and Structure

- `src/http_stream_xml/` - Main package source
- `tests/` - Test suite with pytest
- `scripts/` - Shell scripts for common development tasks
- `tasks.py` - invoke task definitions for development commands
- `requirements.in` / `requirements.dev.in` - Dependency specifications
- `.pre-commit-config.yaml` - Code quality automation

## Testing Strategy

Tests are organized to support both fast and comprehensive testing:
- Fast tests run by default (tests marked `slow` are excluded)
- The full test suite includes integration tests with external APIs
- Coverage reporting is integrated into test runs
- The `tests/` directory is excluded from linting to allow more flexible test code
