A LinkML schema for describing Reference Ingest Guides (RIGs) - structured documents that capture the scope, rationale, and modeling approach for ingesting content from external sources into Biolink Model-compliant data repositories.
This repository provides:
- LinkML Schema: Formal specification for Reference Ingest Guides in src/resource_ingest_guide_schema/schema/
- Documentation Generator: Automated conversion of RIG YAML files to human-readable markdown
- Validation Tools: Schema validation for RIG files using LinkML
- Template System: Standardized templates and creation tools for new RIGs
- Example RIGs: Real-world examples from CTD, DISEASES, and Clinical Trials KP
RIGs are structured documents that describe:
- Source Information: Details about data sources (access, formats, licensing)
- Ingest Information: What content is included/excluded and filtering rationale
- Target Information: How data is modeled in the output knowledge graph
- Provenance Information: Contributors and related artifacts
RIGs help ensure reproducible, well-documented data ingestion processes for biomedical knowledge graphs.
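As a rough illustration of the four-part structure listed above, the sketch below represents a RIG as a plain Python dictionary and reports which sections are missing or empty. The section keys (`source_info`, `ingest_info`, `target_info`, `provenance_info`) and the placeholder values are assumptions made for this example only; the authoritative slot names are the ones defined in the LinkML schema.

```python
# Hypothetical section keys chosen for illustration; the real slot names
# are defined in the LinkML schema, not here.
REQUIRED_SECTIONS = (
    "source_info",
    "ingest_info",
    "target_info",
    "provenance_info",
)


def missing_sections(rig: dict) -> list[str]:
    """Return the top-level sections that are absent or empty in a RIG."""
    return [key for key in REQUIRED_SECTIONS if not rig.get(key)]


# Placeholder content only; not taken from any real RIG.
example_rig = {
    "source_info": {"access": "placeholder", "formats": ["placeholder"]},
    "ingest_info": {"included": ["placeholder"], "excluded": []},
    "target_info": {"modeling_notes": "placeholder"},
    "provenance_info": {},  # left empty on purpose
}

print(missing_sections(example_rig))  # ['provenance_info']
```

This check is only a quick sanity test of overall shape; actual validation should go through the schema.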
Documentation: https://biolink.github.io/resource-ingest-guide-schema
├── src/
│ ├── resource_ingest_guide_schema/
│ │ └── schema/ # LinkML schema definition
│ ├── docs/
│ │ ├── files/ # Static documentation files
│ │ ├── rigs/ # Example RIG YAML files
│ │ └── doc-templates/ # Jinja2 templates for docs
│ └── scripts/ # Python utilities for RIG processing
├── docs/ # Generated documentation
├── tests/ # Test suite
└── project/ # Generated LinkML artifacts
This project uses uv for dependency management. Install it with:
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip
pip install uv
Note: the following commands assume you are in the project root directory. The equivalent just commands may be substituted for make (for example, just test instead of make test).
- Install dependencies: uv sync --extra dev
- Run tests: make test
- Generate documentation: make gendoc
- Create a new RIG: make new-rig INFORES=infores:example NAME="Example Data Source"
# Create a new RIG from the template
make new-rig INFORES=infores:mydatasource NAME="My Data Source RIG"
# This creates src/docs/rigs/mydatasource_rig.yaml
# Edit the file to fill in your specific information
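The naming pattern in the example above (`infores:mydatasource` producing `src/docs/rigs/mydatasource_rig.yaml`) can be sketched as a small helper. This is an assumption inferred from that single example, not the actual logic of `create_rig.py`; the function name `rig_path_for` is hypothetical.

```python
from pathlib import PurePosixPath

RIG_DIR = PurePosixPath("src/docs/rigs")


def rig_path_for(infores: str) -> PurePosixPath:
    """Guess the RIG file path for an infores identifier.

    Assumption: the file name is the local part of the identifier
    followed by "_rig.yaml", as in the single example shown above.
    """
    prefix, sep, local_id = infores.partition(":")
    if prefix != "infores" or not sep or not local_id:
        raise ValueError(f"expected 'infores:<id>', got {infores!r}")
    return RIG_DIR / f"{local_id}_rig.yaml"


print(rig_path_for("infores:mydatasource"))  # src/docs/rigs/mydatasource_rig.yaml
```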
# Validate all RIG files against the schema
make validate-rigs
# Validate a specific RIG
uv run linkml-validate --schema src/resource_ingest_guide_schema/schema/resource_ingest_guide_schema.yaml src/docs/rigs/my_rig.yaml
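To validate a chosen subset of RIG files from Python, one option is to build the same `linkml-validate` invocation shown above for each file. The sketch below only constructs the argument lists; the helper names are hypothetical, and whether `make validate-rigs` works this way internally is not stated here.

```python
from pathlib import Path

SCHEMA = "src/resource_ingest_guide_schema/schema/resource_ingest_guide_schema.yaml"


def validate_command(rig_file: str, schema: str = SCHEMA) -> list[str]:
    """Build the argument list for validating one RIG file."""
    return ["uv", "run", "linkml-validate", "--schema", schema, rig_file]


def validate_commands(rig_dir: str) -> list[list[str]]:
    """Build one command per *.yaml file in rig_dir, in sorted order."""
    return [
        validate_command(path.as_posix())
        for path in sorted(Path(rig_dir).glob("*.yaml"))
    ]


print(validate_command("src/docs/rigs/my_rig.yaml"))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the project root to run the validation.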
# Generate all documentation including RIG index and markdown versions
make gendoc
# Test documentation locally
make testdoc # Builds docs and starts local server
The LinkML schema is defined in src/resource_ingest_guide_schema/schema/resource_ingest_guide_schema.yaml. After making changes:
# Regenerate Python datamodel and other artifacts
make gen-project
# Test the schema
make test-schema
# Lint the schema
make lint
Python utilities are in src/scripts/:
- create_rig.py: Generate new RIG from template
- rig_to_markdown.py: Convert RIG YAML to markdown
- generate_rig_index.py: Create RIG index table
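To give a feel for what an index-table step might involve, here is a minimal sketch that renders a markdown table from a list of RIG records. It is not the implementation of `generate_rig_index.py`: the function name, the record keys (`name`, `infores`, `file`), and the column layout are all assumptions for illustration.

```python
def rig_index_table(rigs: list[dict]) -> str:
    """Render a markdown table with one row per RIG, sorted by name.

    The keys "name", "infores" and "file" are hypothetical.
    """
    lines = ["| Name | InfoRes | File |", "|---|---|---|"]
    for rig in sorted(rigs, key=lambda r: r["name"].lower()):
        lines.append(f"| {rig['name']} | {rig['infores']} | {rig['file']} |")
    return "\n".join(lines)


# Illustrative records only.
records = [
    {"name": "My Data Source RIG", "infores": "infores:mydatasource", "file": "mydatasource_rig.yaml"},
    {"name": "Example Data Source", "infores": "infores:example", "file": "example_rig.yaml"},
]
print(rig_index_table(records))
```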
To test script changes:
# Run scripts directly
uv run python src/scripts/create_rig.py --help
uv run python src/scripts/rig_to_markdown.py --input-dir src/docs/rigs --output-dir docs
Templates are in src/docs/doc-templates/ and static files in src/docs/files/:
# Regenerate docs after template changes
make gendoc
# View changes locally
make serve # or make testdoc
| Command | Description |
|---|---|
| make help | Show all available commands |
| make install | Install dependencies with uv |
| make test | Run full test suite |
| make test-schema | Test schema generation |
| make test-python | Run Python tests |
| make lint | Lint the LinkML schema |
| make gen-project | Generate LinkML artifacts (Python, JSON Schema, etc.) |
| make gendoc | Generate documentation including RIG processing |
| make serve | Start local documentation server |
| make testdoc | Build docs and start server |
| make new-rig | Create new RIG (requires INFORES and NAME) |
| make validate-rigs | Validate all RIG files |
| make clean | Clean generated files |
| make deploy | Deploy documentation |
- src/resource_ingest_guide_schema/schema/: LinkML schema definition
- src/docs/rigs/: Example RIG YAML files (CTD, DISEASES, Clinical Trials KP)
- src/docs/files/: Static documentation files copied to output
- src/docs/doc-templates/: Jinja2 templates for documentation generation
- src/scripts/: Python utilities for RIG creation and processing
- docs/: Generated documentation output (do not edit directly)
- project/: Generated LinkML artifacts (Python models, JSON Schema, etc.)
The make gen-project command generates:
- Python datamodel: src/resource_ingest_guide_schema/datamodel/
- JSON Schema: project/jsonschema/
- OWL ontology: project/owl/
- GraphQL schema: project/graphql/
- SQL DDL: project/sqlschema/
- And more: See the project/ directory
- Fork the repository
- Create a feature branch
- Make changes following the existing patterns
- Ensure tests pass: make test
- Update documentation if needed: make gendoc
- Submit a pull request
- Create a YAML file in src/docs/rigs/
- Follow the schema structure (see existing examples)
- Validate: make validate-rigs
- Regenerate docs: make gendoc
- The RIG will automatically appear in the documentation index
- Modify src/resource_ingest_guide_schema/schema/resource_ingest_guide_schema.yaml
- Regenerate artifacts: make gen-project
- Update any affected RIG files
- Test: make test
- Update documentation as needed