Build a knowledge graph connecting variants, drugs, and clinical evidence to identify therapeutic opportunities (e.g. CIViC - Clinical Interpretations of Variants in Cancer)

collaborativebioinformatics/GeNETwork

GeNETwork Logo

How to Install

Clone the Repository

git clone https://github.com/collaborativebioinformatics/GeNETwork.git
cd GeNETwork

Get the Data

The knowledge graph is still in development, but the raw data is available on OSF: OSF Project Home

Requirements (Future Graph Development)

Currently, no installation is required beyond accessing the datasets. In the near future, we’ll include:

  • Graph database setup (e.g., Amazon Neptune, Neo4j, or RDF tools)
  • ETL scripts to ingest compatible data into the graph
  • Example queries (Gremlin / SPARQL / Cypher)

Background

Pediatric cancers have historically been underserved in therapeutic development and clinical trials. Unlike for adult cancers, pharmaceutical companies were not previously required to study them, which left most pediatric therapeutic discovery to pediatric researchers themselves.

To address this gap, the FDA created the Pediatric Molecular Target List (PMTL), a list of molecular targets relevant to studying and developing drugs for pediatric cancers. The U.S. RACE for Children Act further strengthened this framework: if a drug company develops a therapy against an actionable mutation or gene in an adult cancer, and that gene also appears on the PMTL, the company must either justify excluding pediatric trials or proceed with testing in children.

Building on this foundation, our project focuses on pediatric cancers using harmonized datasets from the Molecular Targets Project (MTP), which integrates genomic data from Kids First, TARGET, and other pediatric cohorts. This work was spearheaded at the Children's Hospital of Philadelphia, which aligned RNA and DNA sequencing data processed by the Kids First Data Resource Center and harmonized through the OpenPedCan suite of tools (GitHub, cited in PubMed).

Together, these resources provide a robust and unified data foundation to explore pediatric-specific therapeutic opportunities that align with both clinical priorities (PMTL) and regulatory requirements (RACE Act).

Methods

Data Sources

| Source | Purpose |
| --- | --- |
| Molecular Targets Project | Catalog of Molecular Targets |
| FDA PMTL | Pediatric Cancer Target List |
| TCGA | Adult Cancer Somatic Mutations |
| CIViC | Clinical variant interpretations |
| OncoDB | Drug-target interactions |
| MSigDB | Pathway gene sets |
| HGNC | Gene nomenclature standardization |
| StringDB | Protein-protein interactions |

Flowchart

FlowChart

Knowledge Graph Schema

KGSchema

Data Processing Pipeline

Data Processing

  • Download datasets (TSV, JSON)
  • Standardize the schema of each source dataset, using gene_name as the primary key
  • Upload the standardized data to the Open Science Framework (OSF)
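
The schema standardization step could look roughly like the sketch below. It is only an illustration, not the project's actual ETL code: the alias set `GENE_COLUMN_ALIASES`, the function `standardize_rows`, and the example rows are all assumptions made for this example. Only the use of `gene_name` as the shared key comes from the steps above.

```python
import csv
import io

# Hypothetical set of source-specific column names that all mean "gene symbol".
GENE_COLUMN_ALIASES = {"gene", "gene_symbol", "hugo_symbol", "symbol", "gene_name"}


def standardize_rows(tsv_text):
    """Parse TSV text and return dict rows that share a `gene_name` key."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for raw in reader:
        row = {}
        for column, value in raw.items():
            if column is None:
                continue  # ignore surplus cells that have no header
            key = column.strip().lower()
            if key in GENE_COLUMN_ALIASES:
                key = "gene_name"  # rename to the shared primary key
            row[key] = (value or "").strip()
        if "gene_name" not in row:
            raise ValueError("no gene column found in input")
        rows.append(row)
    return rows


# Illustrative input only; not taken from any project dataset.
example = "Hugo_Symbol\tVariant\nBRAF \tV600E\nTP53\tR175H\n"
standardized = standardize_rows(example)
```

Once every source passes through a function like this, tables from different sources can be joined on `gene_name` without per-source special cases.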

Knowledge Graph Construction

  • Define Node types
  • Define relationships (edges)
  • Define properties
  • Load the data into Amazon Neptune
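
As a rough illustration of what defining node types, relationships, and properties can mean in practice, here is a minimal in-memory sketch. The classes `Node`, `Edge`, and `Graph`, the labels (`Gene`, `Variant`, `Drug`), the relationship names, and the placeholder identifiers are hypothetical. The project's actual schema is the one shown in the Knowledge Graph Schema figure, and loading into the graph database is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # node type, e.g. "Gene" (hypothetical label)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    rel_type: str  # relationship type, e.g. "TARGETS" (hypothetical)
    properties: dict = field(default_factory=dict)


class Graph:
    """Tiny in-memory container that rejects edges to unknown nodes."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(edge)


# Placeholder identifiers; these are not real genes, variants, or drugs.
graph = Graph()
graph.add_node(Node("gene:GENE_A", "Gene", {"gene_name": "GENE_A"}))
graph.add_node(Node("variant:VARIANT_A1", "Variant", {"gene_name": "GENE_A"}))
graph.add_node(Node("drug:DRUG_X", "Drug"))
graph.add_edge(Edge("variant:VARIANT_A1", "gene:GENE_A", "VARIANT_OF"))
graph.add_edge(Edge("drug:DRUG_X", "gene:GENE_A", "TARGETS", {"source": "example"}))
```

Validating edge endpoints at insert time catches mismatched keys between sources before any data reaches the graph database.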

Analysis & Querying

  • Use Cypher queries for:
    • Cross-age repurposing via shared variants
    • Pathway-level repurposing
    • TCGA-specific insights
    • Drug-repurposing opportunities
    • Variants in both adult and pediatric cancers
    • Co-mutation analysis
    • Therapeutic gaps
    • Paths from variant to drug
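
To make the "paths from variant to drug" idea concrete, the sketch below runs a breadth-first search over a small adjacency list. It is a Python stand-in for illustration only: the project itself plans to run such queries in Cypher against the graph database, and the `EDGES` mapping, the `type:NAME` identifier convention, and the function `shortest_path_to_drug` are assumptions rather than part of the repository.

```python
from collections import deque

# Hypothetical adjacency list with placeholder identifiers (not real entities).
EDGES = {
    "variant:VARIANT_A1": ["gene:GENE_A"],
    "gene:GENE_A": ["pathway:PATHWAY_P", "drug:DRUG_X"],
    "pathway:PATHWAY_P": ["gene:GENE_B"],
    "gene:GENE_B": ["drug:DRUG_Y"],
}


def shortest_path_to_drug(edges, start):
    """Breadth-first search from `start` to the nearest 'drug:' node.

    Returns the path as a list of node IDs, or None if no drug is reachable.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith("drug:"):
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path_to_drug(EDGES, "variant:VARIANT_A1")
```

In this toy graph the search reaches `drug:DRUG_X` directly through `gene:GENE_A` before it explores the longer pathway-level route to `drug:DRUG_Y`, which mirrors the difference between direct and pathway-level repurposing candidates.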

Visualization/Outputs

  • Network/graph visualizations
  • Downloadable summary tables
