[Docs] Documentation representation for GearNet #70

113 changes: 113 additions & 0 deletions .codeboarding/Configuration_System.md
```mermaid

graph LR

Configuration_Files_Repository["Configuration Files Repository"]

Configuration_Loader_Parser["Configuration Loader/Parser"]

Pretraining_Script["Pretraining Script"]

Downstream_Task_Script["Downstream Task Script"]

Configuration_Files_Repository -- "Provides Configuration Data To" --> Configuration_Loader_Parser

Configuration_Loader_Parser -- "Provides Parsed Configuration To" --> Pretraining_Script

Configuration_Loader_Parser -- "Provides Parsed Configuration To" --> Downstream_Task_Script

```



[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%[email protected]?style=flat-square)](mailto:[email protected])



## Details



The Configuration System is central to this deep learning research framework: it manages every experimental parameter through YAML files. It follows the "Configuration-Driven Development" pattern, which keeps experiments reproducible, lets researchers vary settings without changing code, and separates experimental setup from implementation.



### Configuration Files Repository

This component is the central store for experimental parameters: model architectures, hyperparameters, dataset paths, and training settings. Files are organized into `pretrain` and `downstream` subdirectories, and downstream configurations are further grouped by task (e.g., EC, Fold3D, GO-BP) and model type. This hierarchy keeps a large number of experiments manageable, lets configurations be reused, and supports reproducible research by giving each experimental setup a single source of truth.
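The hierarchy can be navigated programmatically. The sketch below assumes the layout implied by the two example paths listed for this component (`config/pretrain/<model>.yaml` and `config/downstream/<task>/<model>.yaml`); the helper `config_path` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Optional


def config_path(phase: str, model: str, task: Optional[str] = None,
                root: str = "config") -> Path:
    """Build the path of a YAML config file (hypothetical helper).

    Assumed layout, inferred from the example files:
      config/pretrain/<model>.yaml
      config/downstream/<task>/<model>.yaml
    """
    if phase == "pretrain":
        # Assumption: pretraining configs are not grouped by task.
        if task is not None:
            raise ValueError("pretrain configs are not grouped by task")
        return Path(root) / "pretrain" / f"{model}.yaml"
    if phase == "downstream":
        # Downstream configs live one level deeper, under the task name.
        if task is None:
            raise ValueError("downstream configs require a task, e.g. 'EC'")
        return Path(root) / "downstream" / task / f"{model}.yaml"
    raise ValueError(f"unknown phase: {phase!r}")
```

For example, `config_path("downstream", "BERT", task="EC")` points at `config/downstream/EC/BERT.yaml`. Raising on an unknown phase or a missing task makes a mistyped experiment fail immediately rather than silently loading the wrong file.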





**Related Classes/Methods**:



- `config/downstream/EC/BERT.yaml` (1:1)

- `config/pretrain/mc_gearnet_edge.yaml` (1:1)





### Configuration Loader/Parser

This component reads the raw YAML files from the `Configuration Files Repository`, parses them, and converts them into structured objects. It likely uses `pyyaml` to parse the YAML syntax and `easydict` to expose the result with attribute-style access (e.g., `cfg.section.key` instead of `cfg["section"]["key"]`). This conversion turns static files into objects that training and evaluation scripts can read and modify directly.
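The attribute-access step can be sketched without third-party packages. In the project, parsing would presumably be done with `yaml.safe_load` and wrapping with `easydict.EasyDict`; the minimal `AttrDict` below is a stand-in written for illustration, and the config keys in the usage note are hypothetical.

```python
from typing import Any


class AttrDict(dict):
    """Minimal stand-in for easydict.EasyDict: keys readable as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods
        # such as .items() still win over a config key of the same name.
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name: str, value: Any) -> None:
        # Attribute writes update the underlying mapping.
        self[name] = value


def to_attr_dict(obj: Any) -> Any:
    """Recursively wrap nested dicts (including dicts inside lists)."""
    if isinstance(obj, dict):
        return AttrDict({k: to_attr_dict(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_attr_dict(v) for v in obj]
    return obj
```

With PyYAML installed, usage would look like `cfg = to_attr_dict(yaml.safe_load(fh))`, after which nested values such as `cfg.engine.batch_size` read like attributes. Because `AttrDict` subclasses `dict`, the result can still be passed anywhere a plain mapping is expected.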





**Related Classes/Methods**:



- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/util.py#L1-L1" target="_blank" rel="noopener noreferrer">`util.py` (1:1)</a>





### Pretraining Script

This script orchestrates pretraining of protein representation models and is a primary consumer of the configurations. Through the `Configuration Loader/Parser` it loads settings for model architecture, dataset paths, training schedules, optimization parameters, and other pretraining-specific options, then uses them to initialize and run pretraining. It follows the "Training Loop Pattern".
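A common way for a script to turn parsed configuration into live objects is a name-to-class registry. The sketch below assumes each component in the YAML files is described by a `class` key plus constructor arguments; that schema, the `register`/`build` helpers, and `ToyEncoder` are all hypothetical and only illustrate the pattern.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping "class" names used in config files to Python classes.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a class under a config-visible name."""
    def decorator(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return decorator


def build(spec: Dict[str, Any]) -> Any:
    """Instantiate the class named by spec["class"] with the remaining keys."""
    kwargs = dict(spec)  # copy so the parsed config is not mutated
    cls_name = kwargs.pop("class")
    if cls_name not in REGISTRY:
        raise KeyError(f"no class registered under {cls_name!r}")
    return REGISTRY[cls_name](**kwargs)


@register("ToyEncoder")
class ToyEncoder:
    """Placeholder model used only to demonstrate build()."""

    def __init__(self, input_dim: int, hidden_dims: list) -> None:
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
```

Copying the spec before popping `class` keeps the loaded configuration intact, so it can still be logged alongside the run for reproducibility.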





**Related Classes/Methods**:



- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/script/pretrain.py#L1-L1" target="_blank" rel="noopener noreferrer">`script/pretrain.py` (1:1)</a>





### Downstream Task Script

This script manages the execution and evaluation of models on various downstream biological tasks (e.g., Enzyme Commission (EC) number prediction, 3D fold prediction, Gene Ontology (GO) term prediction). Similar to the pretraining script, it heavily relies on the `Configuration Loader/Parser` to load task-specific settings, including model configurations, dataset paths for the specific task, evaluation metrics, and fine-tuning parameters. This component also follows the "Training Loop Pattern" but is tailored for task-specific fine-tuning and evaluation.
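The fine-tuning side of the "Training Loop Pattern" usually trains for a configured number of epochs and keeps the epoch that scores best on the validation split. The sketch below shows that control flow only; `num_epoch` and the metric name would come from the task configuration, while the function name and the callables standing in for the solver are hypothetical.

```python
from typing import Callable, Dict, Tuple


def train_and_validate(
    num_epoch: int,
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[str], Dict[str, float]],
    metric: str,
) -> Tuple[int, float]:
    """Run a config-driven fine-tuning loop and return (best_epoch, best_score).

    train_one_epoch and evaluate stand in for the real solver's steps;
    higher metric values are assumed to be better.
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch in range(num_epoch):
        train_one_epoch(epoch)
        scores = evaluate("valid")  # evaluate on the validation split
        if scores[metric] > best_score:
            best_epoch, best_score = epoch, scores[metric]
    return best_epoch, best_score
```

In a full script, the best epoch would typically be used to reload its checkpoint before a final evaluation on the test split.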





**Related Classes/Methods**:



- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/script/downstream.py#L1-L1" target="_blank" rel="noopener noreferrer">`script/downstream.py` (1:1)</a>









### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
159 changes: 159 additions & 0 deletions .codeboarding/Data_Pipeline.md
```mermaid

graph LR

Data_Pipeline["Data Pipeline"]

Model_Training_Evaluation["Model Training & Evaluation"]

Model_Architecture["Model Architecture"]

Configuration_Manager["Configuration Manager"]

Experiment_Tracking_Checkpointing["Experiment Tracking & Checkpointing"]

Data_Pipeline -- "provides data to" --> Model_Training_Evaluation

Data_Pipeline -- "receives configuration from" --> Configuration_Manager

Model_Training_Evaluation -- "consumes data from" --> Data_Pipeline

Model_Training_Evaluation -- "uses" --> Model_Architecture

Model_Training_Evaluation -- "receives parameters from" --> Configuration_Manager

Model_Training_Evaluation -- "outputs to" --> Experiment_Tracking_Checkpointing

Model_Architecture -- "is used by" --> Model_Training_Evaluation

Model_Architecture -- "expects input from" --> Data_Pipeline

Configuration_Manager -- "provides config to" --> Data_Pipeline

Configuration_Manager -- "provides config to" --> Model_Training_Evaluation

Configuration_Manager -- "provides config to" --> Model_Architecture

Experiment_Tracking_Checkpointing -- "receives metrics and checkpoints from" --> Model_Training_Evaluation

Experiment_Tracking_Checkpointing -- "provides checkpoints to" --> Model_Training_Evaluation

click Data_Pipeline href "https://github.com/DeepGraphLearning/GearNet/blob/main/.codeboarding/Data_Pipeline.md" "Details"

```






## Details



This project is a deep learning research framework for protein representation learning. The Data Pipeline is central to it: it handles data preparation end to end, from acquiring raw data to building structured protein graphs and splitting datasets.



### Data Pipeline [[Expand]](./Data_Pipeline.md)

Manages the entire data lifecycle, including loading, preprocessing, featurization, and dataset splitting for protein data.
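Turning a protein structure into a graph is the key featurization step. The plain-Python sketch below builds a multi-relational residue graph from coordinates. The three relation types (sequential, radius, k-nearest-neighbor) and the defaults (sequence offset up to 2, radius 10 Å, k = 10, minimum sequence separation 5 for spatial edges) reflect our reading of the GearNet paper and should be treated as assumptions; this is not the project's implementation, and `build_protein_graph` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

# Simplified relation ids; GearNet reportedly splits sequential edges further by offset.
SEQUENTIAL, RADIUS, KNN = 0, 1, 2

Coord = Tuple[float, float, float]
Edge = Tuple[int, int, int]  # (source residue, target residue, relation)


def build_protein_graph(coords: Sequence[Coord], max_seq_offset: int = 2,
                        radius: float = 10.0, k: int = 10,
                        min_seq_sep: int = 5) -> List[Edge]:
    n = len(coords)
    edges = set()
    for i in range(n):
        # Sequential edges: residues close to i along the chain.
        for j in range(max(0, i - max_seq_offset), min(n, i + max_seq_offset + 1)):
            if j != i:
                edges.add((i, j, SEQUENTIAL))
        # Distances from residue i to every other residue, nearest first.
        neighbors = sorted((math.dist(coords[i], coords[j]), j)
                           for j in range(n) if j != i)
        # Radius edges: spatially close residues that are far apart in sequence.
        for d, j in neighbors:
            if d < radius and abs(i - j) >= min_seq_sep:
                edges.add((i, j, RADIUS))
        # KNN edges: take the k nearest residues, then drop short-range pairs.
        for d, j in neighbors[:k]:
            if abs(i - j) >= min_seq_sep:
                edges.add((i, j, KNN))
    return sorted(edges)
```

Filtering spatial edges by sequence separation keeps them from duplicating what sequential edges already encode, so the radius and KNN relations carry long-range contact information.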





**Related Classes/Methods**:



- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/dataset.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.dataset` (1:1)</a>





### Model Training & Evaluation

Orchestrates the training loops, model optimization, validation, and evaluation of protein representation models.





**Related Classes/Methods**:



- `script/pretrain.py` (1:1)

- `script/downstream.py` (1:1)

- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/model.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.model` (1:1)</a>





### Model Architecture

Defines the neural network architectures used for protein representation learning (e.g., graph neural networks).





**Related Classes/Methods**:



- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/model.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.model` (1:1)</a>

- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/layer.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.layer` (1:1)</a>





### Configuration Manager

Handles loading, parsing, and managing project configurations (e.g., model hyperparameters, dataset paths, training settings) from YAML files.





**Related Classes/Methods**:



- `util.py` (1:1)





### Experiment Tracking & Checkpointing

Manages the logging of training metrics, saving model checkpoints, and potentially resuming training.
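Checkpointing with resume support comes down to persisting the epoch, model state, and metrics, then reading them back on restart. In a PyTorch project, `state` would typically be `model.state_dict()` written with `torch.save`; the sketch below uses JSON as a stdlib stand-in so it runs anywhere, and the function names are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(path: str, epoch: int, state: Dict[str, Any],
                    metrics: Dict[str, float]) -> None:
    """Write a checkpoint atomically: dump to a temp file, then rename it."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump({"epoch": epoch, "state": state, "metrics": metrics}, fh)
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def resume(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved checkpoint, or None if training starts from scratch."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)
```

Writing to a temporary file first means a crash mid-save leaves the previous checkpoint intact, which matters for long pretraining runs.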





**Related Classes/Methods**:



- `utils.checkpoint` (1:1)

- `script/pretrain.py` (1:1)

- `script/downstream.py` (1:1)









87 changes: 87 additions & 0 deletions .codeboarding/Model_Core_GNNs_.md
```mermaid

graph LR

gearnet_layer_IEConvLayer["gearnet.layer.IEConvLayer"]

gearnet_layer_GeometricRelationalGraphConv["gearnet.layer.GeometricRelationalGraphConv"]

gearnet_model_GearNetIEConv["gearnet.model.GearNetIEConv"]

gearnet_model_FusionNetwork["gearnet.model.FusionNetwork"]

gearnet_model_GearNetIEConv -- "uses" --> gearnet_layer_IEConvLayer

gearnet_model_GearNetIEConv -- "uses" --> gearnet_layer_GeometricRelationalGraphConv

gearnet_model_FusionNetwork -- "composes" --> gearnet_model_GearNetIEConv

```






## Details



The `gearnet` subsystem implements graph neural networks for protein structure representation learning. It consists of layers that perform specific graph operations and models that compose these layers to encode protein structures.



### gearnet.layer.IEConvLayer

This component implements IEConv, the intrinsic-extrinsic convolution introduced for learning on 3D protein structures. It is a building block for graph-structured data that combines intrinsic (along-chain) and extrinsic (spatial) relationships between nodes, which here are the residues of a protein.





**Related Classes/Methods**: _None_



### gearnet.layer.GeometricRelationalGraphConv

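This component implements the geometric relational graph convolution at the core of the GearNet architecture. It performs relation-specific message passing: neighbor features are aggregated separately for each edge type (e.g., sequential or spatial edges) and transformed with relation-specific weights, so both geometric and relational information shape each node update.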
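Relation-specific message passing can be illustrated on plain Python lists. The sketch follows the update rule as we understand it from the GearNet paper, h_i ← h_i + ReLU(Σ_r W_r Σ_{j∈N_r(i)} h_j), but omits batch normalization and edge features; it is not the `GeometricRelationalGraphConv` implementation, and `relational_conv` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relational_conv(h: List[Vector], edges: Sequence[Tuple[int, int, int]],
                    weights: Dict[int, Matrix]) -> List[Vector]:
    """One relational graph convolution with a residual connection.

    edges holds (source, target, relation) triples; weights maps each
    relation id to a square matrix so the residual shapes line up.
    """
    dim = len(h[0])
    # Sum neighbor features separately for every (target, relation) pair.
    agg = [{r: [0.0] * dim for r in weights} for _ in h]
    for src, dst, rel in edges:
        acc = agg[dst][rel]
        for k, v in enumerate(h[src]):
            acc[k] += v
    out = []
    for i, x in enumerate(h):
        update = [0.0] * dim
        # Apply the relation-specific weight matrix and sum over relations.
        for r, w in weights.items():
            for k, v in enumerate(matvec(w, agg[i][r])):
                update[k] += v
        # ReLU, then add the input back (residual connection).
        out.append([xk + max(0.0, uk) for xk, uk in zip(x, update)])
    return out
```

Keeping a separate aggregate per relation is what distinguishes this from an ordinary graph convolution: a sequential neighbor and a spatial neighbor with identical features can still contribute differently.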





**Related Classes/Methods**: _None_



### gearnet.model.GearNetIEConv

This is the GearNet model variant that adds IEConv layers. It stacks multiple `GeometricRelationalGraphConv` layers, optionally combined with `IEConvLayer` instances, to build a deep graph neural network for protein representation learning.
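How a model "orchestrates" its layers can be sketched as a loop that applies each layer in turn, optionally concatenates every layer's output per node, and pools nodes into a graph-level vector. The `concat_hidden` flag and the sum readout mirror options we believe the GearNet model exposes, but treat both as assumptions; `encode` is a hypothetical name and each layer is any callable mapping node features to node features.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def encode(h: List[Vector], layers: List[Layer],
           concat_hidden: bool = True) -> Tuple[List[Vector], Vector]:
    """Run stacked graph layers and return (node features, graph feature).

    With concat_hidden, each node's output concatenates the features
    produced by every layer; otherwise only the last layer is kept.
    The graph feature is a sum readout over nodes.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    hiddens = []
    for layer in layers:
        h = layer(h)
        hiddens.append(h)
    if concat_hidden:
        # sum(..., []) concatenates the per-layer vectors of node i.
        node_feature = [sum((hid[i] for hid in hiddens), []) for i in range(len(h))]
    else:
        node_feature = hiddens[-1]
    # Sum readout: add the node vectors element-wise.
    graph_feature = [sum(col) for col in zip(*node_feature)]
    return node_feature, graph_feature
```

Concatenating hidden states lets the readout see both shallow (local) and deep (wider-context) features without adding parameters.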





**Related Classes/Methods**: _None_



### gearnet.model.FusionNetwork

This higher-level model combines the outputs of two models, a `sequence_model` and a `structure_model`. It is likely used for multi-modal learning that integrates a protein's sequence and structure representations.
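One plausible way to combine the two models is a "series" scheme: per-residue features from the sequence model become the input to the structure model, and both outputs are concatenated. The sketch below shows that data flow with plain callables. The function name, the series ordering, and concatenation as the fusion operator are assumptions, not confirmed `FusionNetwork` behavior; a "parallel" alternative would feed the raw input to both models independently.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[List[Vector]], List[Vector]]


def series_fusion(seq_model: Encoder, struct_model: Encoder,
                  residue_input: List[Vector]) -> List[Vector]:
    """Feed sequence-model outputs into the structure model, then concatenate.

    Each residue ends up with [sequence features | structure features].
    """
    seq_out = seq_model(residue_input)
    struct_out = struct_model(seq_out)
    if len(seq_out) != len(struct_out):
        raise ValueError("both models must return one vector per residue")
    # List addition concatenates the two feature vectors for each residue.
    return [s + t for s, t in zip(seq_out, struct_out)]
```

Keeping the sequence features in the final output, rather than only passing them through the structure model, preserves sequence information even if the structure model's layers dilute it.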





**Related Classes/Methods**: _None_






