DeepGraphLearning · ivanmilevtues · Jul 4, 2025
diff --git a/.codeboarding/Configuration_System.md b/.codeboarding/Configuration_System.md
@@ -0,0 +1,113 @@
+```mermaid
+
+graph LR
+
+    Configuration_Files_Repository["Configuration Files Repository"]
+
+    Configuration_Loader_Parser["Configuration Loader/Parser"]
+
+    Pretraining_Script["Pretraining Script"]
+
+    Downstream_Task_Script["Downstream Task Script"]
+
+    Configuration_Files_Repository -- "Provides Configuration Data To" --> Configuration_Loader_Parser
+
+    Configuration_Loader_Parser -- "Provides Parsed Configuration To" --> Pretraining_Script
+
+    Configuration_Loader_Parser -- "Provides Parsed Configuration To" --> Downstream_Task_Script
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)
+
+
+
+## Details
+
+
+
+The Configuration System manages the experimental parameters of this deep learning research framework. It follows a configuration-driven design: settings live in YAML files rather than in code, which keeps experiments reproducible, makes variations easy to run, and separates experiment definitions from implementation.
+
+
+
+### Configuration Files Repository
+
+This component is the central store for experiment definitions: model architectures, hyperparameters, dataset paths, and training settings. It is organized into `pretrain` and `downstream` subdirectories, with further grouping by task (e.g., EC, Fold3D, GO-BP) and model type. Keeping every experimental setup in one hierarchy lets configurations be reused across experiments and gives reproducible runs a single source of truth.
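+
+The layout described above can be inspected programmatically. The following is a minimal sketch, assuming only that configurations are `.yaml` files grouped under top-level stage directories. `list_configs` is a hypothetical helper, not part of the repository, and the demo builds a throwaway tree that mirrors the two configuration paths this section lists.
+
+```python
+import tempfile
+from pathlib import Path
+
+
+def list_configs(root):
+    """Group YAML files under `root` by their top-level directory (e.g. pretrain, downstream)."""
+    root = Path(root)
+    grouped = {}
+    for path in sorted(root.rglob("*.yaml")):
+        relative = path.relative_to(root)
+        stage = relative.parts[0]
+        grouped.setdefault(stage, []).append(relative.as_posix())
+    return grouped
+
+
+# Demo on a temporary tree; the real repository is not touched.
+with tempfile.TemporaryDirectory() as tmp:
+    for name in ("downstream/EC/BERT.yaml", "pretrain/mc_gearnet_edge.yaml"):
+        target = Path(tmp, name)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.touch()
+    grouped = list_configs(tmp)
+```
+
+Grouping by the first path component keeps the helper independent of how deeply tasks and model types are nested below each stage.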
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `config/downstream/EC/BERT.yaml` (1:1)
+
+- `config/pretrain/mc_gearnet_edge.yaml` (1:1)
+
+
+
+
+
+### Configuration Loader/Parser
+
+This component reads the raw YAML from the `Configuration Files Repository` and turns it into structured objects. It likely uses `pyyaml` to parse the YAML syntax and `easydict` to expose the result with attribute-style access to nested keys. The training and evaluation scripts consume these objects instead of reading the configuration files directly.
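+
+The transformation described above can be sketched without third-party packages. This is a minimal illustration, not the code in `util.py`: `AttrDict` is a hypothetical stand-in for the attribute-style access that `easydict` provides, and the `raw` dictionary (its keys and values are invented for the example) stands in for the output of a YAML parser.
+
+```python
+class AttrDict(dict):
+    """Hypothetical stand-in for easydict: a dict whose keys are also attributes."""
+
+    def __init__(self, mapping=None):
+        super().__init__()
+        for key, value in (mapping or {}).items():
+            self[key] = self._wrap(value)
+
+    @classmethod
+    def _wrap(cls, value):
+        # Convert nested dicts (also inside lists) so attribute access works at any depth.
+        if isinstance(value, dict):
+            return cls(value)
+        if isinstance(value, list):
+            return [cls._wrap(item) for item in value]
+        return value
+
+    def __getattr__(self, name):
+        # Only called when normal attribute lookup fails, so dict methods keep working.
+        try:
+            return self[name]
+        except KeyError as exc:
+            raise AttributeError(name) from exc
+
+
+# Illustrative data only: the shape a YAML parser would return for a small config.
+raw = {
+    "optimizer": {"name": "Adam", "lr": 1e-4},
+    "train": {"num_epoch": 50},
+    "metrics": [{"name": "accuracy"}],
+}
+cfg = AttrDict(raw)
+learning_rate = cfg.optimizer.lr
+```
+
+Both `cfg.train.num_epoch` and `cfg["train"]["num_epoch"]` resolve to the same value, which is the convenience the paragraph above attributes to `easydict`.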
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/util.py#L1-L1" target="_blank" rel="noopener noreferrer">`util.py` (1:1)</a>
+
+
+
+
+
+### Pretraining Script
+
+This script runs the pretraining phase for protein representation models. It is a primary consumer of the configurations: it uses the `Configuration Loader/Parser` to load the model architecture, dataset paths, training schedule, and optimization parameters, then initializes and executes the pretraining loop from them.
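+
+As a rough sketch of how a config-driven entry point is typically wired, the script first resolves which configuration to use and only then sets up the run. The `-c/--config` and `--seed` flags, the helpers `parse_args` and `describe_run`, and the values in `cfg` are all assumptions made for illustration, not the interface of `script/pretrain.py`.
+
+```python
+import argparse
+
+
+def parse_args(argv=None):
+    """Parse the (hypothetical) command line of a config-driven training script."""
+    parser = argparse.ArgumentParser(description="Illustrative config-driven entry point")
+    parser.add_argument("-c", "--config", required=True,
+                        help="path to a YAML configuration file")
+    parser.add_argument("--seed", type=int, default=0,
+                        help="random seed (illustrative default)")
+    return parser.parse_args(argv)
+
+
+def describe_run(args, cfg):
+    """Summarise the run that the arguments and loaded configuration describe."""
+    return (f"{cfg['stage']} for {cfg['num_epoch']} epochs "
+            f"from {args.config} (seed={args.seed})")
+
+
+# A real script would load `cfg` from args.config; here it is a hard-coded placeholder.
+args = parse_args(["-c", "config/pretrain/mc_gearnet_edge.yaml", "--seed", "7"])
+cfg = {"stage": "pretraining", "num_epoch": 50}
+summary = describe_run(args, cfg)
+```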
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/script/pretrain.py#L1-L1" target="_blank" rel="noopener noreferrer">`script/pretrain.py` (1:1)</a>
+
+
+
+
+
+### Downstream Task Script
+
+This script runs and evaluates models on downstream biological tasks (e.g., Enzyme Commission (EC) number prediction, 3D fold prediction, Gene Ontology (GO) term prediction). Like the pretraining script, it uses the `Configuration Loader/Parser` to load task-specific settings, including the model configuration, the task's dataset paths, evaluation metrics, and fine-tuning parameters. It follows the same training-loop structure, adapted for task-specific fine-tuning and evaluation.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/script/downstream.py#L1-L1" target="_blank" rel="noopener noreferrer">`script/downstream.py` (1:1)</a>
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
diff --git a/.codeboarding/Data_Pipeline.md b/.codeboarding/Data_Pipeline.md
@@ -0,0 +1,159 @@
+```mermaid
+
+graph LR
+
+    Data_Pipeline["Data Pipeline"]
+
+    Model_Training_Evaluation["Model Training & Evaluation"]
+
+    Model_Architecture["Model Architecture"]
+
+    Configuration_Manager["Configuration Manager"]
+
+    Experiment_Tracking_Checkpointing["Experiment Tracking & Checkpointing"]
+
+    Data_Pipeline -- "provides data to" --> Model_Training_Evaluation
+
+    Data_Pipeline -- "receives configuration from" --> Configuration_Manager
+
+    Model_Training_Evaluation -- "consumes data from" --> Data_Pipeline
+
+    Model_Training_Evaluation -- "uses" --> Model_Architecture
+
+    Model_Training_Evaluation -- "receives parameters from" --> Configuration_Manager
+
+    Model_Training_Evaluation -- "outputs to" --> Experiment_Tracking_Checkpointing
+
+    Model_Architecture -- "is used by" --> Model_Training_Evaluation
+
+    Model_Architecture -- "expects input from" --> Data_Pipeline
+
+    Configuration_Manager -- "provides config to" --> Data_Pipeline
+
+    Configuration_Manager -- "provides config to" --> Model_Training_Evaluation
+
+    Configuration_Manager -- "provides config to" --> Model_Architecture
+
+    Experiment_Tracking_Checkpointing -- "receives metrics and checkpoints from" --> Model_Training_Evaluation
+
+    Experiment_Tracking_Checkpointing -- "provides checkpoints to" --> Model_Training_Evaluation
+
+    click Data_Pipeline href "https://github.com/DeepGraphLearning/GearNet/blob/main/.codeboarding/Data_Pipeline.md" "Details"
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)
+
+
+
+## Details
+
+
+
+This project is a deep learning research framework for protein representation learning. The Data Pipeline handles data preparation end to end: acquiring raw data, converting it into structured protein graphs, and splitting the datasets.
+
+
+
+### Data Pipeline [[Expand]](./Data_Pipeline.md)
+
+Manages the entire data lifecycle, including loading, preprocessing, featurization, and dataset splitting for protein data.
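+
+Of the steps listed above, dataset splitting is the easiest to show in isolation. The sketch below assumes a seeded random split with illustrative 80/10/10 fractions. `split_indices` is a hypothetical helper, not a function from `gearnet.dataset`, and benchmark datasets often ship predefined splits instead.
+
+```python
+import random
+
+
+def split_indices(num_samples, fractions=(0.8, 0.1, 0.1), seed=0):
+    """Shuffle sample indices reproducibly and cut them into train/valid/test lists."""
+    indices = list(range(num_samples))
+    random.Random(seed).shuffle(indices)  # private RNG: global random state is untouched
+    num_train = int(fractions[0] * num_samples)
+    num_valid = int(fractions[1] * num_samples)
+    train = indices[:num_train]
+    valid = indices[num_train:num_train + num_valid]
+    test = indices[num_train + num_valid:]  # remainder, so no sample is dropped
+    return train, valid, test
+
+
+train, valid, test = split_indices(10, seed=42)
+```
+
+Passing the seed explicitly keeps the split identical across runs, which matters when results from different experiments are compared.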
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/dataset.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.dataset` (1:1)</a>
+
+
+
+
+
+### Model Training & Evaluation
+
+Orchestrates the training loops, model optimization, validation, and evaluation of protein representation models.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `script/pretrain.py` (1:1)
+
+- `script/downstream.py` (1:1)
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/model.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.model` (1:1)</a>
+
+
+
+
+
+### Model Architecture
+
+Defines the neural network architectures used for protein representation learning (e.g., graph neural networks).
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/model.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.model` (1:1)</a>
+
+- <a href="https://github.com/DeepGraphLearning/GearNet/blob/main/gearnet/layer.py#L1-L1" target="_blank" rel="noopener noreferrer">`gearnet.layer` (1:1)</a>
+
+
+
+
+
+### Configuration Manager
+
+Handles loading, parsing, and managing project configurations (e.g., model hyperparameters, dataset paths, training settings) from YAML files.
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `util.py` (1:1)
+
+
+
+
+
+### Experiment Tracking & Checkpointing
+
+Manages the logging of training metrics, saving model checkpoints, and potentially resuming training.
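+
+The save-and-resume cycle can be sketched with the standard library alone. This is an illustration under stated assumptions: JSON stands in for a real checkpoint format (which would also hold model and optimizer state), and `save_checkpoint` and `load_checkpoint` are hypothetical helpers rather than functions from this repository.
+
+```python
+import json
+import os
+import tempfile
+
+
+def save_checkpoint(path, epoch, metrics):
+    """Write training state to `path`, atomically replacing any previous checkpoint."""
+    temporary = path + ".tmp"
+    with open(temporary, "w") as fout:
+        json.dump({"epoch": epoch, "metrics": metrics}, fout)
+    os.replace(temporary, path)  # avoids leaving a half-written checkpoint behind
+
+
+def load_checkpoint(path):
+    """Return the saved state, or None when no checkpoint exists yet."""
+    if not os.path.exists(path):
+        return None
+    with open(path) as fin:
+        return json.load(fin)
+
+
+with tempfile.TemporaryDirectory() as tmp:
+    path = os.path.join(tmp, "checkpoint.json")
+    fresh = load_checkpoint(path)          # nothing saved yet
+    save_checkpoint(path, epoch=3, metrics={"loss": 0.25})
+    state = load_checkpoint(path)
+    start_epoch = state["epoch"] + 1       # resume after the last completed epoch
+```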
+
+
+
+
+
+**Related Classes/Methods**:
+
+
+
+- `utils.checkpoint` (1:1)
+
+- `script/pretrain.py` (1:1)
+
+- `script/downstream.py` (1:1)
+
+
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
diff --git a/.codeboarding/Model_Core_GNNs_.md b/.codeboarding/Model_Core_GNNs_.md
@@ -0,0 +1,87 @@
+```mermaid
+
+graph LR
+
+    gearnet_layer_IEConvLayer["gearnet.layer.IEConvLayer"]
+
+    gearnet_layer_GeometricRelationalGraphConv["gearnet.layer.GeometricRelationalGraphConv"]
+
+    gearnet_model_GearNetIEConv["gearnet.model.GearNetIEConv"]
+
+    gearnet_model_FusionNetwork["gearnet.model.FusionNetwork"]
+
+    gearnet_model_GearNetIEConv -- "uses" --> gearnet_layer_IEConvLayer
+
+    gearnet_model_GearNetIEConv -- "uses" --> gearnet_layer_GeometricRelationalGraphConv
+
+    gearnet_model_FusionNetwork -- "composes" --> gearnet_model_GearNetIEConv
+
+```
+
+
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)
+
+
+
+## Details
+
+
+
+The `gearnet` subsystem implements graph neural networks for protein representation learning. It comprises layers that perform specific graph operations and models that stack these layers to process protein structures.
+
+
+
+### gearnet.layer.IEConvLayer
+
+This component implements an IEConv (Intrinsic-Extrinsic Convolution) layer. It is a building block for processing graph-structured data, designed to incorporate detailed interaction information between nodes (e.g., residues in a protein).
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### gearnet.layer.GeometricRelationalGraphConv
+
+This component implements a geometric relational graph convolution layer, the other core layer type of the GearNet architecture. It aggregates messages separately for each edge (relation) type, so both geometric and relational information enter the convolution.
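+
+To make the relational part concrete, here is a simplified sketch of a relational graph convolution, assuming one weight matrix per edge type and sum aggregation. It omits self-loops, normalization, edge features, and the nonlinearity, so it is not the repository's layer. The function name, relation names, and weights are illustrative.
+
+```python
+def relational_conv(node_features, edges, weights):
+    """Sum relation-specific linear messages into each target node.
+
+    edges:   iterable of (source, target, relation) triples
+    weights: maps each relation to a matrix given as a list of rows (out_dim x in_dim)
+    """
+    out_dim = len(next(iter(weights.values())))
+    output = [[0.0] * out_dim for _ in node_features]
+    for source, target, relation in edges:
+        matrix = weights[relation]
+        for row_index, row in enumerate(matrix):
+            # Dot product of one weight row with the source node's features.
+            output[target][row_index] += sum(
+                w * x for w, x in zip(row, node_features[source])
+            )
+    return output
+
+
+node_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
+edges = [(0, 1, "sequential"), (2, 1, "spatial")]
+weights = {
+    "sequential": [[1.0, 0.0], [0.0, 1.0]],  # identity: passes features through
+    "spatial": [[0.0, 1.0], [1.0, 0.0]],     # swaps the two feature channels
+}
+updated = relational_conv(node_features, edges, weights)
+```
+
+Node 1 receives `[1.0, 2.0]` over the sequential edge and `[6.0, 5.0]` over the spatial edge, so the same neighbor features contribute differently depending on the edge type.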
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### gearnet.model.GearNetIEConv
+
+This is the GearNet variant that combines relational graph convolution with IEConv. It stacks multiple `GeometricRelationalGraphConv` layers and optionally adds `IEConvLayer` instances to build a deep graph neural network for protein representation learning.
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+### gearnet.model.FusionNetwork
+
+This component is a higher-level model that combines the outputs of two models: a `sequence_model` and a `structure_model`. It is likely used for multi-modal learning, integrating the sequence and structure representations of a protein.
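+
+One common way to combine two encoders is to concatenate their per-item feature vectors. The sketch below assumes that scheme. The source does not confirm how `FusionNetwork` actually fuses its inputs, and `fuse_features` is a hypothetical helper operating on plain lists rather than tensors.
+
+```python
+def fuse_features(sequence_features, structure_features):
+    """Concatenate matching rows of two feature tables (one row per item)."""
+    if len(sequence_features) != len(structure_features):
+        raise ValueError("both models must produce one feature vector per item")
+    return [
+        list(seq) + list(struct)
+        for seq, struct in zip(sequence_features, structure_features)
+    ]
+
+
+sequence_features = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for sequence_model output
+structure_features = [[1.0], [2.0]]           # stand-in for structure_model output
+fused = fuse_features(sequence_features, structure_features)
+```
+
+Concatenation keeps both representations intact and leaves it to later layers to learn how to weigh them.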
+
+
+
+
+
+**Related Classes/Methods**: _None_
+
+
+
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)