Commit 745c527

nerpaula, Simran-B
authored and committed
DOC-759 | Reworked GraphML documentation (#722)
* reworked graphml documentation
* fix merge related errors
* Apply suggestions from code review

Co-authored-by: Simran <[email protected]>

* review
* minor changes
* review

---------

Co-authored-by: Simran <[email protected]>
1 parent 2c6bd92 commit 745c527

10 files changed, +399 -548 lines changed

site/content/3.13/aql/functions/vector.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ To use vector search, you need to have vector embeddings stored in documents
 and the attribute that stores them needs to be indexed by a
 [vector index](../../index-and-search/indexing/working-with-indexes/vector-indexes.md).
 
-You can calculate vector embeddings using [ArangoDB's GraphML](../../data-science/arangographml/_index.md)
+You can calculate vector embeddings using [ArangoDB's GraphML](../../data-science/graphml/_index.md)
 capabilities (available in ArangoGraph) or using external tools.
 
 {{< warning >}}

site/content/3.13/data-science/_index.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
-title: Data Science
-menuTitle: Data Science
+title: Data Science and GenAI
+menuTitle: Data Science & GenAI
 weight: 115
 description: >-
 ArangoDB lets you apply analytics and machine learning to graph data at scale
@@ -69,7 +69,7 @@ GraphML can answer questions like:
 ![Graph ML](../../images/graph-ml.png)
 
 For ArangoDB's enterprise-ready, graph-powered machine learning offering,
-see [ArangoGraphML](arangographml/_index.md).
+see [ArangoGraphML](graphml/_index.md).
 
 ## Use Cases
 

site/content/3.13/data-science/arangographml/deploy.md

Lines changed: 0 additions & 76 deletions
This file was deleted.

site/content/3.13/data-science/arangographml/ui.md

Lines changed: 0 additions & 264 deletions
This file was deleted.

site/content/3.13/data-science/arangographml/_index.md renamed to site/content/3.13/data-science/graphml/_index.md

Lines changed: 43 additions & 25 deletions
@@ -1,18 +1,23 @@
 ---
-title: ArangoGraphML
-menuTitle: ArangoGraphML
+title: ArangoDB GraphML
+menuTitle: GraphML
 weight: 125
 description: >-
-Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
+Boost your machine learning models with graph data using ArangoDB's advanced GraphML capabilities
 aliases:
-- graphml
+- arangographml
 ---
 Traditional Machine Learning (ML) overlooks the connections and relationships
 between data points, which is where graph machine learning excels. However,
 accessibility to GraphML has been limited to sizable enterprises equipped with
-specialized teams of data scientists. ArangoGraphML simplifies the utilization of GraphML,
+specialized teams of data scientists. ArangoDB simplifies the utilization of Graph Machine Learning,
 enabling a broader range of personas to extract profound insights from their data.
 
+With ArangoDB, you can solve high-computational graph problems using Graph Machine
+Learning. Apply it on a selected graph to predict connections, get better product
+recommendations, classify nodes, and perform node embeddings. You can configure and run
+the whole machine learning flow entirely through the web interface or programmatically.
+
 ## How GraphML works
 
 Graph machine learning leverages the inherent structure of graph data, where
@@ -21,35 +26,48 @@ traditional ML, which primarily operates on tabular data, GraphML applies
 specialized algorithms like Graph Neural Networks (GNNs), node embeddings, and
 link prediction to uncover complex patterns and insights.
 
+The underlying framework for ArangoDB's GraphML is **[GraphSAGE](https://snap.stanford.edu/graphsage/)**.
+GraphSAGE (Graph Sample and AggreGatE) is a powerful Graph Neural Network (GNN)
+**framework** designed for inductive representation learning on large graphs.
+It is used to generate low-dimensional vector representations for nodes and is
+especially useful for graphs that have rich node attribute information.
+The overall process involves the following steps:
+
 1. **Graph Construction**:
-Raw data is transformed into a graph structure, defining nodes and edges based
+- Raw data is transformed into a graph structure, defining nodes and edges based
 on real-world relationships.
-2. **Featurization**:
-Nodes and edges are enriched with features that help in training predictive models.
-3. **Model Training**:
-Machine learning techniques are applied on GNNs to identify patterns and make predictions.
+2. **Featurization**: Your raw graph data is transformed into numerical representations that the model can understand.
+- The system iterates over your selected vertices and converts their attributes: booleans become `0` or `1`, numbers are normalized, and text attributes are converted into numerical vectors using sentence transformers.
+- All of these numerical features are then combined (concatenated).
+- Finally, **Incremental PCA** (Incremental Principal Component Analysis a dimensionality reduction technique) is used to reduce the size of the combined features, which helps remove noise and keep only the most important information.
+3. **Training**: The model learns from the graph's structure by sampling and aggregating information from each node's local neighborhood.
+- For each node, GraphSAGE looks at connections up to **2 hops away**.
+- Specifically, it uniformly samples up to **25 direct neighbors** (depth 1) and for each of those, it samples up to **10 of their neighbors** (depth 2).
+- By aggregating feature information from this sampled neighborhood, the model creates a rich "embedding" for each node that captures both its own features and its role in the graph.
 4. **Inference & Insights**:
-The trained model is used to classify nodes, detect anomalies, recommend items,
+- The trained model is used to classify nodes, detect anomalies, recommend items,
 or predict future connections.
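
Note on the Featurization step in the list above: the following is a minimal sketch of the "encode, then concatenate" idea, not ArangoDB's actual implementation. The function names (`embed_text`, `min_max`, `featurize`) and the sample documents are hypothetical. A hashed bag-of-words stands in for a real sentence transformer, min-max scaling is assumed as the normalization, and the Incremental PCA reduction is left out to keep the example dependency-free.

```python
import hashlib


def embed_text(text, dim=4):
    """Placeholder for a sentence transformer: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a bucket index that is stable across runs
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def min_max(values):
    """Scale numbers into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def featurize(docs, bool_keys, num_keys, text_keys, text_dim=4):
    """Encode each attribute numerically and concatenate into one row per document."""
    # Numeric attributes are normalized across all documents first.
    normalized = {k: min_max([float(d[k]) for d in docs]) for k in num_keys}
    rows = []
    for i, doc in enumerate(docs):
        row = [1.0 if doc[k] else 0.0 for k in bool_keys]  # booleans -> 0 or 1
        row += [normalized[k][i] for k in num_keys]        # normalized numbers
        for k in text_keys:
            row += embed_text(doc[k], text_dim)            # text -> vector
        rows.append(row)
    return rows


# Hypothetical vertex documents, for illustration only.
docs = [
    {"active": True, "age": 20, "bio": "graph database engineer"},
    {"active": False, "age": 40, "bio": "data scientist"},
    {"active": True, "age": 30, "bio": "graph data scientist"},
]
features = featurize(docs, ["active"], ["age"], ["bio"])
```

Each row of `features` has one boolean slot, one normalized number, and four text slots. In a real pipeline, a dimensionality reduction step such as Incremental PCA would then be fitted on these rows.
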
 
-ArangoGraphML streamlines these steps, providing an intuitive and scalable
+ArangoDB streamlines these steps, providing an intuitive and scalable
 framework to integrate GraphML into various applications, from fraud detection
 to recommendation systems.
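
Note on the Training step described earlier in this hunk: the 2-hop sampling (up to 25 direct neighbors, then up to 10 neighbors of each) can be sketched as below. This is an illustrative sketch only. The adjacency-list layout, the function names, and the mean aggregator are assumptions for the example, not ArangoDB's internal code.

```python
import random


def sample_neighbors(adj, node, k, rng):
    """Uniformly sample up to k neighbors of node without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def sample_two_hop(adj, node, fanouts=(25, 10), seed=0):
    """Sample a 2-hop neighborhood: fanouts[0] at depth 1, fanouts[1] at depth 2."""
    rng = random.Random(seed)
    hop1 = sample_neighbors(adj, node, fanouts[0], rng)
    hop2 = {n: sample_neighbors(adj, n, fanouts[1], rng) for n in hop1}
    return hop1, hop2


def mean_aggregate(features, nodes):
    """Average the feature vectors of the given nodes (one possible aggregator)."""
    if not nodes:
        return None
    dim = len(features[nodes[0]])
    return [sum(features[n][i] for n in nodes) / len(nodes) for i in range(dim)]


# Hypothetical graph: node 0 has 40 neighbors, and node i has i + 1 neighbors.
adj = {0: list(range(1, 41))}
for i in range(1, 41):
    adj[i] = [0] + list(range(100, 100 + i))

hop1, hop2 = sample_two_hop(adj, 0)

# Toy one-dimensional features so that the aggregate is easy to inspect.
features = {n: [float(n)] for n in adj}
neighborhood_summary = mean_aggregate(features, hop1)
```

Capping the fan-out at each depth bounds the work per node at 25 + 25 × 10 sampled neighbors, regardless of how dense the graph is.
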
 
 ![GraphML Embeddings](../../../images/GraphML-Embeddings.webp)
 
 ![GraphML Workflow](../../../images/GraphML-How-it-works.webp)
 
-It is no longer necessary to understand the complexities involved with graph
-machine learning, thanks to the accessibility of the ArangoML package.
-Solutions with ArangoGraphML only require input from a user about
-their data, and the ArangoGraphML managed service handles the rest.
+You no longer need to understand the complexities of graph machine learning to
+benefit from it. Solutions with ArangoDB's GraphML only require input from a user about
+their data, and the GraphML managed service handles the rest.
 
 The platform comes preloaded with all the tools needed to prepare your graph
 for machine learning, high-accuracy training, and persisting predictions back
 to the database for application use.
 
-## Supported Tasks
+## What you can do with GraphML
+
+GraphML directly supports two primary machine learning tasks:
+**Node Classification** and **Node Embeddings**.
 
 ### Node Classification
 
@@ -58,7 +76,7 @@ predict the label of a node based on both its own features and its relationships
 within the graph. It requires a set of labeled nodes to train a model, which then
 classifies unlabeled nodes based on learned patterns.
 
-**How it works in ArangoGraphML**
+**How it works in ArangoDB**
 
 - A portion of the nodes in a graph is labeled for training.
 - The model learns patterns from both **node features** and
@@ -97,7 +115,7 @@ into numerical vector representations, preserving their **structural relationshi
 within the graph. Unlike simple feature aggregation, node embeddings
 **capture the influence of neighboring nodes and graph topology**, making
 them powerful for downstream tasks like clustering, anomaly detection,
-and link prediction. These combinations can provide valuable insights.
+and link prediction. This combination provides valuable insights.
 Consider using [ArangoDB's Vector Search](https://arangodb.com/2024/11/vector-search-in-arangodb-practical-insights-and-hands-on-examples/)
 capabilities to find similar nodes based on their embeddings.
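
Note on finding similar nodes by their embeddings: the idea can be sketched as a brute-force cosine similarity ranking. This only illustrates the concept and is not how ArangoDB's vector index works internally. The document keys and vectors are made up, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(embeddings, key, top_k=2):
    """Rank all other nodes by cosine similarity to the node stored under key."""
    query = embeddings[key]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != key
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up node embeddings keyed by hypothetical document IDs.
embeddings = {
    "users/alice": [1.0, 0.0, 0.0],
    "users/bob": [0.9, 0.1, 0.0],
    "users/carol": [0.0, 1.0, 0.0],
    "users/dave": [0.5, 0.5, 0.0],
}
nearest = most_similar(embeddings, "users/alice", top_k=2)
```

A linear scan like this is fine for a handful of nodes. On larger collections, an indexed approximate search is the usual choice.
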

@@ -116,7 +134,7 @@ Essentially, they aggregate both the node's attributes and the connectivity patt
 within the graph. This fusion helps capture not only the individual properties of
 a node but also its position and role within the network.
 
-**How it works in ArangoGraphML**
+**How it works in ArangoDB**
 
 - The model learns an embedding (a vector representation) for each node based on its
 **position within the graph and its connections**.
@@ -161,21 +179,21 @@ a node but also its position and role within the network.
 | **Key Advantage** | Learns labels based on node connections and attributes | Learns structural patterns and node relationships |
 | **Use Cases** | Fraud detection, customer segmentation, disease classification | Recommendations, anomaly detection, link prediction |
 
-ArangoGraphML provides the infrastructure to efficiently train and apply these
+GraphML provides the infrastructure to efficiently train and apply these
 models, helping users extract meaningful insights from complex graph data.
 
 ## Metrics and Compliance
 
-ArangoGraphML supports tracking your ML pipeline by storing all relevant metadata
+GraphML supports tracking your ML pipeline by storing all relevant metadata
 and metrics in a Graph called ArangoPipe. This is only available to you and is never
 viewable by ArangoDB. This metadata graph links all experiments
 to the source data, feature generation activities, training runs, and prediction
 jobs, allowing you to track the entire ML pipeline without having to leave ArangoDB.
 
-### Security
+## Security
 
-Each deployment that uses ArangoGraphML has an `arangopipe` database created,
+Each deployment that uses GraphML has an `arangopipe` database created,
 which houses all ML Metadata information. Since this data lives within the deployment,
 it benefits from the ArangoGraph security features and SOC 2 compliance.
-All ArangoGraphML services live alongside the ArangoGraph deployment and are only
+All GraphML services live alongside the ArangoGraph deployment and are only
 accessible within that organization.
