site/content/3.13/data-science/graphml/_index.md
Lines changed: 43 additions & 25 deletions
@@ -1,18 +1,23 @@
1
1
---
2
-
title: ArangoGraphML
3
-
menuTitle: ArangoGraphML
2
+
title: ArangoDB GraphML
3
+
menuTitle: GraphML
4
4
weight: 125
5
5
description: >-
6
-
Enterprise-ready, graph-powered machine learning as a cloud service or self-managed
6
+
Boost your machine learning models with graph data using ArangoDB's advanced GraphML capabilities
7
7
aliases:
8
-
- graphml
8
+
- arangographml
9
9
---
10
10
Traditional Machine Learning (ML) overlooks the connections and relationships
11
11
between data points, which is where graph machine learning excels. However,
12
12
accessibility to GraphML has been limited to sizable enterprises equipped with
13
-
specialized teams of data scientists. ArangoGraphML simplifies the utilization of GraphML,
13
+
specialized teams of data scientists. ArangoDB simplifies the use of Graph Machine Learning,
14
14
enabling a broader range of personas to extract profound insights from their data.
15
15
16
+
With ArangoDB, you can solve computationally intensive graph problems using Graph Machine
17
+
Learning. Apply it on a selected graph to predict connections, get better product
18
+
recommendations, classify nodes, and compute node embeddings. You can configure and run
19
+
the whole machine learning flow entirely through the web interface or programmatically.
20
+
16
21
## How GraphML works
17
22
18
23
Graph machine learning leverages the inherent structure of graph data, where
@@ -21,35 +26,48 @@ traditional ML, which primarily operates on tabular data, GraphML applies
21
26
specialized algorithms like Graph Neural Networks (GNNs), node embeddings, and
22
27
link prediction to uncover complex patterns and insights.
23
28
29
+
The underlying framework for ArangoDB's GraphML is **[GraphSAGE](https://snap.stanford.edu/graphsage/)**.
30
+
GraphSAGE (Graph Sample and AggreGatE) is a powerful Graph Neural Network (GNN)
31
+
**framework** designed for inductive representation learning on large graphs.
32
+
It is used to generate low-dimensional vector representations for nodes and is
33
+
especially useful for graphs that have rich node attribute information.
34
+
The overall process involves the following steps:
35
+
24
36
1. **Graph Construction**:
25
-
Raw data is transformed into a graph structure, defining nodes and edges based
37
+
- Raw data is transformed into a graph structure, defining nodes and edges based
26
38
on real-world relationships.
27
-
2. **Featurization**:
28
-
Nodes and edges are enriched with features that help in training predictive models.
29
-
3. **Model Training**:
30
-
Machine learning techniques are applied on GNNs to identify patterns and make predictions.
39
+
2. **Featurization**: Your raw graph data is transformed into numerical representations that the model can understand.
40
+
- The system iterates over your selected vertices and converts their attributes: booleans become `0` or `1`, numbers are normalized, and text attributes are converted into numerical vectors using sentence transformers.
41
+
- All of these numerical features are then combined (concatenated).
42
+
 - Finally, **Incremental PCA** (Incremental Principal Component Analysis, a dimensionality reduction technique) is used to reduce the size of the combined features, which helps remove noise and keep only the most important information.
43
+
3. **Training**: The model learns from the graph's structure by sampling and aggregating information from each node's local neighborhood.
44
+
- For each node, GraphSAGE looks at connections up to **2 hops away**.
45
+
 - Specifically, it uniformly samples up to **25 direct neighbors** (depth 1) and, for each of those, up to **10 of their neighbors** (depth 2).
46
+
- By aggregating feature information from this sampled neighborhood, the model creates a rich "embedding" for each node that captures both its own features and its role in the graph.
31
47
4. **Inference & Insights**:
32
-
The trained model is used to classify nodes, detect anomalies, recommend items,
48
+
- The trained model is used to classify nodes, detect anomalies, recommend items,
33
49
or predict future connections.
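
The featurization step described above can be illustrated with a minimal, standard-library-only sketch. It is not ArangoDB's implementation. The function names (`embed_text`, `min_max`, `featurize`) and the sample attributes are hypothetical. Min-max scaling is an assumed choice, because the text only says numbers are "normalized". The hashed bag-of-words embedding is a toy stand-in for a sentence transformer, and the final Incremental PCA reduction is omitted to keep the example dependency-free.

```python
import hashlib


def embed_text(text, dim=8):
    """Toy stand-in for a sentence transformer (assumption, not the real model):
    a hashed bag-of-words vector, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def min_max(values):
    """Scale values to [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def featurize(nodes, bool_keys, num_keys, text_keys, text_dim=8):
    """Convert node attributes to numbers and concatenate them per node."""
    # Normalize each numeric attribute across all selected nodes.
    normalized = {k: min_max([float(n[k]) for n in nodes]) for k in num_keys}
    rows = []
    for i, node in enumerate(nodes):
        row = [1.0 if node[k] else 0.0 for k in bool_keys]  # booleans -> 0/1
        row += [normalized[k][i] for k in num_keys]         # normalized numbers
        for k in text_keys:                                 # text -> vector
            row += embed_text(node[k], text_dim)
        rows.append(row)
    return rows


# Hypothetical sample data for illustration only.
nodes = [
    {"active": True, "age": 20, "bio": "graph databases"},
    {"active": False, "age": 40, "bio": "machine learning on graphs"},
    {"active": True, "age": 30, "bio": ""},
]
features = featurize(nodes, ["active"], ["age"], ["bio"])
```

Each row in `features` is the concatenated vector for one node (here 1 boolean + 1 number + 8 text dimensions). In the pipeline described above, these rows would then be passed through Incremental PCA to reduce their size.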
34
50
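
The neighborhood sampling described in the training step can be sketched as follows. This is a simplified illustration rather than the actual GraphSAGE or ArangoDB code: it samples up to 25 direct neighbors and up to 10 neighbors of each, then combines features with a plain mean. The learned weight matrices and non-linearities of a real GraphSAGE layer are deliberately left out, and all names (`sample_neighbors`, `sample_two_hop`, `aggregate`) and the example graph are hypothetical.

```python
import random


def sample_neighbors(adj, node, k, rng):
    """Uniformly sample up to k neighbors of a node without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def sample_two_hop(adj, node, fanouts=(25, 10), rng=None):
    """Sample the 2-hop neighborhood: fanouts[0] at depth 1, fanouts[1] at depth 2."""
    rng = rng or random.Random()
    hop1 = sample_neighbors(adj, node, fanouts[0], rng)
    hop2 = {n: sample_neighbors(adj, n, fanouts[1], rng) for n in hop1}
    return hop1, hop2


def mean(vectors, dim):
    """Element-wise mean of equally sized vectors; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def aggregate(features, node, hop1, hop2):
    """Mean-aggregate depth 2 into depth 1, then depth 1 into the target node.
    Concatenation stands in for the learned combination of a real GNN layer."""
    dim = len(features[node])
    hidden = [features[n] + mean([features[m] for m in hop2[n]], dim) for n in hop1]
    return features[node] + mean(hidden, 2 * dim)


# Hypothetical graph: node 0 has 40 neighbors, every other node has 16.
adj = {0: list(range(1, 41))}
for i in range(1, 41):
    adj[i] = [0] + [j for j in range(1, 41) if j != i][:15]
features = {n: [float(n), 1.0] for n in range(41)}

hop1, hop2 = sample_two_hop(adj, 0, rng=random.Random(42))
embedding = aggregate(features, 0, hop1, hop2)
```

Capping the fan-out at each depth bounds the work per node at 25 + 25 × 10 sampled neighbors, regardless of how densely connected the node is.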
35
-
ArangoGraphML streamlines these steps, providing an intuitive and scalable
51
+
ArangoDB streamlines these steps, providing an intuitive and scalable
36
52
framework to integrate GraphML into various applications, from fraud detection