Copy docs to package #6232

Open — wants to merge 108 commits into develop

Commits (108)
509a4bf
Move all files from the extension package to the main package
maryamziaa Jul 22, 2025
440743f
Update the extension tests
maryamziaa Jul 22, 2025
3bce3bd
Remove the duplicate condition
maryamziaa Jul 22, 2025
1f97457
Remove the duplicate condition
maryamziaa Jul 22, 2025
6391a89
Remove duplicate PhysicsMaterial
maryamziaa Jul 22, 2025
da5d781
Remove redundant import statements
maryamziaa Jul 22, 2025
befebd3
Remove Extensions meta file
maryamziaa Jul 22, 2025
886f881
Black reformatting
maryamziaa Jul 22, 2025
113f39f
Merge remote-tracking branch 'origin/develop' into merge-extensions-i…
maryamziaa Jul 23, 2025
9d47fe9
Remove duplicate entries
maryamziaa Jul 23, 2025
6dd1559
Move Runtime Input tests to a separate assembly
maryamziaa Jul 23, 2025
bfd96c1
Move Runtime example test to Tests
maryamziaa Jul 23, 2025
fc1db44
Update assembly
maryamziaa Jul 23, 2025
6cfad25
Update CHANGELOG.md
maryamziaa Jul 23, 2025
53f2134
Update inputsystem version in test assembly
maryamziaa Jul 24, 2025
3dc5882
Undo scene setting
maryamziaa Jul 24, 2025
f8f4c9f
Update the doc
maryamziaa Jul 24, 2025
4856cf4
Update the doc
maryamziaa Jul 24, 2025
0150980
Update scene setting
maryamziaa Jul 24, 2025
a4ab995
Update scene template
maryamziaa Jul 24, 2025
9e44c39
Update assembly
maryamziaa Jul 24, 2025
827d56e
Change namespace to Unity.MLAgents.Input
maryamziaa Jul 24, 2025
15ff1bb
Remove redundant import
maryamziaa Jul 24, 2025
8ad46c7
Remove unnecessary condition
maryamziaa Jul 25, 2025
b3c9106
Update assembly files
maryamziaa Jul 25, 2025
34a5098
Another update
maryamziaa Jul 25, 2025
1ce9848
Remove empty file
maryamziaa Jul 25, 2025
80a9023
Copy web docs to package docs
maryamziaa Jul 25, 2025
0807316
Unify the package doc and web doc main pages
maryamziaa Jul 25, 2025
e66ef0b
Update image path
maryamziaa Jul 25, 2025
652e0bd
Update image path
maryamziaa Jul 25, 2025
daa3fe4
Update hyperlinks
maryamziaa Jul 25, 2025
7bde130
Add MovedFrom tags
maryamziaa Jul 29, 2025
76bd2d4
Revert extension package change
maryamziaa Jul 30, 2025
9227488
Upgrade upm-pvp
maryamziaa Jul 30, 2025
28e08b6
Merge remote-tracking branch 'origin/merge-extensions-into-ml-agents'…
maryamziaa Jul 30, 2025
b957c45
WIP- Add table of contents
maryamziaa Jul 31, 2025
47742c8
WIP- doc
maryamziaa Aug 1, 2025
bf45858
Update index
maryamziaa Aug 1, 2025
f6f23bc
Merge with develop
maryamziaa Aug 1, 2025
be965e7
Update index
maryamziaa Aug 1, 2025
7dff8d0
Update Python APIs and Advanced Features
maryamziaa Aug 4, 2025
de0d8de
Update more sections
maryamziaa Aug 4, 2025
21ed822
Update index
maryamziaa Aug 4, 2025
3486dcd
Update colab doc
maryamziaa Aug 4, 2025
1f27ea1
Update more docs
maryamziaa Aug 4, 2025
989ef98
Update API Reference docs and settings
maryamziaa Aug 4, 2025
544fd45
Update more docs
maryamziaa Aug 4, 2025
0ae2942
Update wrapper docs
maryamziaa Aug 4, 2025
6190d41
Correct reference inconsistency
maryamziaa Aug 4, 2025
fb1bb27
Convert relative path to github url
maryamziaa Aug 4, 2025
3c0286f
Update README
maryamziaa Aug 4, 2025
ab4b37d
Update reference to the old docs
maryamziaa Aug 4, 2025
80c8ee1
Remove readme migration file
maryamziaa Aug 4, 2025
76c44a8
Add doc migration notice
maryamziaa Aug 4, 2025
6952c8a
Deprecated banner
maryamziaa Aug 4, 2025
b1f7fdd
Update banner
maryamziaa Aug 5, 2025
c746a51
Update doc source in readme table
maryamziaa Aug 5, 2025
338822a
Update Changelog
maryamziaa Aug 5, 2025
c4cd5f6
Pre-commit checks
maryamziaa Aug 5, 2025
d1f34c6
Format tables
maryamziaa Aug 5, 2025
b021501
Update docs
maryamziaa Aug 5, 2025
8908a75
Remove invalid blog links
maryamziaa Aug 5, 2025
66ced4f
Update training doc
maryamziaa Aug 5, 2025
3ab542e
Update python optimizer
maryamziaa Aug 5, 2025
ed1dab2
Update PettingZoo api documentation
maryamziaa Aug 5, 2025
c476fe0
Update Gym API doc
maryamziaa Aug 5, 2025
14f528d
Update doc source in readme table
maryamziaa Aug 5, 2025
0ef6430
Test commit
maryamziaa Aug 5, 2025
5b3bbcc
Replace old Hummingbird tutorial by Huggy the Dog
maryamziaa Aug 6, 2025
63f6bb5
Refer to the package docs
maryamziaa Aug 6, 2025
8d616f0
Remove Huggy tutorial. It's via Colab
maryamziaa Aug 6, 2025
61a9a6c
Update installation doc
maryamziaa Aug 7, 2025
eebe419
Update com.unity.ml-agents/Documentation~/Learning-Environments-Agent…
maryamziaa Aug 7, 2025
85a854c
Add a heading
maryamziaa Aug 7, 2025
284b507
Bold the heading
maryamziaa Aug 7, 2025
18b07eb
Bold the heading
maryamziaa Aug 7, 2025
897cfd0
Bold the heading
maryamziaa Aug 7, 2025
ef5f944
Move Note to a new line.
maryamziaa Aug 7, 2025
ade526e
Move to a new line.
maryamziaa Aug 7, 2025
820dc64
Update InputSystem-Integration.md
maryamziaa Aug 7, 2025
6fe216e
Address PR comments
maryamziaa Aug 7, 2025
242c576
Pre-commit checks
maryamziaa Aug 7, 2025
31260ba
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
33b057e
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
66e3038
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
d738cef
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
eaea917
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
3cd961e
Update com.unity.ml-agents/Documentation~/Integrations-Match3.md
maryamziaa Aug 7, 2025
6c0bc57
Add Python API documentation
maryamziaa Aug 7, 2025
e16aee5
Add integration doc
maryamziaa Aug 7, 2025
e2996de
Add Python optimizer documentation
maryamziaa Aug 7, 2025
1298088
Typo
maryamziaa Aug 11, 2025
6c04a98
Edit the banner
maryamziaa Aug 11, 2025
ac92898
Switch to minor change
maryamziaa Aug 11, 2025
2f0449f
Remove doxygen and related files to generate API references
maryamziaa Aug 11, 2025
a27beac
Remove a leftover file
maryamziaa Aug 11, 2025
3817474
Workaround for the apostrophe issue
maryamziaa Aug 11, 2025
661ccdb
Leftover from the API references removal
maryamziaa Aug 11, 2025
176aa1c
Leftover from API reference removal
maryamziaa Aug 11, 2025
e97108c
Add Match-3 to ToC
maryamziaa Aug 11, 2025
3cad647
Remove API reference metadata in gitignore
maryamziaa Aug 11, 2025
1d80b94
Remove API reference referral
maryamziaa Aug 11, 2025
51a60cb
Remove unnecessary line breaks in paragraphs
maryamziaa Aug 11, 2025
9d0b992
Update CHANGELOG
maryamziaa Aug 11, 2025
70fcf6a
Another line break removal
maryamziaa Aug 11, 2025
4970cc4
More line break removal
maryamziaa Aug 11, 2025
7ecea1c
Test: Change trunk to 6000.3.0a3
maryamziaa Aug 11, 2025
4 changes: 2 additions & 2 deletions .gitignore
@@ -50,8 +50,8 @@
 # Plugins
 /com.unity.ml-agents/VideoRecorder*

-# Generated doc folders
-/docs/html
+# MkDocs build output
+/site/

 # Mac hidden files
 *.DS_Store
2 changes: 1 addition & 1 deletion .yamato/com.unity.ml-agents-test.yml
@@ -8,7 +8,7 @@ test_editors:
   enableNoDefaultPackages: !!bool true

 trunk_editor:
-  - version: trunk
+  - version: 6000.3.0a3
   testProject: DevProject

 test_platforms:
5 changes: 3 additions & 2 deletions com.unity.ml-agents/CHANGELOG.md
@@ -11,11 +11,12 @@ and this project adheres to
 #### com.unity.ml-agents (C#)
 - Upgraded to Inference Engine 2.2.1 (#6212)
 - The minimum supported Unity version was updated to 6000.0. (#6207)
-- Merge the extension package com.unity.ml-agents.extensions to the main package com.unity.ml-agents. (#6227)
+- Merged the extension package com.unity.ml-agents.extensions to the main package com.unity.ml-agents. (#6227)

 ### Minor Changes
 #### com.unity.ml-agents (C#)
-- Remove broken sample from the package (#6230)
+- Removed broken sample from the package (#6230)
+- Moved to Unity Package documentation as the primary developer documentation. (#6232)

 #### ml-agents / ml-agents-envs
 - Bumped grpcio version to >=1.11.0,<=1.53.2 (#6208)
16 changes: 16 additions & 0 deletions com.unity.ml-agents/Documentation~/Advanced-Features.md
@@ -0,0 +1,16 @@
# Advanced Features

The ML-Agents Toolkit provides several advanced features that extend the core functionality and enable sophisticated use cases.


| **Feature** | **Description** |
|-------------------------------------------------------------|------------------------------------------------------------------------------|
| [Custom Side Channels](Custom-SideChannels.md) | Create custom communication channels between Unity and Python. |
| [Custom Grid Sensors](Custom-GridSensors.md) | Build specialized grid-based sensors for spatial data. |
| [Input System Integration](InputSystem-Integration.md) | Integrate ML-Agents with Unity's Input System. |
| [Inference Engine](Inference-Engine.md) | Deploy trained models for real-time inference. |
| [Hugging Face Integration](Hugging-Face-Integration.md) | Connect with Hugging Face models and ecosystem. |
| [Game Integrations](Integrations.md) | Integrate ML-Agents with specific game genres and mechanics (e.g., Match-3). |
| [Match-3 Integration](Integrations-Match3.md) | Abstraction and tools for Match-3 board games (board, sensors, actuators). |
| [ML-Agents Package Settings](Package-Settings.md) | Configure advanced package settings and preferences. |
| [Unity Environment Registry](Unity-Environment-Registry.md) | Manage and register Unity environments programmatically. |
51 changes: 51 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Machine-Learning.md
@@ -0,0 +1,51 @@
# Background: Machine Learning

Given that many users of the ML-Agents Toolkit may not have a formal machine learning background, this page provides an overview of the core concepts to make the toolkit easier to understand. We do not attempt a thorough treatment of machine learning, as there are fantastic resources available online.

Machine learning, a branch of artificial intelligence, focuses on learning patterns from data. The three main classes of machine learning algorithms are unsupervised learning, supervised learning, and reinforcement learning. Each class of algorithm learns from a different type of data. The following sections provide an overview of each of these classes, along with introductory examples.

## Unsupervised Learning

The goal of [unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning) is to group or cluster similar items in a data set. For example, consider the players of a game. We may want to group the players depending on how engaged they are with the game. This would enable us to target different groups (e.g. for highly-engaged players we might invite them to be beta testers for new features, while for unengaged players we might email them helpful tutorials). Say that we wish to split our players into two groups. We would first define basic attributes of the players, such as the number of hours played, total money spent on in-app purchases and number of levels completed. We can then feed this data set (three attributes for every player) to an unsupervised learning algorithm where we specify the number of groups to be two. The algorithm would then split the data set of players into two groups where the players within each group would be similar to each other. Given the attributes we used to describe each player, in this case, the output would be a split of all the players into two groups, where one group would semantically represent the engaged players and the second group would semantically represent the unengaged players.

With unsupervised learning, we did not provide specific examples of which players are considered engaged and which are considered unengaged. We just defined the appropriate attributes and relied on the algorithm to uncover the two groups on its own. This type of data set is typically called an unlabeled data set, as it lacks these direct labels. Consequently, unsupervised learning can be helpful in situations where labels are expensive or hard to produce. In the next section, we overview supervised learning algorithms, which accept input labels in addition to attributes.
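
To make this concrete, here is a minimal clustering sketch using scikit-learn's k-means implementation. The player data, the attribute values, and the choice of k-means itself are illustrative assumptions for this example, not something the toolkit prescribes:

```python
# Illustrative sketch: clustering players into two groups with k-means.
# The data and the choice of algorithm are assumptions for this example.
import numpy as np
from sklearn.cluster import KMeans

# One row per player: [hours played, money spent, levels completed]
players = np.array([
    [120.0, 45.0, 30],   # likely "engaged"
    [150.0, 80.0, 42],
    [2.5,   0.0,  1],    # likely "unengaged"
    [4.0,   5.0,  2],
])

# Ask for two groups; note that no labels are provided (unsupervised).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(players)
print(kmeans.labels_)  # e.g. [1 1 0 0]: the algorithm uncovered the two groups
```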

## Supervised Learning

In [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning), we do not want to just group similar items but directly learn a mapping from each item to the group (or class) that it belongs to. Returning to our earlier example of clustering players, let's say we now wish to predict which of our players are about to churn (that is, stop playing the game for the next 30 days). We can look into our historical records and create a data set that contains attributes of our players in addition to a label indicating whether they have churned or not. Note that the player attributes we use for this churn prediction task may be different from the ones we used for our earlier clustering task. We can then feed this data set (attributes **and** label for each player) into a supervised learning algorithm, which would learn a mapping from the player attributes to a label indicating whether that player will churn or not. The intuition is that the supervised learning algorithm will learn which values of these attributes typically correspond to players who have churned and not churned (for example, it may learn that players who spend very little and play for very short periods will most likely churn). Now given this learned model, we can provide it with the attributes of a new player (one that recently started playing the game) and it would output a _predicted_ label for that player. This prediction is the algorithm's expectation of whether the player will churn or not. We can now use these predictions to target the players who are expected to churn and entice them to continue playing the game.
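
A minimal sketch of this workflow follows, assuming a toy labeled data set and logistic regression as the classifier (both illustrative choices, not part of the toolkit):

```python
# Illustrative sketch: supervised churn prediction.
# The data and the choice of logistic regression are assumptions for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical records: attributes per player plus a churn label (1 = churned).
attributes = np.array([
    [120.0, 45.0, 30],
    [2.5,   0.0,  1],
    [90.0,  20.0, 25],
    [3.0,   1.0,  2],
])
churned = np.array([0, 1, 0, 1])

# Training phase: learn the mapping from attributes to label.
model = LogisticRegression().fit(attributes, churned)

# Inference phase: predict a label for a brand-new player.
new_player = np.array([[5.0, 0.0, 3]])
print(model.predict(new_player))  # predicted churn label, e.g. [1]
```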

As you may have noticed, for both supervised and unsupervised learning, there are two tasks that need to be performed: attribute selection and model selection. Attribute selection (also called feature selection) pertains to selecting how we wish to represent the entity of interest, in this case, the player. Model selection, on the other hand, pertains to selecting the algorithm (and its parameters) that performs the task well. Both of these tasks are active areas of machine learning research and, in practice, require several iterations to achieve good performance.

We now switch to reinforcement learning, the third class of machine learning algorithms, and arguably the one most relevant for the ML-Agents Toolkit.

## Reinforcement Learning

[Reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning) can be viewed as a form of learning for sequential decision making that is commonly associated with controlling robots (but is, in fact, much more general). Consider an autonomous firefighting robot that is tasked with navigating into an area, finding the fire and neutralizing it. At any given moment, the robot perceives the environment through its sensors (e.g. camera, heat, touch), processes this information and produces an action (e.g. move to the left, rotate the water hose, turn on the water). In other words, it is continuously making decisions about how to interact in this environment given its view of the world (i.e. sensor inputs) and objective (i.e. neutralizing the fire). Teaching a robot to be a successful firefighting machine is precisely what reinforcement learning is designed to do.

More specifically, the goal of reinforcement learning is to learn a **policy**, which is essentially a mapping from **observations** to **actions**. An observation is what the robot can measure from its **environment** (in this case, all its sensory inputs) and an action, in its most raw form, is a change to the configuration of the robot (e.g. position of its base, position of its water hose and whether the hose is on or off).

The last remaining piece of the reinforcement learning task is the **reward signal**. The robot is trained to learn a policy that maximizes its overall rewards. When training a robot to be a mean firefighting machine, we provide it with rewards (positive and negative) indicating how well it is doing on completing the task. Note that the robot does not _know_ how to put out fires before it is trained. It learns the objective because it receives a large positive reward when it puts out the fire and a small negative reward for every passing second. The fact that rewards are sparse (i.e. may not be provided at every step, but only when a robot arrives at a success or failure situation) is a defining characteristic of reinforcement learning and precisely why learning good policies can be difficult (and/or time-consuming) for complex environments.
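
To ground the observation/action/reward vocabulary, here is a self-contained sketch of tabular Q-learning on a toy one-dimensional task. The environment, the update rule, and all hyperparameters are illustrative assumptions and are unrelated to the algorithms that ship with ML-Agents:

```python
# Illustrative sketch: tabular Q-learning on a tiny 1-D "corridor" task.
# Everything here is an assumption for this example, not toolkit code.
import random

N_STATES, GOAL = 5, 4          # states 0..4; reaching state 4 puts out the "fire"
ACTIONS = [-1, +1]             # move left or right
q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: per-state action values

for episode in range(500):
    state = 0
    while state != GOAL:
        # epsilon-greedy action selection from the current policy
        if random.random() < 0.1:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: q[state][i])
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        # sparse reward: +1 only on success, small penalty per step
        reward = 1.0 if next_state == GOAL else -0.01
        # Q-learning update: move the estimate toward reward + discounted future value
        q[state][a] += 0.1 * (reward + 0.9 * max(q[next_state]) - q[state][a])
        state = next_state

print(q)  # the learned values now favor moving right, toward the goal
```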

<div style="text-align: center"><img src="images/rl_cycle.png" alt="The reinforcement learning lifecycle."></div>

Learning a policy usually requires many trials and iterative policy updates. More specifically, the robot is placed in several fire situations and over time learns an optimal policy which allows it to put out fires more effectively. Obviously, we cannot expect to train a robot repeatedly in the real world, particularly when fires are involved. This is precisely why the use of Unity as a simulator serves as the perfect training ground for learning such behaviors. While our discussion of reinforcement learning has centered around robots, there are strong parallels between robots and characters in a game. In fact, in many ways, one can view a non-playable character (NPC) as a virtual robot, with its own observations about the environment, its own set of actions and a specific objective. Thus it is natural to explore how we can train behaviors within Unity using reinforcement learning. This is precisely what the ML-Agents Toolkit offers. The video linked below includes a reinforcement learning demo showcasing the training of character behaviors using the ML-Agents Toolkit.

<p align="center"> <a href="http://www.youtube.com/watch?feature=player_embedded&v=fiQsmdwEGT8" target="_blank"> <img src="http://img.youtube.com/vi/fiQsmdwEGT8/0.jpg" alt="RL Demo" width="400" border="10" /> </a> </p>

Similar to both unsupervised and supervised learning, reinforcement learning also involves two tasks: attribute selection and model selection. Attribute selection is defining the set of observations for the robot that best help it complete its objective, while model selection is defining the form of the policy (mapping from observations to actions) and its parameters. In practice, training behaviors is an iterative process that may require changing the attribute and model choices.

## Training and Inference

One common aspect of all three branches of machine learning is that they all involve a **training phase** and an **inference phase**. While the details of the training and inference phases are different for each of the three, at a high level, the training phase involves building a model using the provided data, while the inference phase involves applying this model to new, previously unseen data. More specifically:

- For our unsupervised learning example, the training phase learns the optimal two clusters based on the data describing existing players, while the inference phase assigns a new player to one of these two clusters.
- For our supervised learning example, the training phase learns the mapping from player attributes to player label (whether they churned or not), and the inference phase predicts whether a new player will churn or not based on that learned mapping.
- For our reinforcement learning example, the training phase learns the optimal policy through guided trials, and in the inference phase, the agent observes and takes actions in the wild using its learned policy.

To briefly summarize: all three classes of algorithms involve training and inference phases in addition to attribute and model selection. What ultimately separates them is the type of data available to learn from. In unsupervised learning our data set was a collection of attributes, in supervised learning our data set was a collection of attribute-label pairs, and, lastly, in reinforcement learning our data set was a collection of observation-action-reward tuples.
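
Continuing the toy Q-learning sketch from the reinforcement learning section above (it assumes the `q`, `GOAL`, `ACTIONS`, and `N_STATES` names defined there), the inference phase simply applies the learned policy greedily, with no further updates:

```python
# Illustrative sketch: inference with the Q-table learned above.
# No learning happens here; the agent just acts on its trained policy.
state, path = 0, [0]
while state != GOAL:
    a = max((0, 1), key=lambda i: q[state][i])   # pick the best-valued action
    state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
    path.append(state)
print(path)  # e.g. [0, 1, 2, 3, 4]: straight to the goal
```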

## Deep Learning

[Deep learning](https://en.wikipedia.org/wiki/Deep_learning) is a family of algorithms that can be used to address any of the problems introduced above. More specifically, they can be used to solve both attribute and model selection tasks. Deep learning has gained popularity in recent years due to its outstanding performance on several challenging machine learning tasks. One example is [AlphaGo](https://en.wikipedia.org/wiki/AlphaGo), a [computer Go](https://en.wikipedia.org/wiki/Computer_Go) program that leverages deep learning and was able to beat Lee Sedol (a Go world champion).

A key characteristic of deep learning algorithms is their ability to learn very complex functions from large amounts of training data. This makes them a natural choice for reinforcement learning tasks when a large amount of data can be generated, say through the use of a simulator or engine such as Unity. By generating hundreds of thousands of simulations of the environment within Unity, we can learn policies for very complex environments (a complex environment is one where the number of observations an agent perceives and the number of actions it can take are large). Many of the algorithms we provide in ML-Agents use some form of deep learning, built on top of the open-source library [PyTorch](Background-PyTorch.md).
11 changes: 11 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-PyTorch.md
@@ -0,0 +1,11 @@
# Background: PyTorch

As discussed in our [machine learning background page](Background-Machine-Learning.md), many of the algorithms we provide in the ML-Agents Toolkit leverage some form of deep learning. More specifically, our implementations are built on top of the open-source library [PyTorch](https://pytorch.org/). On this page, we provide a brief overview of PyTorch and TensorBoard, both of which we leverage within the ML-Agents Toolkit.

## PyTorch

[PyTorch](https://pytorch.org/) is an open-source library for performing computations using data flow graphs, the underlying representation of deep learning models. It facilitates training and inference on CPUs and GPUs on desktop, server, or mobile devices. Within the ML-Agents Toolkit, when you train the behavior of an agent, the output is a model (.onnx) file that you can then associate with an Agent. Unless you implement a new algorithm, the use of PyTorch is mostly abstracted away behind the scenes.
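
For a sense of the artifact involved, here is a generic PyTorch sketch that builds a tiny network and exports it to the .onnx format. This is ordinary PyTorch usage, not how the ML-Agents trainer itself exports models (real ML-Agents models carry additional metadata):

```python
# Illustrative sketch: a tiny PyTorch model exported to .onnx.
# This mirrors the kind of artifact training produces, not the trainer's own code.
import torch
import torch.nn as nn

# A minimal policy-like network: 8 observations in, 2 action values out.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

dummy_observation = torch.randn(1, 8)
# Produces model.onnx, analogous to the file you associate with an Agent.
torch.onnx.export(model, dummy_observation, "model.onnx")
```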

## TensorBoard

One component of training models with PyTorch is setting the values of certain model attributes (called _hyperparameters_). Finding the right values of these hyperparameters can require a few iterations. Consequently, we leverage a visualization tool called [TensorBoard](https://www.tensorflow.org/tensorboard). It allows the visualization of certain agent attributes (e.g. reward) throughout training, which can be helpful in both building intuition for the different hyperparameters and setting the optimal values for your Unity environment. We provide more details on setting the hyperparameters in the [Training ML-Agents](Training-ML-Agents.md) page. If you are unfamiliar with TensorBoard we recommend our guide on [using TensorBoard with ML-Agents](Using-Tensorboard.md) or this [tutorial](https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial).
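
As an illustration of how scalar curves such as reward end up in TensorBoard, here is a generic logging sketch using PyTorch's bundled SummaryWriter. ML-Agents writes these summaries for you during training, so this only demonstrates the mechanism:

```python
# Illustrative sketch: logging a scalar curve that TensorBoard can display.
# ML-Agents produces its own summaries during training; this shows only the mechanism.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="results/run_id")
for step in range(1000):
    reward = step / 1000.0  # stand-in for the agent's mean reward at this step
    writer.add_scalar("Environment/Cumulative Reward", reward, step)
writer.close()
# Then view the curves with: tensorboard --logdir results
```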
14 changes: 14 additions & 0 deletions com.unity.ml-agents/Documentation~/Background-Unity.md
@@ -0,0 +1,14 @@
# Background: Unity

If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we highly recommend the [Unity Manual](https://docs.unity3d.com/Manual/index.html) and [Tutorials page](https://unity3d.com/learn/tutorials). The [Roll-a-ball tutorial](https://learn.unity.com/project/roll-a-ball) is a fantastic resource for learning the basic Unity concepts needed to get started with the ML-Agents Toolkit, including:

- [Editor](https://docs.unity3d.com/Manual/sprite/sprite-editor/use-editor.html)
- [Scene](https://docs.unity3d.com/Manual/CreatingScenes.html)
- [GameObject](https://docs.unity3d.com/Manual/GameObjects.html)
- [Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html)
- [Camera](https://docs.unity3d.com/Manual/Cameras.html)
- [Scripting](https://docs.unity3d.com/Manual/ScriptingSection.html)
- [Physics](https://docs.unity3d.com/Manual/PhysicsSection.html)
- [Ordering of event functions](https://docs.unity3d.com/Manual/ExecutionOrder.html) (e.g. FixedUpdate, Update)
- [Prefabs](https://docs.unity3d.com/Manual/Prefabs.html)
11 changes: 11 additions & 0 deletions com.unity.ml-agents/Documentation~/Background.md
@@ -0,0 +1,11 @@
# Background

This section provides foundational knowledge to help you understand the technologies and concepts that power the ML-Agents Toolkit.

| **Topic** | **Description** |
|-----------------------------------------------------------|-------------------------------------------------------------------------------|
| [Machine Learning](Background-Machine-Learning.md) | Introduction to ML concepts, reinforcement learning, and training principles. |
| [Unity](Background-Unity.md) | Unity fundamentals for ML-Agents development and environment creation. |
| [PyTorch](Background-PyTorch.md) | PyTorch basics for understanding the training pipeline and neural networks. |
| [Using Virtual Environment](Using-Virtual-Environment.md) | Setting up and managing Python virtual environments for ML-Agents. |
| [ELO Rating System](ELO-Rating-System.md) | Understanding the ELO rating system for multi-agent training evaluation. |