From 25757aaa04362f66ad07e44cd2bd3fa23dea4148 Mon Sep 17 00:00:00 2001
From: Penelope Yong
Date: Thu, 26 Jun 2025 11:37:23 +0100
Subject: [PATCH 1/4] Use Mooncake for Optimization.jl example

---
 usage/mode-estimation/index.qmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/usage/mode-estimation/index.qmd b/usage/mode-estimation/index.qmd
index 02607c6a8..9e1e9d9a6 100755
--- a/usage/mode-estimation/index.qmd
+++ b/usage/mode-estimation/index.qmd
@@ -71,14 +71,14 @@ The above are just two examples, Optimization.jl supports [many more](https://do
 We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method.

 ```{julia}
-using ADTypes: AutoReverseDiff
-import ReverseDiff
+import Mooncake
+
 maximum_likelihood(
-    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()
+    model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake(; config=nothing)
 )
 ```

-When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, Turing.VarInfo(model))`.
+When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely, it's the order returned by `Turing.Inference.getparams(model, DynamicPPL.VarInfo(model))`.

 We can also do constrained optimisation, by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here's how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1:

From 76a2cfa804558d3ceb071ed56f67a6429ac892df Mon Sep 17 00:00:00 2001
From: Penelope Yong
Date: Thu, 26 Jun 2025 11:46:45 +0100
Subject: [PATCH 2/4] Use Mooncake in pPCA

---
 tutorials/probabilistic-pca/index.qmd | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/tutorials/probabilistic-pca/index.qmd b/tutorials/probabilistic-pca/index.qmd
index 74fc011f5..9111e97f3 100755
--- a/tutorials/probabilistic-pca/index.qmd
+++ b/tutorials/probabilistic-pca/index.qmd
@@ -90,7 +90,7 @@ First, we load the dependencies used.
 ```{julia}
 using Turing
-using ReverseDiff
+using Mooncake
 using LinearAlgebra, FillArrays

 # Packages for visualization
@@ -108,7 +108,7 @@ You can install them via `using Pkg; Pkg.add("package_name")`.
 ::: {.callout-caution}
 ## Package usages:
 We use `DataFrames` for instantiating matrices, `LinearAlgebra` and `FillArrays` to perform matrix operations;
-`Turing` for model specification and MCMC sampling, `ReverseDiff` for setting the automatic differentiation backend when sampling.
+`Turing` for model specification and MCMC sampling, `Mooncake` for automatic differentiation when sampling.
 `StatsPlots` for visualising the results, and `Measures` for setting plot margin units.
 As all examples involve sampling, for reproducibility we set a fixed seed using the `Random` standard library.
 :::
@@ -194,8 +194,9 @@ Specifically:
 Here we aim to perform MCMC sampling to infer the projection matrix $\mathbf{W}_{D \times k}$, the latent variable matrix $\mathbf{Z}_{k \times N}$, and the offsets $\boldsymbol{\mu}_{N \times 1}$. &#13;
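A side note on the first patch above: the constrained-optimisation sentence that closes its hunk promises an example which falls outside the quoted lines. As a rough sketch of what such a call can look like (assuming the page's two-parameter model `model` with `s²` and `m`, and that `maximum_likelihood` forwards Optimization.jl-style `lb`/`ub` box constraints as the surrounding docs describe):

```julia
# Sketch: box-constrained MLE. Bounds are given in parameter order (s², then m),
# keeping the variance below 0.01 and the mean within [-1, 1].
# Assumes `model` is the model defined earlier on the mode-estimation page.
maximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0])
```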
-We run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain.
-You are free to try [different samplers](https://turinglang.org/stable/docs/library/#samplers).
+We run the inference using the NUTS sampler.
+By default, `sample` samples a single chain (in this case with 500 samples).
+You can also use [different samplers]({{< meta usage/sampler-visualisation >}}) if you wish.

 ```{julia}
 #| output: false
 setprogress!(false)
 ```

 ```{julia}
 k = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice
 ppca = pPCA(mat_exp', k) # instantiate the probabilistic model
-chain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);
+chain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake(; config=nothing)), 500);
 ```

-The samples are saved in the Chains struct `chain_ppca`, whose shape can be checked:
+The samples are saved in `chain_ppca`, which is an `MCMCChains.Chains` object.
+We can check its shape:

 ```{julia}
 size(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)
 ```

-The Chains struct `chain_ppca` also contains the sampling info such as r-hat, ess, mean estimates, etc.
-You can print it to check these quantities.
+Sampling statistics such as R-hat, ESS, mean estimates, and so on can also be obtained from this:
+
+```{julia}
+describe(chain_ppca)
+```

 #### Step 5: posterior predictive checks

@@ -280,7 +285,7 @@ Another way to put it: 2 dimensions is enough to capture the main structure of t
 A direct question arising from the above practice is: how many principal components do we want to keep in order to sufficiently represent the latent structure in the data?
 This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent the data in the latent space.
 In the case of PCA, there exist a lot of heuristics to make that choice.
-For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].
+For example, we can tune the number of principal components using empirical methods such as cross-validation based on some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].

 For p-PCA, this can be done in an elegant and principled way, using a technique called *Automatic Relevance Determination* (ARD).
 ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [^4].

@@ -315,7 +320,7 @@ We instantiate the model and ask Turing to sample from it using NUTS sampler. &#13;
Th ```{julia} ppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model -chain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling +chain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake(; config=nothing)), 500) # sampling plot(group(chain_ppcaARD, :α); margin=6.0mm) ``` From 179352f7e4bc49ec469b927cc0c09cca3b585d88 Mon Sep 17 00:00:00 2001 From: Penelope Yong Date: Thu, 26 Jun 2025 11:46:53 +0100 Subject: [PATCH 3/4] Update a link to AD docs page --- tutorials/gaussian-processes-introduction/index.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/gaussian-processes-introduction/index.qmd b/tutorials/gaussian-processes-introduction/index.qmd index 649f80484..fa64476d8 100755 --- a/tutorials/gaussian-processes-introduction/index.qmd +++ b/tutorials/gaussian-processes-introduction/index.qmd @@ -131,7 +131,7 @@ please let us know! Moving on, we generate samples from the posterior using the default `NUTS` sampler. We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on -this example. See Turing.jl's docs on Automatic Differentiation for more info. +this example. See the [automatic differentiation docs]({{< meta usage/automatic-differentiation >}}) for more info. ```{julia} From 8886213be4c1f280273c2dac5af81752a2facf63 Mon Sep 17 00:00:00 2001 From: Penelope Yong Date: Thu, 26 Jun 2025 11:55:40 +0100 Subject: [PATCH 4/4] Fix bad shortcodes --- tutorials/gaussian-processes-introduction/index.qmd | 2 +- tutorials/probabilistic-pca/index.qmd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorials/gaussian-processes-introduction/index.qmd b/tutorials/gaussian-processes-introduction/index.qmd index fa64476d8..6db5a1388 100755 --- a/tutorials/gaussian-processes-introduction/index.qmd +++ b/tutorials/gaussian-processes-introduction/index.qmd @@ -131,7 +131,7 @@ please let us know! Moving on, we generate samples from the posterior using the default `NUTS` sampler. We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on -this example. See the [automatic differentiation docs]({{< meta usage/automatic-differentiation >}}) for more info. +this example. See the [automatic differentiation docs]({{< meta usage-automatic-differentiation >}}) for more info. ```{julia} diff --git a/tutorials/probabilistic-pca/index.qmd b/tutorials/probabilistic-pca/index.qmd index 9111e97f3..106bb965f 100755 --- a/tutorials/probabilistic-pca/index.qmd +++ b/tutorials/probabilistic-pca/index.qmd @@ -196,7 +196,7 @@ Here we aim to perform MCMC sampling to infer the projection matrix $\mathbf{W}_ We run the inference using the NUTS sampler. By default, `sample` samples a single chain (in this case with 500 samples). -You can also use [different samplers]({{< meta usage/sampler-visualisation >}}) if you wish. +You can also use [different samplers]({{< meta usage-sampler-visualisation >}}) if you wish. ```{julia} #| output: false
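The ARD discussion and the `pPCA_ARD` sampling call shown earlier in this patch series rely on a model definition that falls outside the quoted hunks. Purely as an illustrative sketch (not the tutorial's actual `pPCA_ARD` code, and omitting the per-feature mean offset for brevity), an ARD-flavoured pPCA model in Turing could look roughly like this, with one precision `α[d]` per candidate direction:

```julia
using Turing, LinearAlgebra, FillArrays

@model function pPCA_ARD_sketch(X)
    D, N = size(X)                       # D features, N observations

    # One precision per candidate direction; a large α[d] shrinks column d of W
    # towards zero, effectively pruning that direction.
    α ~ filldist(Gamma(1.0, 1.0), D)

    # Loadings: column d of W has prior covariance (1 / α[d]) * I.
    W ~ arraydist([MvNormal(Zeros(D), I / α[d]) for d in 1:D])

    # Latent coordinates and observation noise precision.
    Z ~ filldist(Normal(), D, N)
    τ ~ Gamma(1.0, 1.0)

    # Each observed column is a noisy linear combination of the directions.
    X ~ arraydist([MvNormal(W * Z[:, n], I / τ) for n in 1:N])
end
```

Columns of `W` whose corresponding `α` posterior concentrates on large values contribute little to the reconstruction, which is how ARD effectively selects the number of principal directions.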
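Patch 3 above edits only the docs link in the Gaussian-process tutorial, and the sampling call it refers to is cut off at the opening code fence. As a self-contained sketch of the pattern that prose describes (a toy model stands in for the GP model, which is defined elsewhere in that tutorial), the ReverseDiff backend is selected by passing `adtype=AutoReverseDiff()` to `NUTS`, mirroring the `AutoMooncake` usage in patches 1 and 2:

```julia
using Turing
using ADTypes: AutoReverseDiff
import ReverseDiff

# Toy stand-in for the GP model from the tutorial; the point here is only
# how the AD backend is passed to the sampler.
@model function toy(y)
    x ~ Normal()
    y ~ Normal(x, 1.0)
end

chain = sample(toy(0.5), NUTS(; adtype=AutoReverseDiff()), 1_000)
```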