
Replace some ReverseDiffs with Mooncake #623

Merged
merged 4 commits on Jun 26, 2025
2 changes: 1 addition & 1 deletion tutorials/gaussian-processes-introduction/index.qmd
@@ -131,7 +131,7 @@ please let us know!
Moving on, we generate samples from the posterior using the default `NUTS` sampler.
We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has
better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on
this example. See Turing.jl's docs on Automatic Differentiation for more info.
this example. See the [automatic differentiation docs]({{< meta usage-automatic-differentiation >}}) for more info.
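For illustration, a minimal, hypothetical sketch of where the AD backend is selected when calling the sampler; the model `demo_model`, its data, and the sample count are placeholders rather than the GP model used in this tutorial:

```{julia}
using Turing
import ReverseDiff

# A minimal stand-in model (not the GP defined in this tutorial), just to show
# where the AD backend is chosen.
@model function demo_model(y)
    s ~ LogNormal()
    y ~ filldist(Normal(0, s), length(y))
end

# The `adtype` keyword of NUTS selects the AD backend used for gradient evaluation.
chain = sample(demo_model(randn(10)), NUTS(; adtype=AutoReverseDiff()), 500)
```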


25 changes: 15 additions & 10 deletions tutorials/probabilistic-pca/index.qmd
Expand Up @@ -90,7 +90,7 @@ First, we load the dependencies used.

```{julia}
using Turing
using ReverseDiff
using Mooncake
using LinearAlgebra, FillArrays

# Packages for visualization
@@ -108,7 +108,7 @@ You can install them via `using Pkg; Pkg.add("package_name")`.
::: {.callout-caution}
## Package usages:
We use `DataFrames` for instantiating matrices, `LinearAlgebra` and `FillArrays` to perform matrix operations;
`Turing` for model specification and MCMC sampling, `ReverseDiff` for setting the automatic differentiation backend when sampling.
`Turing` for model specification and MCMC sampling, `Mooncake` for automatic differentiation when sampling.
`StatsPlots` for visualising the results; `Measures` for setting plot margin units.
As all examples involve sampling, for reproducibility we set a fixed seed using the `Random` standard library.
:::
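As a concrete sketch of that installation step (package names taken from the list above; `LinearAlgebra` and `Random` are standard libraries and need no installation):

```{julia}
# One-off installation of the registered packages used in this tutorial.
using Pkg
Pkg.add(["Turing", "Mooncake", "FillArrays", "DataFrames", "StatsPlots", "Measures"])
```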
@@ -194,8 +194,9 @@ Specifically:

Here we aim to perform MCMC sampling to infer the projection matrix $\mathbf{W}_{D \times k}$, the latent variable matrix $\mathbf{Z}_{k \times N}$, and the offsets $\boldsymbol{\mu}_{N \times 1}$.

We run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain.
You are free to try [different samplers](https://turinglang.org/stable/docs/library/#samplers).
We run the inference using the NUTS sampler.
By default, `sample` samples a single chain (in this case with 500 samples).
You can also use [different samplers]({{< meta usage-sampler-visualisation >}}) if you wish.

```{julia}
#| output: false
@@ -205,17 +206,21 @@ setprogress!(false)
```{julia}
k = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice
ppca = pPCA(mat_exp', k) # instantiate the probabilistic model
chain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);
chain_ppca = sample(ppca, NUTS(; adtype=AutoMooncake(; config=nothing)), 500);
```

The samples are saved in the Chains struct `chain_ppca`, whose shape can be checked:
The samples are saved in `chain_ppca`, which is an `MCMCChains.Chains` object.
We can check its shape:

```{julia}
size(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)
```

The Chains struct `chain_ppca` also contains the sampling info such as r-hat, ess, mean estimates, etc.
You can print it to check these quantities.
Sampling statistics such as R-hat, ESS, mean estimates, and so on can also be obtained from this:

```{julia}
describe(chain_ppca)
```

#### Step 5: posterior predictive checks

@@ -280,7 +285,7 @@ Another way to put it: 2 dimensions is enough to capture the main structure of t
A direct question that arises from the above practice is: how many principal components do we want to keep in order to sufficiently represent the latent structure in the data?
This is a central question for all latent factor models, i.e. how many dimensions are needed to represent the data in the latent space.
In the case of PCA, there exist a lot of heuristics to make that choice.
For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].
For example, We can tune the number of principal components using empirical methods such as cross-validation based on some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [^3].

For p-PCA, this can be done in an elegant and principled way, using a technique called *Automatic Relevance Determination* (ARD).
ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [^4].
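To make the idea concrete, here is a rough, hypothetical sketch of an ARD-style prior for p-PCA (not the `pPCA_ARD` model defined below in this tutorial): each candidate direction $i$ gets its own precision $\alpha_i$, and directions whose inferred precision is large are shrunk towards zero and thus effectively pruned.

```{julia}
using Turing, LinearAlgebra

# Sketch only: one precision α[i] per candidate direction. A large α[i] shrinks
# the i-th row of W towards zero, switching that principal direction off.
@model function pPCA_ARD_sketch(X)
    D, N = size(X)

    α ~ filldist(Gamma(1.0, 1.0), D)    # per-direction precisions
    σ ~ LogNormal(0.0, 1.0)             # observation noise scale

    W ~ filldist(MvNormal(zeros(D), Diagonal(1 ./ α)), D)  # D×D loadings
    Z ~ filldist(Normal(), D, N)                            # latent scores

    mu = W' * Z                                             # D×N reconstruction
    for n in 1:N
        X[:, n] ~ MvNormal(mu[:, n], σ^2 * I)
    end
end
```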
@@ -315,7 +320,7 @@ We instantiate the model and ask Turing to sample from it using NUTS sampler. Th

```{julia}
ppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model
chain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling
chain_ppcaARD = sample(ppca_ARD, NUTS(; adtype=AutoMooncake(; config=nothing)), 500) # sampling
plot(group(chain_ppcaARD, :α); margin=6.0mm)
```

8 changes: 4 additions & 4 deletions usage/mode-estimation/index.qmd
@@ -71,14 +71,14 @@ The above are just two examples, Optimization.jl supports [many more](https://do
We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method.

```{julia}
using ADTypes: AutoReverseDiff
import ReverseDiff
import Mooncake

maximum_likelihood(
model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()
model, NelderMead(); initial_params=[0.1, 2], adtype=AutoMooncake(; config=nothing)
)
```

When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, Turing.VarInfo(model))`.
When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, DynamicPPL.VarInfo(model))`.
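If in doubt, that ordering can be inspected directly. A small sketch, assuming `model` is the model defined earlier on this page:

```{julia}
# Inspect the parameter ordering that `initial_params` expects.
import DynamicPPL
Turing.Inference.getparams(model, DynamicPPL.VarInfo(model))
```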

We can also do constrained optimisation by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here's how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1:
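A hedged sketch of such a call, assuming the box constraints are forwarded to Optimization.jl as `lb`/`ub` keyword arguments in the same parameter order as `initial_params` (first `s²`, then `m`):

```{julia}
# Box-constrained MLE: s² restricted to [0, 0.01], m restricted to [-1, 1].
maximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0])
```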

Expand Down