Replies: 4 comments 3 replies
-
Hi @mathDR! It's a cool idea. Could you sketch out the proposed implementation? Depending on the interface complexity and computational complexity (we probably want to mostly target first-order optimizers), it could be a cool addition to optax.
-
Okay, yeah, it might be good to discuss this here. There are currently two approaches I am considering: a generic solution and a very bespoke, targeted solution.

To formulate the problem (taking liberally from Salimbeni et al., Section 2.1): we have a model $p(y, \theta) = p(y \mid \theta)\, p(\theta)$, where $y$ are the observations and $\theta$ the latent variables. We seek to maximize the evidence lower bound (ELBO) of an approximate posterior $q(\theta; \xi)$ with variational parameters $\xi$. The exponential family is defined as

$$q(\theta; \eta) = \exp\!\left(\eta^\top t(\theta) - a(\eta)\right),$$

where $\eta$ are the natural parameters, $t(\theta)$ the sufficient statistics, and $a(\eta)$ the log-partition function. We will use a smooth, invertible transformation $\xi = \psi(\eta)$ between the natural parameters and the parameterization we actually optimize. The ELBO is thusly defined as

$$\mathcal{L}(\xi) = \mathbb{E}_{q(\theta;\xi)}\!\left[\log p(y \mid \theta)\right] - \mathrm{KL}\!\left[q(\theta;\xi)\,\|\,p(\theta)\right],$$

so we want to minimize $-\mathcal{L}(\xi)$. We will do so by finding a sequence of parameters $\xi_1, \xi_2, \ldots$ where

$$\xi_{t+1} = \xi_t - \gamma_t\, d_t$$

for a step size $\gamma_t$ and a descent direction $d_t$. We can choose $d_t$ in different ways. Ordinary gradient descent takes $d_t = \nabla_\xi(-\mathcal{L}(\xi_t))$, which is equivalent to solving

$$\xi_{t+1} = \arg\min_{\xi} \left[(\xi - \xi_t)^\top \nabla_\xi(-\mathcal{L}(\xi_t)) + \tfrac{1}{2\gamma_t}\,\lVert\xi - \xi_t\rVert^2\right],$$

i.e. distances between parameter settings are measured with the Euclidean norm. For Natural Gradient Descent, since the parameters come from distributions, the Euclidean norm is replaced by the KL divergence between $q(\theta;\xi)$ and $q(\theta;\xi_t)$; its second-order expansion yields the Fisher information matrix

$$F_\xi = \mathbb{E}_{q(\theta;\xi)}\!\left[\nabla_\xi \log q(\theta;\xi)\,\nabla_\xi \log q(\theta;\xi)^\top\right],$$

and the step becomes

$$\xi_{t+1} = \xi_t - \gamma_t\, F_{\xi_t}^{-1}\,\nabla_\xi(-\mathcal{L}(\xi_t)).$$
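To make the bespoke, targeted solution a little more concrete, here is a minimal sketch (the name `natural_gradient_step` and the diagonal-Gaussian assumption are just for illustration, not a proposed optax API) of one natural gradient step for a factorized Gaussian, where the Fisher information matrix is diagonal and its inverse is available in closed form:

```python
import jax.numpy as jnp

# Sketch only: one natural gradient step for a diagonal Gaussian
# q(theta; xi) = N(mean, exp(log_std)**2) with xi = (mean, log_std).
# For this parameterization the Fisher information matrix is diagonal:
#   F_mean    = 1 / sigma^2
#   F_log_std = 2
# so the inverse-Fisher-vector product is an elementwise rescaling.

def natural_gradient_step(params, grads, learning_rate=1e-2):
    sigma2 = jnp.exp(2.0 * params["log_std"])
    # Precondition the Euclidean gradient of -ELBO by the inverse Fisher.
    nat_grad_mean = sigma2 * grads["mean"]
    nat_grad_log_std = 0.5 * grads["log_std"]
    return {
        "mean": params["mean"] - learning_rate * nat_grad_mean,
        "log_std": params["log_std"] - learning_rate * nat_grad_log_std,
    }
```

The generic solution would instead have to compute $F_\xi$ (or a Fisher-vector product) from $\nabla_\xi \log q$ and solve the linear system, which is where the interface and computational-complexity questions come in.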
-
@rdyro They point to some good blog posts by Agustinus Kristiadi. So we could build an `optax` gradient transformation along those lines.
-
I really appreciate the explanations! I think this is a good candidate for optax, especially if the interface can be made concise. I'd say the initial focus could probably be on distributions with a diagonal Fisher information matrix, since we'd maintain linear scaling with the number of parameters. Would this be too restrictive? Later, adding some form of matrix-inverse-vector product as a callback would probably work well. I like your suggestion. I'd be happy to include NGD in optax in some form!
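As a rough sketch of the kind of interface I mean (every name here, e.g. `precondition_by_inverse_fisher`, is hypothetical rather than an agreed design), the callback variant could be an ordinary optax gradient transformation:

```python
import optax

def precondition_by_inverse_fisher(inverse_fisher_vector_product):
    """Hypothetical transformation: replace updates with F(params)^-1 @ updates.

    `inverse_fisher_vector_product(params, updates)` is a user-supplied callback,
    e.g. a closed-form elementwise rescaling for a diagonal Fisher, or a CG
    solve against Fisher-vector products in the general case.
    """

    def init_fn(params):
        del params
        return optax.EmptyState()

    def update_fn(updates, state, params=None):
        if params is None:
            raise ValueError("precondition_by_inverse_fisher requires params.")
        return inverse_fisher_vector_product(params, updates), state

    return optax.GradientTransformation(init_fn, update_fn)


# Usage sketch: chain with a step size like any other optax transform.
# optimizer = optax.chain(
#     precondition_by_inverse_fisher(my_diag_fisher_inverse),  # hypothetical callback
#     optax.sgd(learning_rate=1e-1),
# )
```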
-
I have a question regarding optax scope: when optimizing sparse variational Gaussian processes, typically the dual parameterization of Adam et al. is used.

GPFlow has a class that does this (it extends a `tf_keras` optimizer). It would be great for my workflow to be able to call `optax` to fit these models (sparse variational GPs).

My question is: would an optimizer of this type be in scope for the `optax` project? I am happy to implement a PR if so. Please advise!
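For concreteness, the shape of the workflow I have in mind would be something like the sketch below (all names are hypothetical; I'm only illustrating how the variational/dual parameters could be routed to an NGD-style transform while the hyperparameters stay on Adam, which is the usual GPFlow pattern):

```python
import optax

def natgrad_placeholder(learning_rate):
    # Stand-in for whatever natural-gradient transform optax might grow; plain
    # SGD for now, where the dual-parameter preconditioning would happen.
    return optax.sgd(learning_rate)

# Assumes a params pytree like
# {"variational": {...}, "kernel": {...}, "likelihood": {...}}.
param_labels = {"variational": "natgrad", "kernel": "adam", "likelihood": "adam"}

optimizer = optax.multi_transform(
    {"natgrad": natgrad_placeholder(0.1), "adam": optax.adam(1e-2)},
    param_labels,
)
```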