Quality of life: Added custom log_reward option to GFlowNets. #312
Conversation
Hi @Idriss-Malek, thank you for submitting this PR! Your current implementation adds arguments for the intrinsic rewards, which is clear, but I wonder if this is the best approach. What about creating a wrapper around the env that adds the intrinsic reward to the original reward?
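For illustration, a minimal sketch of what such a wrapper could look like, assuming the environment exposes a `log_reward(final_states)` method and the intrinsic term can be computed from the final states. The class and argument names here are made up for this example and are not part of torchgfn:

```python
import torch


class IntrinsicRewardWrapper:
    """Hypothetical wrapper that adds an intrinsic term to an env's log reward."""

    def __init__(self, base_env, intrinsic_log_reward_fn):
        self.base_env = base_env
        # Callable mapping a batch of final states to a tensor of intrinsic log rewards.
        self.intrinsic_log_reward_fn = intrinsic_log_reward_fn

    def log_reward(self, final_states) -> torch.Tensor:
        # Summing log rewards multiplies the rewards; use torch.logaddexp here
        # instead if the two rewards should be summed in reward space.
        return self.base_env.log_reward(final_states) + self.intrinsic_log_reward_fn(final_states)

    def __getattr__(self, name):
        # Delegate everything else (step, reset, masks, ...) to the wrapped env.
        return getattr(self.base_env, name)
```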
Hi @hyeok9855, correct me if I misunderstood your message!
If we implement this as a wrapper that takes the original environment as an argument, then I think this would simply be 10 wrappers with a single custom environment? Maybe we should gather more opinions to make a decision on this. @josephdviviano, @younik any thoughts?
I also favor the wrapper approach, assuming that the intrinsic reward depends only on the state of the environment and not on the state of the net.
The intrinsic reward doesn't always depend on the env. In https://openreview.net/forum?id=HH4KWP8RP5, for example, the intrinsic reward used to train a secondary GFN depends on the primary GFN.
If this is fixed during the second training, i.e., you train one GFlowNet and then create a wrapper based on it to train a second GFlowNet, the wrapper approach is still fine. However, it is hard to decide on the best implementation without a concrete use case, so maybe we should create an example/tutorial for a common use case?
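To make the "train one GFlowNet first, then wrap" idea concrete, a rough sketch building on the wrapper above. `score_from_primary` is a placeholder for whatever quantity would be derived from the primary GFN; none of these names come from torchgfn, and the only assumption about the GFlowNet object is that it is a frozen PyTorch module:

```python
import torch


def intrinsic_from_primary(primary_gfn, score_from_primary):
    """Builds an intrinsic log-reward function from an already-trained GFlowNet."""
    primary_gfn.eval()  # the primary GFN stays frozen during the second training run

    def intrinsic_log_reward(final_states) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow back into the primary GFN
            return score_from_primary(primary_gfn, final_states)

    return intrinsic_log_reward


# The secondary GFlowNet then trains against the wrapped environment, e.g.:
# secondary_env = IntrinsicRewardWrapper(base_env, intrinsic_from_primary(gfn1, my_score_fn))
```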
Forward-looking GFN reference: https://arxiv.org/pdf/2302.01687
From Eq. 2 in your paper, it appears that the intrinsic reward and the extrinsic reward are simply two independent terms that are summed. Is it possible to always compute this intrinsic reward independently of the environment? I would hope so, because torchgfn environments are stateless. Ideally, the policy would take in the current state of the environment and compute the intrinsic reward from that. This completely avoids the cartesian product of intrinsic x extrinsic reward types. LMK if I misunderstand something.
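As a small illustration of that point, assuming the intrinsic term really is a separate function of the final states and that the two rewards are summed in reward space (all names here are hypothetical), the combined log reward can be computed without touching the environment class at all:

```python
import torch


def combined_log_reward(env, intrinsic_log_reward_fn, final_states) -> torch.Tensor:
    # Summing the extrinsic and intrinsic rewards corresponds to a
    # log-sum-exp of the two log-reward terms.
    return torch.logaddexp(
        env.log_reward(final_states),
        intrinsic_log_reward_fn(final_states),
    )
```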
I agree. @Idriss-Malek, could you share an example that requires an intrinsic reward, if you have one?
Thanks @Idriss-Malek for the PR. I like the added flexibility. Before merging, it would be better to have a small example, as a new script or as an added argument to the existing scripts, that showcases the utility of this new feature. Can you think of a minimal example?
@saleml What's the status of this PR? Can we help in any way?
This pull request adds support for using custom rewards in GFlowNet implementations, enabling, for example, the use of intrinsic rewards (https://openreview.net/forum?id=HH4KWP8RP5).

Changes

- Added a `log_rewards` argument to GFlowNet implementations (see the usage sketch after this list). It defaults to `None`, in which case the environment's reward is used, preserving compatibility with existing code.
- `log_rewards` is expected to be a tensor of the same shape as the container (e.g., `(n_trajectories,)`, `(n_transitions,)`, etc.), and it is used instead of the environment reward.
- In the DB and SubTB losses, I did not modify the forward-looking setup, except to rename its argument to `fl_log_rewards` to avoid naming conflicts.
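A rough usage sketch of the new argument, under the assumption that the loss is called as `gflownet.loss(env, trajectories)` as in existing torchgfn code; `my_log_reward_fn` is a stand-in for whatever custom (e.g. intrinsic-augmented) log reward the user computes:

```python
import torch


def training_step(gflownet, env, trajectories, my_log_reward_fn) -> torch.Tensor:
    # One custom log reward per trajectory, shape (n_trajectories,).
    custom_log_rewards = my_log_reward_fn(trajectories)  # hypothetical user function

    # Passing log_rewards overrides the environment's reward; leaving it as None
    # keeps the previous behaviour (the environment's log reward is used).
    return gflownet.loss(env, trajectories, log_rewards=custom_log_rewards)
```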