
Conversation

Idriss-Malek
Contributor

This pull request adds support for custom rewards in the GFlowNet implementations, enabling, for example, the use of intrinsic rewards (https://openreview.net/forum?id=HH4KWP8RP5).

Changes

  • Added a log_rewards argument to GFlowNet implementations.

    • By default, it is set to None, in which case the environment's reward is used. This preserves compatibility with existing code.
    • When provided, log_rewards is expected to be a tensor of the same shape as the container (e.g., (n_trajectories,), (n_transitions,), etc.), and it is used instead of the environment reward (see the usage sketch after this list).
  • In the DB and SubTB losses, I did not modify the forward-looking setup, except to rename the argument to fl_log_rewards to avoid naming conflicts.

    • I am not sure what this forward-looking setup does exactly (would appreciate any pointers to the original reference).
    • For now, I recommend caution when combining custom rewards with the forward-looking setup.
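
Here is a rough sketch of the intended usage. The names env, gfn, sampler, and intrinsic_log_rewards are placeholders, and the exact sampler/loss signatures may differ from the real torchgfn API; only the log_rewards keyword is the feature added in this PR.

```python
import torch

# Placeholder setup: `env`, `gfn`, and `sampler` stand in for a torchgfn
# environment, GFlowNet loss module, and trajectory sampler.
trajectories = sampler.sample_trajectories(env, n=16)

# Default behaviour (log_rewards=None): the environment's reward is used.
loss_default = gfn.loss(env, trajectories)

# Custom rewards: one log-reward per element of the container,
# here a tensor of shape (n_trajectories,).
custom_log_rewards = torch.logaddexp(
    trajectories.log_rewards,  # extrinsic term from the env
    intrinsic_log_rewards,     # placeholder: intrinsic term, same shape
)
loss_custom = gfn.loss(env, trajectories, log_rewards=custom_log_rewards)
```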

Idriss-Malek changed the title from "Added custom log_reward option to GFlowNets." to "Quality of life: Added custom log_reward option to GFlowNets." on May 19, 2025
@hyeok9855
Collaborator

Hi @Idriss-Malek, thank you for submitting this PR!

Your current implementation adds arguments for the intrinsic rewards, which is clear, but I wonder if this is the best approach. What about creating a wrapper around the env that adds the intrinsic reward to the original reward?

@Idriss-Malek
Contributor Author

Hi @hyeok9855,
Currently, GFNs use trajectories.log_reward to compute the loss. To use an intrinsic reward, I see three possible solutions:

  • Since, during sampling, the reward assigned to a trajectory comes from the environment, changing the env's reward is one possible solution. However, the loss is still computed with trajectories.log_reward, which equals env.log_rewards(trajectories.terminal_state), so if we use 10 GFlowNets with 10 different intrinsic rewards, we end up with 10 custom envs.
  • Directly changing the reward assigned to the trajectory, but this feels awkward.
  • Directly specifying which reward to use at the moment we compute the loss (what I did).

Correct me if I misunderstood your message!

@hyeok9855
Collaborator

If we use 10 GFlowNets with 10 different intrinsic rewards, we will have 10 custom envs.

If we implement this as a wrapper that takes the original environment as an argument, then I think this would simply be 10 wrappers with a single custom environment?
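
Something like the following sketch, where the log_reward method name and the delegation pattern are assumptions about the env interface rather than the actual torchgfn API:

```python
import torch

class IntrinsicRewardWrapper:
    """Sketch of an env wrapper that adds an intrinsic term to the reward.

    `log_reward` and the attribute delegation are assumed interfaces,
    not the real torchgfn Env API.
    """

    def __init__(self, env, intrinsic_log_reward_fn):
        self.env = env
        # Maps final states to a tensor of intrinsic log-rewards.
        self.intrinsic_log_reward_fn = intrinsic_log_reward_fn

    def __getattr__(self, name):
        # Delegate everything else to the wrapped environment.
        return getattr(self.env, name)

    def log_reward(self, final_states):
        extrinsic = self.env.log_reward(final_states)
        intrinsic = self.intrinsic_log_reward_fn(final_states)
        # Sum the rewards in linear space: log(R_ext + R_int).
        return torch.logaddexp(extrinsic, intrinsic)


# Ten intrinsic rewards then become ten thin wrappers around one env:
# wrapped_envs = [IntrinsicRewardWrapper(env, fn) for fn in intrinsic_fns]
```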

Maybe we should gather more opinions to make a decision on this. @josephdviviano, @younik any thoughts?

@younik
Collaborator

younik commented May 28, 2025

I also favor the wrapper approach, assuming that the intrinsic reward depends only on the state of the environment, and not on the state of the net.

@Idriss-Malek
Contributor Author

The intrinsic reward doesn't always depend only on the env. In https://openreview.net/forum?id=HH4KWP8RP5, for example, the intrinsic reward used to train a secondary GFN depends on the primary GFN.

@younik
Collaborator

younik commented May 28, 2025

If this is fixed during the second training, i.e., you train one GFlowNet first and then create a wrapper based on it to train a second GFlowNet, the wrapper approach is still fine.

However, it is hard to decide the best implementation without a concrete use case, so maybe we should create an example/tutorial for a common use case?
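
As a rough sketch of that two-stage flow, assuming a wrapper like the one sketched above (all names below are placeholders, not real torchgfn functions):

```python
# Hypothetical two-stage flow: `train(gfn, env)` is some training loop, and
# `intrinsic_from(primary_gfn)` builds a function mapping final states to
# intrinsic log-rewards computed from the (now fixed) primary model.

primary_gfn = train(primary_gfn, env)          # stage 1: ordinary training

for p in primary_gfn.parameters():             # freeze the primary model
    p.requires_grad_(False)

wrapped_env = IntrinsicRewardWrapper(env, intrinsic_from(primary_gfn))
secondary_gfn = train(secondary_gfn, wrapped_env)  # stage 2: wrapped reward
```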

@josephdviviano
Collaborator

Forward looking GFN reference: https://arxiv.org/pdf/2302.01687

@josephdviviano
Collaborator

From Eqn. 2 in your paper, it appears that the intrinsic and extrinsic rewards are simply two independent terms that are summed.

Is it possible to always compute this intrinsic reward independently of the environment? I would hope so, because torchgfn environments are stateless. Ideally, the policy would take in the current state of the environment and compute the intrinsic reward from that. So trajectories.log_rewards would simply accept as arguments both the policy (optionally, for intrinsic reward computation) and the environment (for extrinsic reward computation).
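
Something like this sketch, where env.log_reward and policy.intrinsic_log_reward are assumed interfaces rather than the real torchgfn API:

```python
import torch

def combined_log_reward(env, policy, final_states):
    # Extrinsic term from the (stateless) environment.
    log_r_ext = env.log_reward(final_states)
    # Optional intrinsic term computed by the policy from the same states.
    log_r_int = policy.intrinsic_log_reward(final_states)
    # The two rewards are summed in linear space, so combine in log space:
    # log(R_ext + R_int) = logaddexp(log R_ext, log R_int).
    return torch.logaddexp(log_r_ext, log_r_int)
```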

This completely avoids the Cartesian product of intrinsic × extrinsic reward types.

LMK if I misunderstand something.

@hyeok9855
Collaborator

However, it is hard to decide the best implementation without a concrete use case, so maybe we should create an example/tutorial for a common use case?

I agree. @Idriss-Malek, could you share an example that requires intrinsic reward if you have one?

@saleml
Collaborator

saleml commented Jun 27, 2025

Thanks @Idriss-Malek for the PR. I like the added flexibility. Before merging, it would be better to have a small example, as a new script or as an added argument to the existing scripts, that showcases the utility of this new feature. Can you think of some minimal example?

@josephdviviano
Collaborator

@saleml What's the status of this PR? Can we help in any way?
