hypergrid refactor #402
Conversation
…to do parameter resetting. Awaiting integrations for selective averaging and for performance monitoring. Bugfix in distributed.py.
I didn't review in detail, as that is hard for 1000+ LOC, assuming all of it comes from refactoring (only moving functions around).
I am not sure that is the case for the sampling strategy; if you want a review of it, consider moving it to another PR.
# -------------------------
# Mode utilities
# -------------------------
def _mode_reward_threshold(self) -> float:
This method could be moved into the GridReward object, using polymorphism instead of the if chain.
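For illustration, a minimal sketch of the polymorphic version being suggested; the subclass names and threshold formulas here are assumptions for the example, not the actual reward classes in hypergrid.py.

```python
from abc import ABC, abstractmethod


class GridReward(ABC):
    """Base reward; each variant knows its own mode threshold."""

    @abstractmethod
    def mode_reward_threshold(self) -> float:
        ...


class SparseGridReward(GridReward):
    # Hypothetical variant: modes are the corner regions where all bonuses apply.
    def __init__(self, R0: float, R1: float, R2: float) -> None:
        self.R0, self.R1, self.R2 = R0, R1, R2

    def mode_reward_threshold(self) -> float:
        return self.R0 + self.R1 + self.R2


class CosineGridReward(GridReward):
    # Hypothetical variant with a fixed illustrative threshold.
    def mode_reward_threshold(self) -> float:
        return 0.5


# The env then delegates instead of branching on the reward type:
#     threshold = self.reward_fn.mode_reward_threshold()
```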
The sampling strategy PR should be merged first. It is a standalone PR
Joseph (Mobile)
I reviewed only env.py and gym, and left some comments. Thank you!
""" | ||
rewards = self.reward(states) | ||
threshold = self._mode_reward_threshold() | ||
return rewards >= threshold |
We may want to subtract an epsilon > 0, i.e., rewards >= threshold - EPS, to avoid floating-point error. It happens sometimes (but I have no idea exactly when it does).
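A minimal sketch of the suggested tolerance; the EPS constant and its value are illustrative, not part of the existing code.

```python
import torch

EPS = 1e-6  # assumed tolerance; pick something small relative to the reward scale


def is_mode(rewards: torch.Tensor, threshold: float) -> torch.Tensor:
    # Comparing against threshold - EPS keeps states whose computed reward lands
    # a hair below the analytic threshold due to floating-point rounding.
    return rewards >= threshold - EPS
```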
# Determine side relative to the center along each dimension
pos = (
    states.tensor.to(dtype=torch.get_default_dtype()) / (self.height - 1) > 0.5
).long()
weights = (
    2 ** torch.arange(self.ndim - 1, -1, -1, device=states.tensor.device)
).long()
ids = (pos * weights).sum(dim=-1)
# Assign -1 to non-mode states
ids = torch.where(mask, ids, torch.full_like(ids, -1))

return ids
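For context, a small worked example of what the id computation above does, assuming ndim=2 and height=8; the values are illustrative only.

```python
import torch

ndim, height = 2, 8
states = torch.tensor([[0, 0], [7, 0], [0, 7], [7, 7]])  # four corner states

# Side of the center per dimension: 0 if the coordinate is at or below the
# midpoint, 1 otherwise; the binary vector is then packed into an integer id.
pos = (states.to(torch.get_default_dtype()) / (height - 1) > 0.5).long()
weights = (2 ** torch.arange(ndim - 1, -1, -1)).long()
ids = (pos * weights).sum(dim=-1)
print(ids)  # tensor([0, 2, 1, 3]) -> one id per corner region
```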
I don't understand what's happening here. I guess this might only be valid for specific types of GridReward.
Why do we need mode_ids or modes_found? I guess the number of modes found can be obtained by len(modes_visited), where modes_visited is a set of states with rewards higher than a threshold.
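A minimal sketch of the reviewer's suggestion; the names modes_visited and THRESHOLD are illustrative, and env is assumed to expose reward(states) with integer grid coordinates in states.tensor.

```python
import torch

THRESHOLD = 2.0  # assumed value for illustration
modes_visited: set[tuple[int, ...]] = set()


def update_modes_visited(env, states) -> int:
    rewards = env.reward(states)
    for coords in states.tensor[rewards >= THRESHOLD].tolist():
        modes_visited.add(tuple(coords))
    # Number of distinct high-reward states (modes) seen so far.
    return len(modes_visited)
```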
buf /= N - G
t.data.copy_(buf)

return True
To remove.
if init_mode is not None
else getattr(args, "restart_init_mode", "random")
)
if mode == "mean_others":
Needs to be replaced with Omar's work - merged here #401
if getattr(args, "use_restarts", False) and iteration >= 1000:
    agent_gid = distributed_context.agent_group_id or 0
    if _group_random_coin(1.0 / 1000.0, agent_gid, iteration, args.seed):
        prev_eps = float(getattr(args, "agent_epsilon", 0.0))
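For reference, one way such a group-synchronized coin flip could work, deriving the outcome purely from (seed, group id, iteration) so every rank in an agent group agrees without communication; the helper name matches the diff, but this body is an assumption.

```python
import random


def _group_random_coin(p: float, group_id: int, iteration: int, seed: int) -> bool:
    # Derive a deterministic per-(group, iteration) seed so every rank in the
    # agent group draws the same outcome without any collective communication.
    rng = random.Random(seed * 1_000_003 + group_id * 9_973 + iteration)
    return rng.random() < p
```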
@younik you removed the strategy sampling logic?
Right, now it is sampled only at the beginning. I am gonna sample it again at every model build
OK great - it's not added yet I suppose?
We'll need that for this PR. As it stands, the work has simply been deleted.
done
thanks!!
# 2. Build the initial gflownet via the same pathway as restarts (unified behavior).
# We explicitly pick random parameter init for the initial build (no averaging).
Please put back the strategy sampling logic.
parameters from other ranks (keeping noisy sigmas/defaults intact), and
returns the new model and optimizer.
"""
args.agent_epsilon = float(strategy.get("epsilon", 0.0))
Please put back the strategy sampling logic.
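Since the request is to restore the strategy sampling, here is a rough sketch of what sampling an exploration strategy per agent could look like; the field names (epsilon, temperature, n_noisy_layers) follow the args used elsewhere in the script, but the ranges and the helper itself are assumptions.

```python
import random


def sample_exploration_strategy(rng: random.Random) -> dict:
    # One exploration strategy per spawned agent: epsilon-greedy strength,
    # softmax temperature, and how many noisy layers the policy network uses.
    return {
        "epsilon": rng.choice([0.0, 0.01, 0.05, 0.1]),
        "temperature": rng.choice([1.0, 1.5, 2.0]),
        "n_noisy_layers": rng.choice([0, 1, 2]),
    }


# Example: re-sample a strategy whenever a new model is built.
# strategy = sample_exploration_strategy(random.Random(args.seed + agent_gid))
# args.agent_epsilon = float(strategy["epsilon"])
```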
    return params


def _canonical_param_tensors_for_gflownet(
Please ensure your changes take into account the types of parameters used for NoisyLayers
Isn't model.parameters() enough?
Let's say you have a community of 5 agents: 3 of them are plain MLPs, and 2 of them have noisy layers (which have different parameters, as shown here).
How do you do selective averaging of their parameters?
We'll probably have to just randomly initialize the noisy-layer elements if there's nothing to average over in the community, but I think we'll need some logic to handle it.
Oh, I see.
Tbh, I don't think it is a good idea to average parameters from different types of layers. It is already unclear whether averaging weights from the same type of policy works.
I think we'll need some logic to handle it.
I suggest implementing this when we need it, and in a separate PR. This PR already does multiple things besides refactoring.
Ok, good idea.
If you look at how the layers are implemented, you would simply average over the weights that are shared between the noisy / non-noisy policies and randomly initialize the rest, and/or average over the noisy-layer-specific parameters that are present in the subcommunity you are averaging over. But it's not that important (it's just another source of randomness).
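To make the discussion concrete, a rough sketch of averaging only the parameters whose state_dict keys (and shapes) are shared across a subcommunity, leaving noisy-layer-specific tensors such as the sigma terms untouched; the helper name and approach are illustrative, not the code in this PR.

```python
import torch


def average_shared_parameters(models: list[torch.nn.Module]) -> None:
    # Keys present in every model of the subcommunity (e.g. the plain MLP
    # weights); parameters that exist only in noisy layers are skipped.
    shared_keys = set.intersection(*(set(m.state_dict().keys()) for m in models))

    with torch.no_grad():
        for key in sorted(shared_keys):
            tensors = [m.state_dict()[key] for m in models]
            if not tensors[0].is_floating_point():
                continue  # skip integer buffers
            if len({t.shape for t in tensors}) != 1:
                continue  # same name but incompatible shapes: leave as-is
            mean = torch.stack(tensors).mean(dim=0)
            for t in tensors:
                t.copy_(mean)
```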
    return strat


def _reset_module_parameters_inplace(root: torch.nn.Module) -> None:
Can we still randomize the parameters of a policy instead of averaging?
When should this happen? We can easily add it as an alternative to averaging in the SpawnPolicy.
It would happen when spawning a policy. We will need to be able to configure it to happen under a few different conditions (we need to be able to do ablations and compare elements of our approach).
Okay, I have added a new argument to enable it.
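For reference, a minimal sketch of what an in-place parameter reset could look like, assuming submodules follow the usual PyTorch convention of exposing reset_parameters(); the signature matches the diff above, but the body is an assumption.

```python
import torch


def _reset_module_parameters_inplace(root: torch.nn.Module) -> None:
    for module in root.modules():
        reset = getattr(module, "reset_parameters", None)
        if callable(reset):
            # Re-draw this module's weights with its default initializer,
            # yielding a freshly randomized policy instead of an averaged one.
            reset()
```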
Thanks, yes, we need to be able to sample a new strategy for every new agent.
Joseph (Mobile)
Description
- gfn.utils.distributed.
- Moved Hypergrid helper methods out of a dedicated folder into the main file.
- Hypergrid env, to allow for shared validation across all hypergrid scripts.
- Moved DiscreteEnvironment evaluation code to the base class.