Conversation

josephdviviano (Collaborator)

  • I've read the .github/CONTRIBUTING.md file
  • My code follows the typing guidelines
  • I've added appropriate tests
  • I've run pre-commit hooks locally

Description

  • Moves key features involving distributed computing to gfn.utils.distributed.
  • Moves Hypergrid helper methods out of a dedicated folder into the main file.
  • Moves mode-counting logic into the Hypergrid env, allowing shared validation across all hypergrid scripts.
  • Moves standard DiscreteEnvironment evaluation code to the base class.
  • Adds strategy sampling to train_hypergrid (a rough sketch of the idea follows this list).
  • Changes the training loop to support a unique strategy per agent, plus agent weight restarting.
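
For context, a per-agent strategy here is just a small bundle of exploration settings. A rough sketch of how such sampling might look; the function name, the specific settings, and their ranges are illustrative, not the PR's actual implementation:

```python
import random


def sample_strategy(agent_group_id: int, seed: int) -> dict:
    """Sample exploration settings for one agent group (illustrative sketch).

    Seeding the RNG per agent group keeps different groups on different
    strategies while remaining reproducible across restarts.
    """
    rng = random.Random(seed * 1_000_003 + agent_group_id)
    return {
        "epsilon": rng.choice([0.0, 0.01, 0.1]),  # epsilon-greedy exploration level
        "use_noisy_layers": rng.random() < 0.5,   # parameter-space noise instead of epsilon
        "lr": rng.choice([1e-4, 1e-3]),           # per-agent learning rate
    }
```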

@josephdviviano self-assigned this Oct 1, 2025
@younik (Collaborator) left a comment:

I didn't review in detail, as that is hard for 1000+ LOC, assuming all of it comes from refactoring (only moving functions around).

I am not sure that is the case for the sampling strategy; if you want a review of it, consider moving it to another PR.

# -------------------------
# Mode utilities
# -------------------------
def _mode_reward_threshold(self) -> float:
Collaborator:

This method could be moved inside the GridReward object, using polymorphism instead of the if chain.
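
A rough sketch of that shape; the class names, constructor arguments, and thresholds are illustrative, not the repo's actual API:

```python
class GridReward:
    """Base reward object; each variant owns its mode-reward threshold."""

    def mode_reward_threshold(self) -> float:
        raise NotImplementedError


class CornerGridReward(GridReward):
    def __init__(self, R0: float, R1: float, R2: float) -> None:
        self.R0, self.R1, self.R2 = R0, R1, R2

    def mode_reward_threshold(self) -> float:
        # Mode states collect every bonus term in this variant.
        return self.R0 + self.R1 + self.R2


class SmoothGridReward(GridReward):
    def mode_reward_threshold(self) -> float:
        return 1.0  # placeholder value for illustration only


# HyperGrid then delegates instead of branching on the reward type:
#     threshold = self.reward_fn.mode_reward_threshold()
```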

@josephdviviano (Collaborator Author) commented Oct 4, 2025 via email

@hyeok9855 (Collaborator) left a comment:

I reviewed only env.py and gym, and left some comments. Thank you!

"""
rewards = self.reward(states)
threshold = self._mode_reward_threshold()
return rewards >= threshold
Collaborator:

We may want to subtract an epsilon > 0, i.e., rewards >= threshold - EPS, to avoid floating-point error. It happens sometimes (but I have no idea exactly when it does).
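
A sketch of the suggested change; the EPS value here is arbitrary:

```python
import torch

EPS = 1e-6  # tolerance for accumulated floating-point error


def states_are_modes(rewards: torch.Tensor, threshold: float) -> torch.Tensor:
    # Rewards that should sit exactly on the threshold can fall a hair
    # below it after float arithmetic; the slack absorbs that.
    return rewards >= threshold - EPS
```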

Comment on lines +304 to +315
# Determine side relative to the center along each dimension
pos = (
states.tensor.to(dtype=torch.get_default_dtype()) / (self.height - 1) > 0.5
).long()
weights = (
2 ** torch.arange(self.ndim - 1, -1, -1, device=states.tensor.device)
).long()
ids = (pos * weights).sum(dim=-1)
# Assign -1 to non-mode states
ids = torch.where(mask, ids, torch.full_like(ids, -1))

return ids
Collaborator:

I don't understand what's happening here. I guess this might only be valid for specific types of GridReward.
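
For what it's worth, the snippet appears to assign each mode state the index of the hypercube corner it sits closest to: each coordinate is thresholded at the grid midpoint and the resulting bits are read as a binary number. A toy run of the same arithmetic, with values chosen only for illustration:

```python
import torch

ndim, height = 2, 8
states = torch.tensor([[0, 0], [0, 7], [7, 0], [7, 7]])

# Which side of the midpoint (height - 1) / 2 each coordinate falls on.
pos = (states.to(torch.get_default_dtype()) / (height - 1) > 0.5).long()
# Powers of two turn the per-dimension bits into a single corner index.
weights = (2 ** torch.arange(ndim - 1, -1, -1)).long()
print((pos * weights).sum(dim=-1))  # tensor([0, 1, 2, 3])
```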

Collaborator:

Why do we need mode_ids or modes_found?

I guess the number of modes found can be obtained as len(modes_visited), where modes_visited is a set of states whose rewards exceed a threshold.
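
A minimal sketch of that bookkeeping; the variable and function names are illustrative:

```python
import torch

modes_visited: set = set()


def count_modes_found(states: torch.Tensor, rewards: torch.Tensor, threshold: float) -> int:
    """Record every visited state whose reward clears the threshold; return the running count."""
    for state in states[rewards >= threshold]:
        modes_visited.add(tuple(state.tolist()))
    return len(modes_visited)


# Duplicates collapse in the set, so only distinct above-threshold states are counted.
states = torch.tensor([[7, 7], [0, 0], [7, 7], [0, 7]])
rewards = torch.tensor([2.5, 0.1, 2.5, 2.5])
print(count_modes_found(states, rewards, threshold=2.0))  # 2
```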

buf /= N - G
t.data.copy_(buf)

return True
Collaborator Author:

To remove.

if init_mode is not None
else getattr(args, "restart_init_mode", "random")
)
if mode == "mean_others":
@josephdviviano (Collaborator Author), Oct 7, 2025:

Needs to be replaced with Omar's work, merged here: #401

if getattr(args, "use_restarts", False) and iteration >= 1000:
agent_gid = distributed_context.agent_group_id or 0
if _group_random_coin(1.0 / 1000.0, agent_gid, iteration, args.seed):
prev_eps = float(getattr(args, "agent_epsilon", 0.0))
Collaborator Author:

@younik you removed the strategy sampling logic?

Collaborator:

Right, now it is sampled only at the beginning. I am going to sample it again at every model build.

Collaborator Author:

OK great - it's not added yet I suppose?

Collaborator Author:

We'll need that for this PR. As it stands, the work has simply been deleted.

Collaborator:

done

Collaborator Author:

thanks!!


# 2. Build the initial gflownet via the same pathway as restarts (unified behavior).
# We explicitly pick random parameter init for the initial build (no averaging).

Collaborator Author:

Please put back the strategy sampling logic.

parameters from other ranks (keeping noisy sigmas/defaults intact), and
returns the new model and optimizer.
"""
args.agent_epsilon = float(strategy.get("epsilon", 0.0))
Collaborator Author:

Please put back the strategy sampling logic.

return params


def _canonical_param_tensors_for_gflownet(
Collaborator Author:

Please ensure your changes take into account the types of parameters used by NoisyLayers.

Collaborator:

isn't model.parameters() enough?

Collaborator Author:

Let's say you have a community of 5 agents.

3 of them are MLPs only, and 2 of them have noisy layers (which have different parameters, as shown here).

How do you do selective averaging of their parameters?

We'll probably have to just randomly initialize the elements of the noisy layers if there's nothing to average over in the community, but I think we'll need some logic to handle it.

Collaborator:

Oh, I see. To be honest, I don't think it is a good idea to average parameters from different types of layers. It is already unclear whether averaging weights from the same type of policy works.

Regarding "I think we'll need some logic to handle it": I suggest implementing this when we need it, and in a separate PR. This PR already does multiple things besides refactoring.

Collaborator Author:

OK, good idea.

If you look at how the layers are implemented, you would simply average over the weights that are shared between the noisy and non-noisy policies and randomly initialize the rest, and/or average over the noisy-layer-specific parameters that are present in the subcommunity you are averaging over. But it's not that important (it's just another source of randomness).
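
A rough sketch of that selective averaging, written over plain state dicts rather than the distributed buffers used here; the function name and structure are illustrative:

```python
import torch


def selectively_average(
    local: dict[str, torch.Tensor],
    peers: list[dict[str, torch.Tensor]],
) -> dict[str, torch.Tensor]:
    """Average each parameter only over the peers that actually have it.

    Parameters present in no peer (e.g. NoisyLayer sigmas when the rest of the
    community uses plain MLPs) keep their local, randomly initialized values.
    """
    averaged = {}
    for name, param in local.items():
        matching = [p[name] for p in peers if name in p and p[name].shape == param.shape]
        if matching:
            averaged[name] = torch.stack([param, *matching]).mean(dim=0)
        else:
            averaged[name] = param.clone()  # nothing to average over in this subcommunity
    return averaged
```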

return strat


def _reset_module_parameters_inplace(root: torch.nn.Module) -> None:
Collaborator Author:

Can we still randomize the parameters of a policy instead of averaging?

Collaborator:

When should this happen? We can easily add it as an alternative to averaging in the SpawnPolicy.

Collaborator Author:

It would happen when spawning a policy. We will need to be able to configure it to happen under a few different conditions (we need to be able to run ablations and compare elements of our approach).
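
A sketch of how that configuration could look, using the two init modes already named in the diff, "random" and "mean_others"; the function itself is illustrative, not the PR's code:

```python
import torch
from typing import Optional


def init_spawned_policy(
    module: torch.nn.Module,
    mode: str,
    peer_state_dicts: Optional[list] = None,
) -> None:
    """Initialize a freshly spawned policy according to the configured mode."""
    if mode == "random":
        # Re-randomize every submodule that knows how to reset itself.
        for submodule in module.modules():
            if hasattr(submodule, "reset_parameters"):
                submodule.reset_parameters()
    elif mode == "mean_others" and peer_state_dicts:
        # Load the element-wise mean of the other agents' parameters.
        mean_state = {
            name: torch.stack([sd[name] for sd in peer_state_dicts]).mean(dim=0)
            for name in peer_state_dicts[0]
        }
        module.load_state_dict(mean_state, strict=False)
    else:
        raise ValueError(f"Unknown or unusable init mode: {mode}")
```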

Collaborator:

Okay, I have added a new argument to enable it.

@josephdviviano (Collaborator Author) commented Oct 9, 2025 via email
