Conversation

josephdviviano (Collaborator)

  • I've read the .github/CONTRIBUTING.md file
  • My code follows the typing guidelines
  • I've added appropriate tests
  • I've run pre-commit hooks locally

Description

  • Moves key features involving distributed computing to gfn.utils.distributed.
  • Moves Hypergrid helper methods out of a dedicated folder into the main file.
  • Moves mode-counting logic into the Hypergrid env, allowing shared validation across all hypergrid scripts.
  • Moves standard DiscreteEnvironment evaluation code to the base class.
  • Adds strategy sampling to train_hypergrid (a rough sketch of the idea follows this list).
  • Changes the training loop to support a unique strategy per agent, plus agent weight restarting.
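
For context, a per-agent strategy here is just a small bundle of exploration settings. A rough sketch of how such sampling might look; the function name, the specific settings, and their ranges are illustrative, not the PR's actual implementation:

```python
import random


def sample_strategy(agent_group_id: int, seed: int) -> dict:
    """Sample exploration settings for one agent group (illustrative sketch).

    Seeding the RNG per agent group keeps different groups on different
    strategies while remaining reproducible across restarts.
    """
    rng = random.Random(seed * 1_000_003 + agent_group_id)
    return {
        "epsilon": rng.choice([0.0, 0.01, 0.1]),  # epsilon-greedy exploration level
        "use_noisy_layers": rng.random() < 0.5,   # parameter-space noise instead of epsilon
        "lr": rng.choice([1e-4, 1e-3]),           # per-agent learning rate
    }
```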

@josephdviviano self-assigned this Oct 1, 2025
@younik (Collaborator) left a comment:

I didn't review in detail, as that is hard for 1000+ LOC, assuming all of it comes from refactoring (only moving functions around).

I am not sure that is the case for the sampling strategy; if you want a review of it, consider moving it to another PR.

# -------------------------
# Mode utilities
# -------------------------
def _mode_reward_threshold(self) -> float:
Collaborator:

This method could be moved inside the GridReward object, using polymorphism instead of the if chain.
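
A rough sketch of that shape; the class names, constructor arguments, and thresholds are illustrative, not the repo's actual API:

```python
class GridReward:
    """Base reward object; each variant owns its mode-reward threshold."""

    def mode_reward_threshold(self) -> float:
        raise NotImplementedError


class CornerGridReward(GridReward):
    def __init__(self, R0: float, R1: float, R2: float) -> None:
        self.R0, self.R1, self.R2 = R0, R1, R2

    def mode_reward_threshold(self) -> float:
        # Mode states collect every bonus term in this variant.
        return self.R0 + self.R1 + self.R2


class SmoothGridReward(GridReward):
    def mode_reward_threshold(self) -> float:
        return 1.0  # placeholder value for illustration only


# HyperGrid then delegates instead of branching on the reward type:
#     threshold = self.reward_fn.mode_reward_threshold()
```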

@josephdviviano (Collaborator Author) commented Oct 4, 2025 via email

@hyeok9855 (Collaborator) left a comment:

I reviewed only env.py and gym, and left some comments. Thank you!

"""
rewards = self.reward(states)
threshold = self._mode_reward_threshold()
return rewards >= threshold
Collaborator:

We may want to subtract an epsilon > 0, i.e., rewards >= threshold - EPS, to avoid floating-point error. It happens sometimes (but I have no idea exactly when it does).
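
A sketch of the suggested change; the EPS value here is arbitrary:

```python
import torch

EPS = 1e-6  # tolerance for accumulated floating-point error


def states_are_modes(rewards: torch.Tensor, threshold: float) -> torch.Tensor:
    # Rewards that should sit exactly on the threshold can fall a hair
    # below it after float arithmetic; the slack absorbs that.
    return rewards >= threshold - EPS
```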

Comment on lines +304 to +315
# Determine side relative to the center along each dimension
pos = (
states.tensor.to(dtype=torch.get_default_dtype()) / (self.height - 1) > 0.5
).long()
weights = (
2 ** torch.arange(self.ndim - 1, -1, -1, device=states.tensor.device)
).long()
ids = (pos * weights).sum(dim=-1)
# Assign -1 to non-mode states
ids = torch.where(mask, ids, torch.full_like(ids, -1))

return ids
Collaborator:

I don't understand what's happening here. I guess this might only be valid for specific types of GridReward.
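
For what it's worth, the snippet appears to assign each mode state the index of the hypercube corner it sits closest to: each coordinate is thresholded at the grid midpoint and the resulting bits are read as a binary number. A toy run of the same arithmetic, with values chosen only for illustration:

```python
import torch

ndim, height = 2, 8
states = torch.tensor([[0, 0], [0, 7], [7, 0], [7, 7]])

# Which side of the midpoint (height - 1) / 2 each coordinate falls on.
pos = (states.to(torch.get_default_dtype()) / (height - 1) > 0.5).long()
# Powers of two turn the per-dimension bits into a single corner index.
weights = (2 ** torch.arange(ndim - 1, -1, -1)).long()
print((pos * weights).sum(dim=-1))  # tensor([0, 1, 2, 3])
```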

Collaborator:

Why do we need mode_ids or modes_found?

I guess the number of modes found can be obtained as len(modes_visited), where modes_visited is a set of states whose rewards exceed a threshold.
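
A minimal sketch of that bookkeeping; the variable and function names are illustrative:

```python
import torch

modes_visited: set = set()


def count_modes_found(states: torch.Tensor, rewards: torch.Tensor, threshold: float) -> int:
    """Record every visited state whose reward clears the threshold; return the running count."""
    for state in states[rewards >= threshold]:
        modes_visited.add(tuple(state.tolist()))
    return len(modes_visited)


# Duplicates collapse in the set, so only distinct above-threshold states are counted.
states = torch.tensor([[7, 7], [0, 0], [7, 7], [0, 7]])
rewards = torch.tensor([2.5, 0.1, 2.5, 2.5])
print(count_modes_found(states, rewards, threshold=2.0))  # 2
```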

buf /= N - G
t.data.copy_(buf)

return True
Collaborator Author:

To remove.

if init_mode is not None
else getattr(args, "restart_init_mode", "random")
)
if mode == "mean_others":
@josephdviviano (Collaborator Author), Oct 7, 2025:

Needs to be replaced with Omar's work, merged here: #401

if getattr(args, "use_restarts", False) and iteration >= 1000:
agent_gid = distributed_context.agent_group_id or 0
if _group_random_coin(1.0 / 1000.0, agent_gid, iteration, args.seed):
prev_eps = float(getattr(args, "agent_epsilon", 0.0))
Collaborator Author:

@younik you removed the strategy sampling logic?

Collaborator:

Right, now it is sampled only at the beginning. I am going to sample it again at every model build.

Collaborator Author:

OK great - it's not added yet I suppose?

Collaborator Author:

We'll need that for this PR. As it stands, the work has simply been deleted.

Collaborator:

done

Collaborator Author:

thanks!!


# 2. Build the initial gflownet via the same pathway as restarts (unified behavior).
# We explicitly pick random parameter init for the initial build (no averaging).

Collaborator Author:

Please put back the strategy sampling logic.

parameters from other ranks (keeping noisy sigmas/defaults intact), and
returns the new model and optimizer.
"""
args.agent_epsilon = float(strategy.get("epsilon", 0.0))
Collaborator Author:

Please put back the strategy sampling logic.

return params


def _canonical_param_tensors_for_gflownet(
Collaborator Author:

Please ensure your changes take into account the types of parameters used by NoisyLayers.

Collaborator:

isn't model.parameters() enough?

Collaborator Author:

Let's say you have a community of 5 agents.

3 of them are MLPs only, and 2 of them have noisy layers (which have different parameters, as shown here).

How do you do selective averaging of their parameters?

We'll probably have to just randomly initialize the elements of the noisy layers if there's nothing to average over in the community, but I think we'll need some logic to handle it.

Collaborator:

Oh, I see. To be honest, I don't think it is a good idea to average parameters from different types of layers. It is already unclear whether averaging weights from the same type of policy works.

Regarding "I think we'll need some logic to handle it": I suggest implementing this when we need it, and in a separate PR. This PR already does multiple things besides refactoring.

Collaborator Author:

OK, good idea.

If you look at how the layers are implemented, you would simply average over the weights that are shared between the noisy and non-noisy policies and randomly initialize the rest, and/or average over the noisy-layer-specific parameters that are present in the subcommunity you are averaging over. But it's not that important (it's just another source of randomness).
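
A rough sketch of that selective averaging, written over plain state dicts rather than the distributed buffers used here; the function name and structure are illustrative:

```python
import torch


def selectively_average(
    local: dict[str, torch.Tensor],
    peers: list[dict[str, torch.Tensor]],
) -> dict[str, torch.Tensor]:
    """Average each parameter only over the peers that actually have it.

    Parameters present in no peer (e.g. NoisyLayer sigmas when the rest of the
    community uses plain MLPs) keep their local, randomly initialized values.
    """
    averaged = {}
    for name, param in local.items():
        matching = [p[name] for p in peers if name in p and p[name].shape == param.shape]
        if matching:
            averaged[name] = torch.stack([param, *matching]).mean(dim=0)
        else:
            averaged[name] = param.clone()  # nothing to average over in this subcommunity
    return averaged
```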

return strat


def _reset_module_parameters_inplace(root: torch.nn.Module) -> None:
Collaborator Author:

Can we still randomize the parameters of a policy instead of averaging?

Collaborator:

When should this happen? We can easily add it as an alternative to averaging in the SpawnPolicy.

Collaborator Author:

It would happen when spawning a policy. We will need to be able to configure it to happen under a few different conditions (we need to be able to run ablations and compare elements of our approach).
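
A sketch of how that configuration could look, using the two init modes already named in the diff, "random" and "mean_others"; the function itself is illustrative, not the PR's code:

```python
import torch
from typing import Optional


def init_spawned_policy(
    module: torch.nn.Module,
    mode: str,
    peer_state_dicts: Optional[list] = None,
) -> None:
    """Initialize a freshly spawned policy according to the configured mode."""
    if mode == "random":
        # Re-randomize every submodule that knows how to reset itself.
        for submodule in module.modules():
            if hasattr(submodule, "reset_parameters"):
                submodule.reset_parameters()
    elif mode == "mean_others" and peer_state_dicts:
        # Load the element-wise mean of the other agents' parameters.
        mean_state = {
            name: torch.stack([sd[name] for sd in peer_state_dicts]).mean(dim=0)
            for name in peer_state_dicts[0]
        }
        module.load_state_dict(mean_state, strict=False)
    else:
        raise ValueError(f"Unknown or unusable init mode: {mode}")
```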

Collaborator:

Okay, I have added a new argument to enable it.

@josephdviviano (Collaborator Author) commented Oct 9, 2025 via email
