Skip to content

firedancer-dev: repair profiler #5687

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

jvarela-jump
Copy link
Contributor

Adding firedancer-dev command repair. This is a standalone application.
The fd_shredcap_tile has been modified to publish_stake_weights from the snapshot manifest supplied by the snapshot tile(s).
This PR also includes an upgrade to the replay tile's generate_stake_weight_msg.

@@ -318,7 +318,7 @@ publish_stake_weights( fd_replay_tile_ctx_t * ctx,
if( epoch_stakes_root!=NULL ) {
ulong * stake_weights_msg = fd_chunk_to_laddr( ctx->stake_out->mem, ctx->stake_out->chunk );
ulong epoch = fd_slot_to_leader_schedule_epoch( epoch_schedule, fd_bank_slot_get( slot_ctx->bank ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think based on fd_bank.h and comments in fd_runtime.c

Assuming we are executing in epoch E
next_epoch_stakes (E - 1)
epoch_stakes (E - 2)
stakes->vote_accounts (E)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look right to me.

To compute leader schedule in epoch E, we need the list of (identity, stake) at E-1 -- and soon (vote, stake, identity). And similarly for epoch E-1 we need the data at E-2.
We do store next_epoch_stakes(E-1) and epoch_stakes(E-2).

But using vote_accounts(E) is incorrect. We need to use the correct vote accounts at epoch boundaries, that are part of the manifest/bank.epoch_stakes (or bank.versioned_epoch_stakes -- I've been told these are the same but I haven't checked yet. Agave seems to be switching to versioned in... 2.4? not sure about this either).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use epoch_stakes (E-2), which is what I think stake_weights is supposed to represent in this message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can just rename this stuff to be the same and add copious documentation in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some documentation at the top of fd_runtime_process_new_epoch:

/* Starting a new epoch.
  New epoch:        T
  Just ended epoch: T-1
  Epoch before:     T-2

  In this function:
  - stakes in T-2 (epoch_stakes) should be replaced by T-1 (next_epoch_stakes)
  - stakes at T-1 (next_epoch_stakes) should be replaced by updated stakes at T (stakes->vote_accounts)
  - leader schedule should be calculated using new T-2 stakes (epoch_stakes)

  Invariant during an epoch T:
  next_epoch_stakes holds the stakes at T-1
  epoch_stakes holds the stakes at T-2
 */

Agree we should make this clearer

Copy link
Contributor

@topointon-jump topointon-jump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stake weight changes look good to me

@jvarela-jump jvarela-jump force-pushed the jvarela/repair_profiler branch 5 times, most recently from 4d17463 to 93a3c72 Compare July 22, 2025 16:09
lidatong
lidatong previously approved these changes Jul 23, 2025
Copy link
Member

@lidatong lidatong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! this will be super useful for us

Comment on lines 60 to 69
struct fd_stake_out_link {
ulong idx;
fd_frag_meta_t * mcache;
ulong * sync;
ulong depth;
ulong seq;
fd_wksp_t * mem;
ulong chunk0;
ulong wmark;
ulong chunk;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align

Comment on lines +144 to +161
FD_FN_CONST static inline ulong
manifest_bank_align( void ) {
return 128UL;
}

FD_FN_CONST static inline ulong
manifest_bank_footprint( void ) {
return 2UL*1024UL*1024UL*1024UL;
}

FD_FN_CONST static inline ulong
manifest_spad_align( void ) {
return 128UL;
}

FD_FN_CONST static inline ulong
manifest_spad_footprint( void ) {
return 1UL*1024UL*1024UL*1024UL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these magic values seem a little strange. are they just copied over from the snapshot tile?

Comment on lines 264 to 266
if( FD_LIKELY( ctx->in_kind[ in_idx ]==SNAP_IN ) ) {
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt seem needed given return 0 below?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants