
Conversation

Contributor

@taco-paco taco-paco commented Jul 30, 2025

Everything starts when a scheduled Intent arrives...

Architecture

Schedulers

We can't directly spawn a bunch of IntentExecutors, because one message can block the execution of another. To handle this, messages have to go through scheduling first.

Details: once a message makes it to CommittorProcessor::schedule_base_intents, the intent is handed off to a tokio task, the IntentExecutionEngine, which figures out the scheduling.

IntentExecutionEngine

Accepts new messages, schedules them, and spawns an IntentExecutor per Intent, with up to 50 running in parallel. Once a particular IntentExecutor finishes, the result is broadcast to subscribers such as RemoteScheduledCommitsProcessor or ExternalAccountsManager.

Details: for the scheduling logic see IntentScheduler. The number of parallel IntentExecutors is controlled by a Semaphore.
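
A minimal sketch of the spawn loop, assuming a simplified `Intent` type and a placeholder `execute_intent` function (the real engine also wires in the scheduler, persistence, and result broadcasting):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

const MAX_EXECUTORS: usize = 50;

struct Intent {
    id: u64,
}

// Stand-in for IntentExecutor::execute.
async fn execute_intent(intent: Intent) {
    println!("executing intent {}", intent.id);
}

async fn run_engine(mut rx: mpsc::Receiver<Intent>) {
    // Global concurrency cap: at most MAX_EXECUTORS intents in flight.
    let permits = Arc::new(Semaphore::new(MAX_EXECUTORS));
    while let Some(intent) = rx.recv().await {
        let permit = permits
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore never closed");
        tokio::spawn(async move {
            execute_intent(intent).await;
            drop(permit); // frees a slot for the next intent
        });
    }
}
```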

IntentExecutor

IntentExecutor is responsible for executing an Intent. It calls TransactionPreparator and then executes the transaction, returning the resulting signatures.

TransactionPreparator

TransactionPreparator is the entity that handles all of the above "transaction preparation": it calls TaskBuilderV1, TaskStrategist, and DeliveryPreparator, then assembles it all and passes the result to the MessageExecutor.
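
A rough sketch of the call order only; the type and function names below are placeholders, and the real components are async and take an RPC client, the validator authority, and so on:

```rust
struct L1Task;                               // produced by TaskBuilder
struct TransactionStrategy(Vec<L1Task>);     // chosen by TaskStrategist
struct PreparedCommit(TransactionStrategy);  // plus ALTs/buffers from DeliveryPreparator

fn build_tasks() -> Vec<L1Task> {
    vec![L1Task] // TaskBuilderV1: intent -> atomic tasks
}

fn choose_strategy(tasks: Vec<L1Task>) -> TransactionStrategy {
    TransactionStrategy(tasks) // TaskStrategist: fit tasks into a transaction
}

fn prepare_delivery(strategy: TransactionStrategy) -> PreparedCommit {
    PreparedCommit(strategy) // DeliveryPreparator: ALTs, commit buffers
}

// TransactionPreparator glues the stages together before execution.
fn prepare_transaction() -> PreparedCommit {
    prepare_delivery(choose_strategy(build_tasks()))
}
```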

DeliveryPreparator

After our L1Tasks are ready we need to prepare everything for their successful execution. DeliveryPreparator handles ALTs (address lookup tables) and commit buffers.

TaskBuilder

First, let's build atomic tasks from the scheduled message/intent.

High level: TaskBuilder is responsible for creating L1Tasks (to be renamed...) from a ScheduledL1Message (to be renamed...).
Details: to do that it requires additional information from DelegationMetadata, which is provided by the CommitIdFetcher.

L1Task

High level: an L1Task is an atomic operation to be performed on the base layer, such as Commit, Undelegate, Finalize, or Action.

Details: there are two implementations of L1Task: ArgsTask and BufferTask. ArgsTask builds its instruction from args; BufferTask builds its instruction from a buffer account. At the moment BufferTask supports only commits.
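
A simplified sketch of the two flavors (field names here are hypothetical; the real enum carries the full Commit/Undelegate/Finalize/Action payloads and builds the actual instructions):

```rust
use solana_sdk::pubkey::Pubkey;

enum L1Task {
    // Carries the committed state inline as instruction arguments.
    Args(ArgsTask),
    // References a pre-filled on-chain buffer account (commits only, for now).
    Buffer(BufferTask),
}

struct ArgsTask {
    committee: Pubkey,
    data: Vec<u8>,
}

struct BufferTask {
    committee: Pubkey,
    buffer: Pubkey,
}

impl L1Task {
    // Committee the task operates on, regardless of delivery flavor.
    fn committee(&self) -> Pubkey {
        match self {
            L1Task::Args(t) => t.committee,
            L1Task::Buffer(t) => t.committee,
        }
    }
}
```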

CommitIdFetcher

High level: for an account to be accepted by dlp it needs incremental commit ids. CommitIdFetcher provides the correct ids/nonces for a set of committees.

Details: CommitIdTracker is an implementation of CommitIdFetcher that caches and locally increments commit ids using an LruCache.
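
A minimal sketch of the caching idea, assuming the `lru` crate and a hypothetical `next_commit_ids` signature; the real CommitIdTracker also falls back to fetching current ids from chain on a cache miss:

```rust
use std::num::NonZeroUsize;

use lru::LruCache;
use solana_sdk::pubkey::Pubkey;

struct CommitIdTracker {
    cache: LruCache<Pubkey, u64>,
}

impl CommitIdTracker {
    fn new(capacity: usize) -> Self {
        Self {
            cache: LruCache::new(NonZeroUsize::new(capacity).expect("non-zero capacity")),
        }
    }

    // Next incremental ids for a set of committees; dlp rejects commits whose
    // ids are not strictly increasing per account.
    fn next_commit_ids(&mut self, committees: &[Pubkey]) -> Vec<(Pubkey, u64)> {
        committees
            .iter()
            .map(|pk| {
                let next = self.cache.get(pk).map_or(0, |id| id + 1);
                self.cache.put(*pk, next);
                (*pk, next)
            })
            .collect()
    }
}
```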

TaskStrategist

After our tasks are built with TaskBuilder, they need to be optimized to fit into a transaction. That is what TaskStrategist does.

Details: TaskBuilder initially builds ArgsTasks; TaskStrategist, if needed, optimizes them into BufferTasks.
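
A self-contained sketch of the size-driven downgrade; the task shapes and byte sizes are illustrative only, and the real strategist also accounts for lookup-table savings and compute-budget instructions:

```rust
const MAX_TX_BYTES: usize = 1232; // Solana packet-size limit

#[derive(Clone, Copy)]
enum TaskKind {
    Args { data_len: usize }, // instruction carries the data inline
    Buffer,                   // instruction only references a buffer account
}

fn task_size(task: &TaskKind) -> usize {
    match task {
        TaskKind::Args { data_len } => 64 + data_len, // rough per-task overhead
        TaskKind::Buffer => 64,
    }
}

// Convert the largest ArgsTasks into BufferTasks until everything fits.
fn fit_into_transaction(tasks: &mut [TaskKind]) {
    while tasks.iter().map(task_size).sum::<usize>() > MAX_TX_BYTES {
        let Some(largest) = tasks
            .iter_mut()
            .filter(|t| matches!(t, TaskKind::Args { .. }))
            .max_by_key(|t| task_size(t))
        else {
            break; // all tasks already use buffers
        };
        *largest = TaskKind::Buffer;
    }
}
```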

Changes

  • Persister changed from the reqid & bundle_id format to message_id, meaning a row is created per message; a particular row tracks the lifespan of an Intent
  • The Persister is passed along into the Executors & Scheduler so they can update Intent statuses during execution
  • There is no notion of bundles anymore; everything is represented by Intent id
  • AccountsManager no longer uses a custom AccountCommitter for periodic commits of accounts; it uses CommittorService instead
  • RemoteScheduledCommitsProcessor was extracted from AccountsManager since it has nothing to do with it

Benchmarks

Using the same config as in this PR
Here's the best result on master I got on my machine:

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 3401   | 1499 | 23918 | 3543 | 5079      | 1244   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 3460   | 1520 | 13509 | 3609 | 5430      | 1123   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Requests Per Second (TPS)     | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| getX Request Response (TPS)   | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

Here's the best result on feat/base-layer-ix/main, which is the current PR under review:

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 3241   | 1450 | 17898 | 3397 | 4907      | 954    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 3332   | 1504 | 17947 | 3517 | 5187      | 988    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Requests Per Second (TPS)     | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| getX Request Response (TPS)   | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

So the results are almost the same, but the current feat/base-layer-ix/main sends 2 txs for commit & finalize, while master sends 1.
Here are the results on the optimized branch feat/base-layer-ix/single-tx-intent, which sends just 1 tx as well. We get a ~13% increase in performance in this benchmark.

+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2896   | 1360 | 14462 | 3050 | 4619      | 1060   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2977   | 1376 | 15452 | 3149 | 4916      | 1149   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Requests Per Second (TPS)     | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| getX Request Response (TPS)   | ---          | ---    | ---  | ---   | ---  | ---       | ---    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+

thlorenz added 30 commits May 7, 2025 11:10
- at this point schedule commit tests pass with maximum concurrency
@thlorenz
Contributor

Created some mermaid diagrams to understand the scheduling flow better. Could you please confirm that this is correct @taco-paco ?
Or you could edit this comment and/or we should probably add a readme and include the corrected versions there.

Scheduler data structures and scheduling decision

The scheduler enforces mutual exclusion per pubkey using two structures:

  • blocked_keys: HashMap<Pubkey, VecDeque<IntentID>>, a per-key FIFO queue of intent IDs
  • blocked_intents: HashMap<IntentID, IntentMeta { num_keys, intent }>
flowchart LR
  subgraph DataStructures[Scheduler Data Structures]
    direction TB
    BK[blocked_keys: HashMap<Pubkey, VecDeque<IntentID>>]
    BI[blocked_intents: HashMap<IntentID, IntentMeta>]
    IM["IntentMeta { num_keys, intent }"]
  end

  A[Incoming ScheduledBaseIntentWrapper] --> B{"Get committed pubkeys\nintent.inner.get_committed_pubkeys()"}
  B -- "None (BaseActions)" --> C[Return intent to execute immediately]
  B -- "Some(pubkeys)" --> D{Any pubkey already\npresent in blocked_keys?}
  D -- No --> E["Insert intent_id into each pubkey queue\n(blocked_keys[pk].push_back(id))"] --> F[Return intent to execute immediately]
  D -- Yes --> G["Insert intent_id into each pubkey queue\n(blocked_keys[pk].push_back(id))"]
  G --> H["blocked_intents.insert(id, IntentMeta { num_keys, intent })"]
  H --> I["Return None (blocked)"]

  style C fill:#d2f8d2,stroke:#2e8b57
  style F fill:#d2f8d2,stroke:#2e8b57
  style I fill:#ffe0e0,stroke:#b22222

Notes grounded in code:

  • schedule() always pushes the intent_id to each relevant blocked_keys queue (lines ~101–106).
  • If no key was previously in blocked_keys, it returns Some(intent) for immediate execution; otherwise the intent is stored in blocked_intents and returns None (lines ~108–120).
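
A Rust restatement of that decision (types and names simplified to match the diagram above, not necessarily the exact code):

```rust
use std::collections::{HashMap, VecDeque};

use solana_sdk::pubkey::Pubkey;

type IntentId = u64;

struct Intent {
    id: IntentId,
    committed_pubkeys: Option<Vec<Pubkey>>, // None for BaseActions
}

struct IntentMeta {
    num_keys: usize,
    intent: Intent,
}

#[derive(Default)]
struct IntentScheduler {
    blocked_keys: HashMap<Pubkey, VecDeque<IntentId>>,
    blocked_intents: HashMap<IntentId, IntentMeta>,
}

impl IntentScheduler {
    /// Returns Some(intent) if it can execute immediately, None if it is blocked.
    fn schedule(&mut self, intent: Intent) -> Option<Intent> {
        let Some(pubkeys) = intent.committed_pubkeys.clone() else {
            return Some(intent); // BaseActions bypass key-based blocking
        };

        // Blocked if any of its keys is already held or queued by another intent.
        let blocked = pubkeys.iter().any(|pk| self.blocked_keys.contains_key(pk));

        // Either way, the intent id is queued on every key it touches.
        for pk in &pubkeys {
            self.blocked_keys.entry(*pk).or_default().push_back(intent.id);
        }

        if blocked {
            self.blocked_intents
                .insert(intent.id, IntentMeta { num_keys: pubkeys.len(), intent });
            None
        } else {
            Some(intent)
        }
    }
}
```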

Engine loop, concurrency, and prioritization

  • Up to 50 IntentExecutor tasks run in parallel (Semaphore of size MAX_EXECUTORS, currently 50).
  • On each loop iteration:
    • It first checks completed executors and immediately tries to pop_next_scheduled_intent() to prefer unblocked intents.
    • Otherwise it pulls new intents from the mpsc channel/DB, and schedule() decides if they execute now or get blocked.
  • On executor completion: broadcast result, scheduler.complete() to unblock queues, free semaphore.
sequenceDiagram
  autonumber
  participant External as CommittorProcessor/DB/Channel
  participant Engine as IntentExecutionEngine
  participant Sched as IntentScheduler
  participant Exec as IntentExecutor (N up to 50)

  loop Main Loop
    Engine->>Sched: pop_next_scheduled_intent()
    alt Got executable intent
      Engine->>Engine: acquire semaphore permit (<= 50)
      Engine->>Exec: spawn execute(intent)
    else None (all blocked)
      Engine->>External: get_new_intent() or wait
      External-->>Engine: intent (from channel or DB)
      Engine->>Sched: schedule(intent)
      alt schedule returns Some(intent)
        Engine->>Engine: acquire permit
        Engine->>Exec: spawn execute(intent)
      else returns None (blocked)
        Note over Sched: Intent enqueued in blocked_intents\nand queued in blocked_keys
      end
    end
  end

  par Executor completes
    Exec-->>Engine: result (success/failure)
    Engine->>Sched: complete(intent.inner)
    Engine->>Engine: drop semaphore permit
    Engine-->>Subscribers: broadcast result
  and Engine next tick
    Engine->>Sched: pop_next_scheduled_intent()
    Sched-->>Engine: Next unblocked intent if any
  end

Unblocking mechanics

  • complete() pops the executed intent_id from the front of each of its key queues; removes a key’s queue entirely if it becomes empty (lines ~135–155).
  • pop_next_scheduled_intent(): builds a map of candidates by looking at the front element of every key queue, counts how many queues each id leads, and returns the first blocked intent whose count equals its num_keys (lines ~162–204).
flowchart TD
  subgraph Completion[On Intent Completion]
    direction TB
    X["complete(base_intent)"]
    Y{For each pubkey in intent}
    X --> Y
    Y --> Z["blocked_keys[pubkey].pop_front()\nassert popped == intent_id"]
    Z --> W{Queue empty now?}
    W -- Yes --> R[Remove pubkey entry from blocked_keys]
    W -- No --> S[Leave remaining intent_ids queued]
  end

  subgraph Unblocking[Who becomes executable next?]
    direction TB
    A2["pop_next_scheduled_intent()"]
    B2["Build execute_candidates:\nfor each pubkey queue, take queue.front()\ncount occurrences per intent_id"]
    C2{"Find id in blocked_intents\nwhere count == IntentMeta.num_keys"}
    A2 --> B2 --> C2
    C2 -- Found --> D2[Remove from blocked_intents and return intent]
    C2 -- None --> E2[Return None]
  end
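
Continuing the simplified IntentScheduler sketch from above, `complete()` and `pop_next_scheduled_intent()` could look roughly like this (the real code asserts on the popped id and takes the intent itself rather than an id/keys pair):

```rust
impl IntentScheduler {
    /// Called when an executor finishes: pop the intent id from the front of
    /// each key queue it held, dropping queues that become empty.
    fn complete(&mut self, intent_id: IntentId, pubkeys: &[Pubkey]) {
        for pk in pubkeys {
            if let Some(queue) = self.blocked_keys.get_mut(pk) {
                let popped = queue.pop_front();
                debug_assert_eq!(popped, Some(intent_id));
                if queue.is_empty() {
                    self.blocked_keys.remove(pk);
                }
            }
        }
    }

    /// A blocked intent becomes executable once it sits at the front of every
    /// queue it belongs to, i.e. its front-count equals num_keys.
    fn pop_next_scheduled_intent(&mut self) -> Option<Intent> {
        let mut front_counts: HashMap<IntentId, usize> = HashMap::new();
        for queue in self.blocked_keys.values() {
            if let Some(id) = queue.front() {
                *front_counts.entry(*id).or_insert(0) += 1;
            }
        }
        let ready = front_counts.into_iter().find_map(|(id, count)| {
            match self.blocked_intents.get(&id) {
                Some(meta) if meta.num_keys == count => Some(id),
                _ => None,
            }
        })?;
        self.blocked_intents.remove(&ready).map(|meta| meta.intent)
    }
}
```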

Scenario A: Overlapping pubkeys enforce strict order

From code comments (intent_scheduler.rs lines ~47–57). Suppose:

  • t1 executing: [a1, a2, a3] and [b1, b2, b3] (two sets of keys it holds)
  • t2 waiting: [a1, b1]
  • t3 arriving: [a1, a3]

t3 cannot bypass t2 because both share a1, and per-key queues keep FIFO order.

flowchart LR
  subgraph a-keys
    direction TB
    Aqa[a1 queue] -->|front| A1[t1] --> A2[t2] --> A3[t3]
    A2a[a2 queue] -->|front| A1a[t1]
    A3a[a3 queue] -->|front| A1b[t1] --> A3b[t3]
  end

  subgraph b-keys
    direction TB
    B1[b1 queue] -->|front| B1t1[t1] --> B1t2[t2]
    B2[b2 queue] -->|front| B2t1[t1]
    B3[b3 queue] -->|front| B3t1[t1]
  end

  style A1 fill:#cfe8ff,stroke:#1e90ff
  style B1t1 fill:#cfe8ff,stroke:#1e90ff

Completion and unblocking order:

  • After t1 completes: fronts become a1->t2, a3->t3, b1->t2
  • pop_next_scheduled_intent() finds t2 ready (it leads a1 and b1) → execute t2
  • After t2 completes: a1->t3; t3 leads a1 and a3 → execute t3
sequenceDiagram
  autonumber
  participant S as Scheduler
  participant T1 as t1
  participant T2 as t2
  participant T3 as t3

  Note over S: Initial fronts: a1->t1, a2->t1, a3->t1, b1->t1, b2->t1, b3->t1
  T1-->>S: complete()
  Note over S: Fronts after t1: a1->t2, a3->t3, b1->t2
  S-->>S: pop_next_scheduled_intent() => t2 (num_keys=2, both fronts)
  T2-->>S: complete()
  Note over S: Fronts after t2: a1->t3, a3->t3
  S-->>S: pop_next_scheduled_intent() => t3

Scenario B: Indirect blocking through a shared key

From code comments (intent_scheduler.rs lines ~58–64). Suppose:

  • t1 executing: [a1, a2, a3]
  • t2 blocked: [c1, a1]
  • t3 arriving: [c2, c1]

Even though t3 doesn’t overlap with t1 directly, it shares c1 with t2 and must wait until t2 executes, which itself must wait for a1 to be freed by t1.

flowchart LR
  subgraph a-keys
    direction TB
    Qa[a1 queue] -->|front| Qa1[t1] --> Qa2[t2]
    Qb[a2 queue] -->|front| Qb1[t1]
    Qc[a3 queue] -->|front| Qc1[t1]
  end

  subgraph c-keys
    direction TB
    Qc1q[c1 queue] -->|front| Qc1t2[t2] --> Qc1t3[t3]
    Qc2q[c2 queue] -->|front| Qc2t3[t3]
  end

  style Qa1 fill:#cfe8ff,stroke:#1e90ff

Unblocking sequence:

  • t1 completes → a1 front becomes t2; t2 now leads both c1 and a1 → execute t2
  • t2 completes → c1 front becomes t3; c2 already led by t3 → execute t3

Engine-level concurrency and fairness

  • Global concurrency cap: semaphore size 50.
  • Fairness: per-pubkey FIFO queues + requirement to be front-of-queue on all keys before execution.
  • Prioritization: after any executor completion, engine first tries to pop an unblocked intent before receiving new ones (tokio::select! biased branch).
flowchart LR
  subgraph Engine
    direction TB
    Rcv[Receive or fetch intent]
    Sch["Scheduler.schedule()"]
    Gate{Semaphore\npermits <= 50}
    Spawn[Spawn IntentExecutor]
    Done["Executor finished\nbroadcast + Scheduler.complete()"]
  end

  Rcv --> Sch
  Sch -->|"Some (non-conflicting)"| Gate
  Sch -->|"None (blocked)"| Rcv
  Gate -->|acquire| Spawn --> Done --> Rcv

  style Gate fill:#f7f7d7,stroke:#b59b00
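
Reusing the simplified Intent/IntentScheduler types from the scheduler sketches above, the prioritization could be expressed with a biased tokio::select!; the ExecutionResult shape and the spawn_executor helper here are hypothetical:

```rust
use std::sync::Arc;

use tokio::sync::{broadcast, mpsc, Semaphore};

#[derive(Clone)]
struct ExecutionResult {
    intent_id: IntentId,
    pubkeys: Vec<Pubkey>,
    // plus signatures / error info in the real code
}

async fn spawn_executor(intent: Intent, permits: Arc<Semaphore>) {
    let permit = permits.acquire_owned().await.expect("semaphore never closed");
    tokio::spawn(async move {
        // IntentExecutor::execute(intent) would run here; the permit is
        // released when this task finishes.
        let _ = (intent.id, permit);
    });
}

async fn engine_loop(
    mut new_intents: mpsc::Receiver<Intent>,
    mut finished: mpsc::Receiver<ExecutionResult>,
    results_tx: broadcast::Sender<ExecutionResult>,
    mut scheduler: IntentScheduler,
    permits: Arc<Semaphore>,
) {
    loop {
        tokio::select! {
            // `biased` polls branches top-down, so completions (which may
            // unblock queued intents) are handled before new work is pulled.
            biased;

            Some(result) = finished.recv() => {
                scheduler.complete(result.intent_id, &result.pubkeys);
                let _ = results_tx.send(result);
                if let Some(next) = scheduler.pop_next_scheduled_intent() {
                    spawn_executor(next, permits.clone()).await;
                }
            }
            Some(intent) = new_intents.recv() => {
                if let Some(runnable) = scheduler.schedule(intent) {
                    spawn_executor(runnable, permits.clone()).await;
                }
            }
            else => break, // both channels closed
        }
    }
}
```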

Contributor

@thlorenz thlorenz left a comment

Pass 2 of the review. Mainly focused on finishing the /persist folder.

Stopping here to allow us to resolve issues from previous pass as well before adding more.

Contributor

@bmuddha bmuddha left a comment

A first iteration of the review. I'm not entirely sure whether I should continue, as it's really hard to get a holistic idea, or to reason about correctness/efficiency of the high-level logic, from reviewing such a huge PR.

Most of the comments are nitpicking or code style complaints really.

…l + introduced cancellation logic to some of the services
fn from(value: AccountCommittee) -> Self {
    CommittedAccountV2 {
        pubkey: value.pubkey,
        account: Account {
Contributor

Are those owners different?

Contributor

@thlorenz thlorenz left a comment

Pass 2

I reviewed mainly ./tasks and also discovered that the committor service integration tests have been changed so that they no longer guarantee a lot of things they did before, mainly the commit strategy. This is very concerning (see more in the comment there).

I also ran a script to get some data on how many unsafe calls were added without // SAFETY comments (outside of tests):

  • panic!: +3, -4, net -1
  • expect(: +52, -18, net 34
  • unwrap(: +19, -13, net 6
  • Total: +74, -35, net 39

That's a total of 39 more places where we need to wonder whether the validator might crash.
We need to change or at least annotate all of those to guarantee stability going forward.

let new_ix =
    tasks[index].instruction(&Pubkey::new_unique());
let new_ix_size = bincode::serialized_size(&new_ix)
    .expect("instruction serialization")
Contributor

I understand that most likely this is safe, but please investigate in which cases the serializer would actually fail and then add a // SAFETY that adds that info along with why we won't run into that.
That way not everyone reading this has to stumble and wonder if this is actually safe.

Contributor Author

Changed the logic here to avoid expects

})
}

fn compute_units(&self) -> u32 {
Contributor

Again compute unit config strewn about different places instead of consolidated in one module like it was before.
The method here can expose this, but it should be configured all together in one place which makes it easier to analyze/tweak these values.

Contributor Author

I don't follow this at all. compute_units is in one file, what do you mean? compute_units are as configurable as before, i.e. hardcoded values. Please elaborate.

Here's the old config for your reference; those values aren't modified anywhere in your code.

            args_process: Budget {
                compute_unit_price,
                base_budget: 80_000,
                per_committee: 35_000,
            },
            buffer_close: Budget {
                compute_unit_price,
                base_budget: 10_000,
                per_committee: 25_000,
            },
            buffer_process_and_close: Budget {
                compute_unit_price,
                base_budget: 40_000,
                per_committee: 45_000,
            },

Contributor

There is an actual number (55_000) somewhere in the middle of the BufferTask.
What I would prefer is to have those numbers live in a central place only concerned with compute units so we can tweak them there and even make them configurable in the future.

If you have new ones please add them to a single module like I did before and reference them from there.

Contributor

I don't see any change to fix this.

VersionedMessage::V0(message),
&[authority],
)
.expect("Signing transaction has to be non-failing");
Contributor

You are already returning a Result from this function. Even if you think it can't fail, the expect call looks risky; it's much easier to just return an Error instead.

Contributor Author

If the validator authority can't sign transactions, what's the point of the validator? This would fail on start. I can add a SAFETY comment.

Contributor

@thlorenz thlorenz Aug 25, 2025

OK I just had to do the following:

  1. follow the code of the try_new call
  2. find the crate and from that the repo
  3. find the relevant project in that repo
  4. find the code in that repo

Just so I can be sure + communicate that there are many more ways this could fail.

Just have a look at the relevant code

Our code here has no SAFETY added, just says // SignerError is critical and that's not even the only error that could occur.

Why can we not just bubble the error up instead of me having to do this every time I see this expect?
Otherwise you literally have to put a SAFETY comment that explains why none of the error cases can occur. However, if the try_new implementation then changes, that won't even be complete.

Even if it's clear to you, please be respectful of your fellow devs' time (in this case mine, which at this point is most likely more time spent than it would have taken to just bubble the error) + every dev that runs across this line in the future and has to double-check that it's not a problem.

Contributor Author

You're very right! For some reason I thought it was a purely signing-related issue. Added SignerError handling.

Contributor

That's exactly the reason why it is much more robust to return a Result instead of guessing. Anyone looking at this in the future will not be sure either.

taco-paco and others added 12 commits August 15, 2025 14:49
refactor: remove MESSAGE_ID atomic and moved id generation into MagicContext. Addressed comments
feat: introduced proper filtering of duplicate intents in IntentScheduler::schedule
# Conflicts:
#	Cargo.lock
#	magicblock-api/src/magic_validator.rs
#	magicblock-validator-admin/src/claim_fees.rs
#	test-integration/Cargo.lock
#	test-integration/Cargo.toml
#	test-integration/test-ledger-restore/src/lib.rs
#	test-integration/test-ledger-restore/tests/05_program_deploy.rs
#	test-integration/test-ledger-restore/tests/12_two_airdrops_one_after_account_flush.rs
#	test-integration/test-ledger-restore/tests/13_timestamps_match_during_replay.rs
#	test-integration/test-ledger-restore/tests/14_restore_with_new_keypair.rs
#	test-integration/test-ledger-restore/tests/15_skip_replay.rs
#	test-integration/test-magicblock-api/Cargo.toml
#	test-integration/test-runner/bin/run_tests.rs
feat: executing Commit & Finalize stages in one tx.

Some code restructuring as well, IntentExecutor now has control over
tasks directly, which will be useful for retry logic.

<!-- greptile_comment -->

## Greptile Summary

This PR implements a significant performance optimization that combines
the Commit and Finalize stages into a single transaction when possible,
reducing on-chain transaction costs and improving efficiency. The change
introduces an `ExecutionOutput` enum with `SingleStage` and `TwoStage`
variants to support both the optimized single-transaction execution and
maintain backward compatibility with the traditional two-stage approach.

The architectural refactoring gives the `IntentExecutor` direct control
over task creation and execution strategies. A new unified transaction
preparation system replaces the previous separate commit/finalize
methods, allowing the executor to attempt combining operations when they
fit within transaction limits (hardcoded at 22 tasks to avoid CPI
limits). The system falls back to two-stage execution when the single
transaction would be too large.

Key components were renamed and expanded: `CommitIdFetcher` became
`TaskInfoFetcher` with additional capabilities to fetch rent
reimbursements, and `TransactionPreparatorV1` was simplified to work
with unified `TransactionStrategy` objects. The changes also include
updates to test infrastructure, integration tests, and configuration
files to support the new execution model. The `dyn-clone` dependency was
added to enable cloning of trait objects, which is necessary for the
optimization logic that tests different execution strategies.

This optimization is designed as a temporary solution until challenge
windows are implemented, which will require proper two-stage protocol
execution.

## Confidence score: 3/5

- This PR introduces significant architectural changes that could affect
transaction execution reliability
- Score lowered due to hardcoded limits (22 tasks) and complex fallback
logic that may not handle all edge cases
- Pay close attention to `intent_executor.rs`,
`transaction_preparator.rs`, and test files for potential transaction
size or CPI depth issues
<!-- /greptile_comment -->
feat: added redelegation logic, which isn't supported for now :(
Contributor

@bmuddha bmuddha left a comment

Can't say that I'm happy with this PR. I'm still of the opinion that the code that is easiest to maintain is the code that was never written, a view this PR clearly doesn't share. The trait explosion is the biggest issue I have worries about, but I guess time will show whether it was the right choice. If maintenance becomes an issue in the future, we might need to consider a complete overhaul of the entire subsystem. For now I have to trust the author that it works.

@taco-paco taco-paco changed the base branch from dev to master August 26, 2025 04:41
# Conflicts:
#	magicblock-account-cloner/src/remote_account_cloner_worker.rs
#	magicblock-committor-service/src/bundle_strategy.rs
#	test-integration/programs/flexi-counter/src/processor.rs
#	test-integration/programs/schedulecommit/src/api.rs
#	test-integration/schedulecommit/test-scenarios/tests/01_commits.rs
#	test-integration/schedulecommit/test-scenarios/tests/02_commit_and_undelegate.rs
#	test-integration/schedulecommit/test-scenarios/tests/03_commits_fee_payer.rs
#	test-integration/schedulecommit/test-security/tests/01_invocations.rs
#	test-integration/test-committor-service/tests/test_ix_commit_local.rs
#	test-integration/test-committor-service/tests/utils/instructions.rs
#	test-integration/test-config/src/lib.rs
#	test-integration/test-config/tests/auto_airdrop_feepayer.rs
#	test-integration/test-magicblock-api/tests/test_claim_fees.rs
#	test-integration/test-tools/src/integration_test_context.rs
Contributor

@thlorenz thlorenz left a comment

In this review I focused on pieces that can be cleaned up without too much effort.

They are divided into:

  • critical for security and stability
  • performance related
  • patterns established in the rest of the repo

The first two are most important, but please address all of them.

As a note I'm unable in a reasonable amount of time to verify correctness of everything as this is basically a total rewrite of 13k lines of code.

Thus I have to trust that at least the program instructions still enforce the same security as before which is why it is essential to add the tests back that ensured that.

For the rest of this service I recommend testing this in a staging environment first in order to verify that nothing easily detectable is broken.

let new_ix_size =
    bincode::serialized_size(&new_ix).unwrap_or(u64::MAX);
let new_ix_size =
    usize::try_from(new_ix_size).unwrap_or(usize::MAX);
Contributor

I understand this will most likely never happen, but it's not a good solution to just return some value without any indication that something already went wrong.

Also you are already returning a Result from this method, why not just add an error variant and return a proper error?
The current implementation is super confusing.

Even if this case never occurs, returning a proper result here is way easier to reason about when I read this code + is technically more correct.

Contributor

Also, now noticing this is pretty much duplicated code from line 163.
Please make a helper function and fix the issues I pointed out in one place.

Contributor

Let's be realistic: we will never write code producing or allowing 4 GB instructions, so even if we run the committor service on a 32-bit machine (which I very much doubt), using an as usize cast is legit instead of the cumbersome try_from conversion. As for the bincode error, I have to concur with @thlorenz that if we already return an error, we might as well propagate this one.
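
For reference, a possible shape of the shared helper (the error enum here is hypothetical; the real code would add a variant to its existing error type):

```rust
use solana_sdk::instruction::Instruction;

#[derive(Debug)]
enum InstructionSizeError {
    Serialization(bincode::Error),
}

// Shared helper: serialized size of an instruction, propagated as a Result
// instead of silently substituting u64::MAX / usize::MAX.
fn instruction_size(ix: &Instruction) -> Result<usize, InstructionSizeError> {
    let size = bincode::serialized_size(ix)
        .map_err(InstructionSizeError::Serialization)?;
    Ok(size as usize)
}
```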

}

async fn create_transactions_to_commit_specific_accounts(
fn process_l1_messages_results(
Contributor

Let's not introduce L1/L2 terminology. We are not a layer 2.
The terminology we already use everywhere is either of the below for the Solana network:

  • chain
  • remote

results: Vec<BroadcastedIntentExecutionResult>,
scheduled_l1_messages: &[ScheduledBaseIntentWrapper],
) -> Vec<ExecutionOutput> {
// Filter failed l1 messages, log failed ones
Contributor

ditto l1

.filter_map(|execution_result| match execution_result {
    Ok(value) => Some(value),
    Err(err) => {
        error!("Failed to send l1 message: {}", err.2);
Contributor

ditto l1

) -> AccountsResult<Vec<PendingCommitTransaction>> {
let pubkeys_with_hashes = payloads
// For successfully committed accounts get their (pubkey, hash)
let pubkeys_with_hashes = scheduled_l1_messages
Contributor

ditto l1 (won't keep pointing them out)
Please just grep through the code and ensure that no l1 mention remains.

magicblock-rpc-client = { workspace = true }
program-flexi-counter = { workspace = true, features = ["no-entrypoint"] }

async-trait = { workspace = true }
Contributor

ditto crates are alphabetically ordered

@@ -21,3 +18,5 @@ solana-rpc-client-api = { workspace = true }
solana-sdk = { workspace = true }
solana-transaction-status = { workspace = true }
tokio = { workspace = true }
lazy_static = { workspace = true }
Contributor

What's the use of reordering these (no longer alphabetical) + how is it related to this PR?

@@ -25,11 +26,14 @@ version = "0.0.0"
edition = "2021"

[workspace.dependencies]
test-ledger-restore = { path = "./test-ledger-restore" }
Contributor

These also were alphabetical; please keep it that way.

It makes it easier to quickly check if a crate that's needed is already a dependency or not.

@@ -124,6 +124,11 @@ setup-config-devnet:
SETUP_ONLY=devnet \
$(MAKE) test

test-schedule-intents:
Contributor

Nice, please also add the relevant setup tasks:

setup-schedule_intents-devnet:
	RUN_TESTS=schedule_intents \
	SETUP_ONLY=devnet \
	$(MAKE) test
setup-schedule_intents-both:
	RUN_TESTS=schedule_intents \
	SETUP_ONLY=both \
	$(MAKE) test

@@ -195,6 +196,7 @@ spl-token-2022 = "=6.0"
static_assertions = "1.1.0"
strum = "0.24"
strum_macros = "0.24"
lru = "0.16.0"
Contributor

ditto alphabetical
