feat: drop leaderless party #52

swift1337 · 2025-03-19T19:31:25Z

Closes #47

Summary by CodeRabbit

New Features
- Introduced a mechanism to generate unique identifiers for key operations, enhancing request tracking.
- Added a new method for generating message IDs based on request fields.
- Introduced a new constant for latency tracking in logs.
Bug Fixes
- Improved and standardized timeout error messages for more consistent feedback during key processes.
Refactor
- Streamlined the workflows for joining and signing operations by removing legacy protocols and redundant logic.
- Simplified error handling and logging across various components for better clarity and consistency.
- Updated method signatures and parameter names for improved consistency.
Chores
- Updated timeout configurations and logging details to boost system responsiveness and reliability.

coderabbitai · 2025-03-19T19:31:33Z

📝 Walkthrough

Walkthrough

This pull request updates error handling by renaming the timeout error variable from ErrTssTimeOut to ErrTimeoutTSS across multiple modules. It also adds new methods to compute request IDs, adjusts naming conventions and logging messages, and removes deprecated or leaderless join party logic. Several test files have been updated with new function names, shortened timeouts, and revised test cases. Additionally, obsolete protocol buffer messages and unused constants have been removed to simplify the codebase.

Changes

File(s)	Change Summary
`blame/policy.go`, `blame/types.go`, `keygen/ecdsa/tss_keygen.go`, `keygen/eddsa/tss_keygen.go`, `keysign/ecdsa/tss_keysign.go`, `keysign/eddsa/tss_keysign.go`	Renamed error variable from `ErrTssTimeOut` to `ErrTimeoutTSS` to standardize timeout error handling.
`keygen/request.go`, `keysign/request.go`	Added new `MsgID` method to compute a hash string based on request content.
`logs/fields.go`	Added a new constant `Latency` with the value `"latency"`.
`messages/join_party.proto`, `messages/version.go`	Removed the `JoinPartyRequest` message; replaced `NEWJOINPARTYVERSION` with a more descriptive `VersionJoinPartyWithLeader`.
`monitor/metric.go`	Updated parameter naming from `joinpartyTime` to `joinPartyTime` and simplified the control flow in join party metrics.
`p2p/communication.go`	Removed the `joinPartyProtocol` variable; updated error handling, logging messages, and streamlined connectivity check logic.
`p2p/party_coordinator.go`	Removed methods: `HandleStream`, `sendRequestToAll`, and `JoinPartyWithRetry`, reflecting a shift away from leaderless join party handling.
`p2p/party_coordinator_test.go`	Renamed `setupHostsLocally` to `setupHosts`, added tests (`leaderAppearsLastTest`, `leaderAppersFirstTest`, `TestGetPeerIDs`), and updated timeouts.
`p2p/party_coordinator_test_with_leader_test.go`	Deleted file containing obsolete leader test cases.
`p2p/peer_status.go`	Removed the `NoLeader` constant and associated conditional logic.
`tss/keygen.go`, `tss/keysign.go`, `tss/tss.go`, `tss/tss_4nodes_test.go`, `tss/tss_4nodes_zeta_test.go`	Enforced version checks against `VersionJoinPartyWithLeader`, removed outdated leaderless join party logic, improved error wrapping, updated logging, renamed helper functions, and adjusted test timeouts/naming for clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TSS_Server
    participant JoinParty_Process
    Client->>TSS_Server: Send Keygen Request
    TSS_Server-->>TSS_Server: Validate request version
    TSS_Server-->>TSS_Server: Compute MsgID from request
    TSS_Server->>JoinParty_Process: Initiate join party process
    JoinParty_Process-->>TSS_Server: Return blame nodes or error
    TSS_Server->>Client: Return Keygen Response

sequenceDiagram
    participant Client
    participant TSS_Server
    participant KeySign_Process
    Client->>TSS_Server: Send KeySign Request
    TSS_Server-->>TSS_Server: Enforce version check
    TSS_Server-->>TSS_Server: Compute MsgID from request
    TSS_Server->>KeySign_Process: Process signing operation
    KeySign_Process-->>TSS_Server: Return signature or error
    TSS_Server->>Client: Return KeySign Response

Assessment against linked issues

Objective	Addressed	Explanation
Drop leaderless signing (#47)	✅

Possibly related PRs

Remove all dynamic peer discovery #40: Renaming of error variables across multiple files aligns with similar modifications in this PR.
feat: whitelist connection gater #31: Standardizes error handling in blame retrieval methods, paralleling the changes made here.
Add concurrency and scale testing #22: Implements comparable renaming from ErrTssTimeOut to ErrTimeoutTSS across key generation and signing modules.

Suggested reviewers

gartnera
brewmaster012

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

tss/keygen.go

tss/keysign.go

tss/tss_4nodes_test.go

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

p2p/communication.go (2)

21-23: Consider consolidating error handling libraries.
Switching to github.com/pkg/errors and introducing errgroup is fine. If your project targets Go ≥ 1.13, you can also rely on native error wrapping (fmt.Errorf(..., %w)).

222-225: Reassess log level for skipped connectivity checks.
Logging an error and returning nil can be confusing. Consider using a Warn level to clarify that no connectivity check is performed when no bootstrap node is configured.

p2p/party_coordinator_test.go (2)

46-74: Potential flakiness in leaderAppearsLastTest.
Random sleeps can help concurrency tests but may yield non-deterministic outcomes. Consider using synchronization or deterministic intervals for reliability.

76-102: Similar concurrency approach in leaderAppersFirstTest.
Again, watch for test flakiness introduced by random sleeps. Overall logic is sound.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ed8657 and 735dbc5.

⛔ Files ignored due to path filters (1)

messages/join_party.pb.go is excluded by !**/*.pb.go

📒 Files selected for processing (23)

blame/policy.go (3 hunks)
blame/types.go (1 hunks)
keygen/ecdsa/keygen_test.go (1 hunks)
keygen/ecdsa/tss_keygen.go (1 hunks)
keygen/eddsa/tss_keygen.go (1 hunks)
keygen/request.go (3 hunks)
keysign/ecdsa/tss_keysign.go (1 hunks)
keysign/eddsa/tss_keysign.go (1 hunks)
keysign/request.go (2 hunks)
logs/fields.go (1 hunks)
messages/join_party.proto (0 hunks)
messages/version.go (1 hunks)
monitor/metric.go (1 hunks)
p2p/communication.go (6 hunks)
p2p/party_coordinator.go (3 hunks)
p2p/party_coordinator_test.go (4 hunks)
p2p/party_coordinator_test_with_leader_test.go (0 hunks)
p2p/peer_status.go (1 hunks)
tss/keygen.go (8 hunks)
tss/keysign.go (7 hunks)
tss/tss.go (1 hunks)
tss/tss_4nodes_test.go (9 hunks)
tss/tss_4nodes_zeta_test.go (4 hunks)

💤 Files with no reviewable changes (2)

messages/join_party.proto
p2p/party_coordinator_test_with_leader_test.go

🧰 Additional context used

🧠 Learnings (1)

tss/tss_4nodes_zeta_test.go (1)

Learnt from: gartnera
PR: zeta-chain/go-tss#22
File: tss/tss_4nodes_zeta_test.go:80-86
Timestamp: 2025-03-19T19:31:42.874Z
Learning: The `doTestKeySign` function in the `FourNodeScaleZetaSuite` already includes assertions to handle errors within the goroutines, ensuring proper error management.

🧬 Code Definitions (14)

blame/types.go (3)

keygen/eddsa/tss_keygen.go (1) (1)

New (37-61)

keysign/eddsa/tss_keysign.go (1) (1)

New (40-63)

tss/tss.go (1) (1)

New (70-142)

keysign/ecdsa/tss_keysign.go (1)

blame/types.go (1) (1)

ErrTimeoutTSS (22-22)

keysign/eddsa/tss_keysign.go (1)

blame/types.go (1) (1)

ErrTimeoutTSS (22-22)

keygen/ecdsa/keygen_test.go (3)

tss/tss_4nodes_test.go (7) (7)

s (75-112)

s (121-130)

s (149-226)

s (228-267)

s (269-334)

s (336-346)

s (348-388)

tss/tss_4nodes_zeta_test.go (8) (8)

s (43-86)

s (88-95)

s (101-107)

s (110-134)

s (157-180)

s (182-221)

s (223-230)

s (232-267)

keygen/eddsa/keygen_test.go (1) (1)

s (72-90)

blame/policy.go (1)

blame/types.go (1) (1)

ErrTimeoutTSS (22-22)

tss/keysign.go (3)

keysign/response.go (1) (1)

Response (17-21)

messages/version.go (1) (1)

VersionJoinPartyWithLeader (3-3)

keysign/keysign.go (1) (1)

TssKeySign (11-19)

tss/tss_4nodes_zeta_test.go (3)

tss/tss_4nodes_test.go (11) (11)

s (75-112)

s (121-130)

s (149-226)

s (228-267)

s (269-334)

s (336-346)

s (348-388)

getPreParams (390-403)

partyNum (33-33)

partyVersion (36-36)

wg (96-96)

tss/tss.go (1) (1)

Server (40-54)

common/types.go (1) (1)

TssConfig (7-18)

keygen/eddsa/tss_keygen.go (1)

blame/types.go (1) (1)

ErrTimeoutTSS (22-22)

tss/keygen.go (8)

keygen/request.go (1) (1)

Request (11-16)

keygen/response.go (1) (1)

Response (9-15)

messages/version.go (1) (1)

VersionJoinPartyWithLeader (3-3)

logs/fields.go (2) (2)

MsgID (8-8)

Latency (12-12)

blame/types.go (2) (2)

Blame (41-47)

Node (33-37)

conversion/key_provider.go (1) (1)

GetPubKeyFromPeerID (73-90)

blame/blame.go (1) (1)

NewBlame (27-33)

common/status.go (3) (3)

Status (3-3)

Fail (8-8)

Success (7-7)

p2p/communication.go (2)

p2p/peer_address.go (1) (1)

c (8-17)

tss/tss.go (2) (2)

New (70-142)

host (213-213)

keygen/ecdsa/tss_keygen.go (1)

blame/types.go (1) (1)

ErrTimeoutTSS (22-22)

tss/tss.go (5)

keygen/eddsa/tss_keygen.go (1) (1)

New (37-61)

keysign/eddsa/tss_keysign.go (1) (1)

New (40-63)

tss/keysign.go (6) (6)

t (30-46)

t (48-176)

t (178-184)

t (186-350)

t (352-356)

t (358-374)

tss/keygen.go (2) (2)

t (20-189)

t (191-365)

logs/fields.go (1) (1)

MsgID (8-8)

p2p/party_coordinator_test.go (3)

p2p/party_coordinator.go (20) (20)

PartyCoordinator (30-38)

wg (212-212)

wg (298-298)

NewPartyCoordinator (41-59)

ErrLeaderNotReady (24-24)

ErrJoinPartyTimeout (23-23)

pc (62-66)

pc (68-108)

pc (110-132)

pc (135-172)

pc (174-178)

pc (180-191)

pc (193-203)

pc (205-226)

pc (228-242)

pc (244-286)

pc (288-376)

pc (378-426)

pc (428-465)

pc (467-469)

p2p/communication.go (2) (2)

wg (229-229)

timeout (165-165)

p2p/leader_provider.go (1) (1)

LeaderNode (12-26)

tss/tss_4nodes_test.go (5)

messages/version.go (1) (1)

VersionJoinPartyWithLeader (3-3)

tss/tss.go (1) (1)

Server (40-54)

common/types.go (1) (1)

TssConfig (7-18)

tss/keysign.go (1) (1)

algo (200-200)

keysign/response.go (1) (1)

Response (17-21)

🔇 Additional comments (73)

blame/types.go (1)

22-22: Consistent error naming convention applied.

The renaming from ErrTssTimeOut to ErrTimeoutTSS along with the updated error message improves consistency with Go naming conventions. This standardization will help with error handling clarity across the codebase.

keygen/ecdsa/keygen_test.go (1)

98-98: Improved loop syntax.

Removed the unused blank identifier (_) from the range loop, which is a Go best practice. The compiler would suggest this change since we're only using the index variable.

p2p/peer_status.go (2)

62-62: Added linter directive for test-only function.

The addition of //nolint:unused // used in tests is appropriate as it explicitly documents that this function is retained for testing purposes despite being unused in the main implementation.

90-111: Simplified leader handling in updatePeer.

The removal of the NoLeader constant and associated conditional logic aligns with the PR objective of dropping the leaderless party protocol. This simplifies the control flow by focusing only on counting participants regardless of leader status.
tss/tss_4nodes_zeta_test.go (6)

56-56: Standardized function naming convention.

The rename from getPreparams to getPreParams maintains consistent camel case capitalization throughout the codebase.

85-85: Updated version parameter for consistency.

Changed newJoinPartyVersion to partyVersion, aligning with the simplified party protocol model where leaderless parties have been removed.

88-94: Standardized function and log message naming.

Updated "Keysign" to "KeySign" in function name and log messages, improving naming consistency throughout the codebase.

97-101: Consistent naming in comments and function name.

Updated method names and comments from "Keysign" to "KeySign" for consistency.

104-104: Consistent version parameter in test calls.

Updated to use partyVersion instead of newJoinPartyVersion, matching the changes made throughout the codebase.

60-60:
❓ Verification inconclusive

Reduced KeySignTimeout value.

Decreased timeout from 90 to 20 seconds, which could improve test performance if the signing operations typically complete well within this timeframe.

Verify that 20 seconds is sufficient for all key signing operations, especially under load or in environments with limited resources:

🏁 Script executed:
#!/bin/bash
# Description: Check if there are any timeout errors in tests or logs that might indicate the new timeout is too short

# Look for timeout errors in the codebase
rg -A 3 -B 3 "ErrTimeout(TSS|TLS)" --no-ignore
Length of output: 3291
Action: Verify that the reduced 20-second timeout is adequate under all conditions.

The change reduces the KeySignTimeout from 90 seconds to 20 seconds to improve test performance when signing operations complete quickly. Our static analysis—searching the codebase for occurrences of timeout errors (e.g., ErrTimeoutTSS)—did not reveal any unexpected issues attributable to the new timeout. However, since the grep output only confirms the normal placement of timeout error handling, please ensure that under heavy load or limited resource conditions all key signing operations complete within this timeframe.

Confirm via load/performance tests that key signing under stress does not consistently hit the new timeout.

Monitor logs during such tests for any unexpected timeout errors.
p2p/communication.go (8)

29-29: Confirm removal of old joinPartyProtocol.
You introduced joinPartyProtocolWithLeader. Ensure all references to the retired joinPartyProtocol are removed to avoid confusion.

258-264: Error on unavailable bootstrap nodes looks correct.
Returning an error if no bootstrap node responds ensures proper connectivity checks.

296-299: Resource limits for new protocols.
Adding both joinPartyProtocolWithLeader and TSSProtocolID to the resource manager ensures they won’t be throttled unfairly.

354-355: Connectivity check placed after attempted connections.
Invoking bootStrapConnectivityCheck() following connection attempts enforces robust verification of network reachability.

381-384: Appropriate concurrency model with errgroup and atomic flags.
Using errgroup.Group plus an atomic.Bool for tracking success is a clear approach to concurrent connection attempts.

388-407: Concurrent connection logic.
Running multiple connection attempts in parallel helps ensure partial failures do not halt the entire process.

410-415: Clear success logging.
Reporting the established connections aids debugging and network monitoring.

420-429: Final fallback error handling.
If no connection succeeds, returning fail to connect to any peer is clean and explicit.

p2p/party_coordinator.go (2)

57-57: Stream handler switched to new protocol.
Binding HandleStreamWithLeader to joinPartyProtocolWithLeader aligns with removing leaderless party logic.

164-164: Refined error log message.
Changing the phrasing to “Failed” maintains consistency with typical logging style.

p2p/party_coordinator_test.go (5)

19-21: Disabling test deadline globally.
Calling ApplyDeadline.Store(false) removes automatic time constraints. Confirm that unbounded tests won’t remain stuck in deadlock scenarios.

23-44: Renamed function for clarity.
Using setupHosts instead of setupHostsLocally improves consistency. Ensure other references are updated if any.

105-141: TestNewPartyCoordinator rename.
Renaming from TestPartyCoordinator clarifies that new coordinator functionality is being tested. Good coverage with multiple scenarios.

143-216: TestNewPartyCoordinatorTimeOut improvements.
The extended timeout reduces false negatives. Tests for “leader offline” and “not ready” nodes are well-structured.

218-245: TestGetPeerIDs thoroughly covers edge cases.
Verifying empty inputs and invalid IDs strengthens reliability. Excellent addition to the test suite.

logs/fields.go (1)

12-12: Good addition for latency logging.

Adding a standardized constant for latency metrics will help maintain consistency in logging across the codebase.

messages/version.go (1)

3-3: Improved naming convention for version constant.

The renamed constant VersionJoinPartyWithLeader is more descriptive and follows better Go naming conventions than the previous NEWJOINPARTYVERSION. This change clearly indicates that this version supports only party protocols with a leader, aligning with the PR objective to drop leaderless party support.

keygen/ecdsa/tss_keygen.go (1)

220-220: Consistent error naming convention.

The change from blame.ErrTssTimeOut to blame.ErrTimeoutTSS aligns with the standardized error naming convention being applied across the codebase, improving consistency in error handling.

keysign/ecdsa/tss_keysign.go (1)

291-291: Consistent error naming convention.

The change from blame.ErrTssTimeOut to blame.ErrTimeoutTSS maintains consistency with the standardized error naming convention being applied throughout the codebase.

keysign/eddsa/tss_keysign.go (1)

290-290: Rename timeout error for consistency

The change from blame.ErrTssTimeOut to blame.ErrTimeoutTSS aligns with the error definition in blame/types.go. This improves codebase consistency and maintainability by standardizing error naming conventions.

keygen/request.go (3)

3-8: Import organization improved

Standard library imports are now properly grouped separately from project imports, following Go best practices.

18-18: Documentation clarity improved

The comment update from "creeate a new instance of keygen.Request" to "constructs Request" is more concise and clear.

28-39: Well-implemented message ID generation

The new MsgID() method provides a consistent way to identify requests by hashing their key properties. The implementation is efficient, using a byte buffer for string concatenation and clear separator characters between components.

keysign/request.go (2)

3-10: Import organization improved

Standard library imports are now properly grouped separately from project imports, following Go best practices.

33-44: Well-implemented message ID generation

The MsgID() method provides a consistent way to identify keysign requests through hashing their key properties. This follows the same pattern as in the keygen package, ensuring a uniform approach to request identification across the codebase.

keygen/eddsa/tss_keygen.go (1)

208-208: Rename timeout error for consistency

The change from blame.ErrTssTimeOut to blame.ErrTimeoutTSS aligns with the error definition in blame/types.go. This improves codebase consistency and maintainability by standardizing error naming conventions.

blame/policy.go (3)

81-81: Error variable renamed for consistency.

The error variable has been renamed from ErrTssTimeOut to ErrTimeoutTSS to maintain consistent naming conventions across the codebase.

94-94: Error variable renamed for consistency.

The error variable has been renamed from ErrTssTimeOut to ErrTimeoutTSS for error handling uniformity.

108-108: Error variable renamed for consistency.

The error variable has been renamed from ErrTssTimeOut to ErrTimeoutTSS for consistent error reporting.

tss/tss.go (1)

179-191: Removed leaderless party support.

The method signature and implementation have been simplified by removing the version parameter and associated conditional logic. This aligns with the PR objective to remove the leaderless party protocol (v13) and only support parties with a leader (v14).

monitor/metric.go (2)

39-47: Improved method parameter naming and control flow.

The changes improve code quality in two ways:

Parameter renamed from joinpartyTime to joinPartyTime for proper camelCase convention

Refactored conditional logic to use early return pattern instead of else blocks, which improves readability

49-57: Improved method parameter naming and control flow.

The changes improve code quality in two ways:

Parameter renamed from joinpartyTime to joinPartyTime for proper camelCase convention

Refactored conditional logic to use early return pattern instead of else blocks, which improves readability

tss/keysign.go (7)

69-77: Simplified joinParty call by removing version parameter.

The code has been updated to remove the version parameter from the joinParty call, aligning with the removal of the leaderless party protocol (v13) support.

116-122: Enhanced logging for party join operations.

Added detailed logging that includes the message ID, leader information, and latency metrics when joining a party for keysign operations.

187-195: Enforced leader-based party protocol.

Added explicit version check to ensure only the leader-based party protocol (v14) is used, with proper error messaging for invalid versions.

197-197: Improved request logging.

Enhanced logging by including both the message ID and the full request object, which provides better context for debugging.

203-204: Improved error handling with detailed context.

Errors are now wrapped with more specific context messages, making debugging easier by providing clear information about what operation failed.

260-261: Enhanced error message formatting.

Updated the error message to use the fmt.Errorf with %w verb for proper error wrapping, which maintains the error chain for better error handling downstream.

293-294: Simplified error handling.

Replaced a custom error format with a more direct error message using the standard errors package, making the code more consistent.

tss/keygen.go (12)

8-8: Import addition for improved error handling.

The addition of the errors package from the github.com/pkg/errors library enables more descriptive error wrapping, which enhances error context propagation.

16-16: Import addition for structured logging.

The logs package import enables consistent field names in structured logging, improving log searchability and analysis.

24-26: Enforce leader-based party protocol.

Good version enforcement to ensure only the leader-based party protocol is used, aligning with the PR objective to eliminate leaderless parties.

28-30: Enhanced error context in message ID retrieval.

Using errors.Wrap improves error traceability by maintaining the original error while adding contextual information.

93-101: Improved timeout logging.

The enhanced error logging now includes the precise timeout duration, aiding in diagnosis of join party failures.

118-118: Streamlined blame node creation.

This simplification maintains functionality while improving code readability.

134-134: Enhanced error message clarity.

The updated error message precisely identifies the failure context for debugging.

142-145: Structured logging implementation.

The addition of structured logging with standardized field names (MsgID and Latency) improves observability and aligns with best practices.

165-166: Code simplification for status initialization.

Direct assignment of the success status improves readability.

187-187: Streamlined blame retrieval.

Direct use of the blame manager's GetBlame method simplifies the code.

198-200: Consistent error handling pattern.

This change maintains consistency with the error handling approach used in the Keygen method.

298-298: Consistent error message.

This change aligns the error message with the one used in the Keygen method, maintaining consistency across the codebase.

tss/tss_4nodes_test.go (12)

21-21: Import addition for version constants.

Using the centralized version constants from the messages package improves maintainability by defining version values in a single location.

36-36: Single version enforcement.

This change eliminates multiple version constants in favor of a single version from the messages package, aligning with the PR objective to remove leaderless party support.

87-87: Consistent naming convention.

Updated function call to match the renamed function with proper camel case for "PreParams".

91-91: Optimized timeout duration.

Reducing the key sign timeout from 90 to 20 seconds improves test efficiency while maintaining sufficient time for operations to complete.

101-101: Simplified server initialization.

The removal of conditional initialization logic streamlines the code by using a single consistent approach.

124-128: Simplified version testing.

By removing the version loop and using the single defined party version, the test becomes more focused on validating the leader-based protocol.

235-240: Clarified test logic with explicit comments.

The added comments make it clear that the first node is intentionally skipped to test failure scenarios.

256-260: Explicit failure expectations.

The addition of constants for expected failure node and count improves test readability and maintenance.

264-265: Updated assertions for new expectations.

The assertions now correctly validate against the defined constants for failure nodes.

371-374: Enhanced structured logging setup.

The revised logger initialization provides better context by including node index and port information consistently.

382-382: Logger parameter update.

This change ensures the enhanced logger is properly passed to the new server instance.

390-390: Function rename for consistent naming convention.

The function name has been updated to follow proper camel case for "PreParams", improving naming consistency across the codebase.

lumtis

zeta-chain/node#3743 doesn't include this changge yet right?

We can already prepare a node PR even if this one is not reveiewed first to see if the E2E passes.

On the suggestion for Nil -> NoError. I don't know if Assert.NoError exsits but if it doesn't imo we should switch require.NoError, I don't see the idea of doing subsequent assertions if a function not supposed to fail does.

monitor/metric.go

p2p/party_coordinator_test.go

tss/keygen.go

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

keygen/keygen.go (1)
39-50: Consider adding validation in MsgID method

The MsgID method creates a deterministic hash based on request parameters, which is essential for request tracking. However, it lacks validation for empty or nil Keys slice.

Consider adding validation checks:
func (r *Request) MsgID() (string, error) {
+	if len(r.Keys) == 0 {
+		return "", errors.New("empty keys list")
+	}
	var b bytes.Buffer

	b.WriteString(r.Version)
	b.WriteByte(';')
	b.WriteString(string(r.Algo))
	b.WriteByte(';')
	b.WriteString(strings.Join(r.Keys, ","))

	return common.MsgToHashString(b.Bytes())
}
p2p/party_coordinator_test.go (3)
45-73: Function correctly tests leader-last scenario, but contains a misleading comment.

The implementation correctly tests the case where non-leader nodes join before the leader, but there's a misleading comment.
-    // we start the leader firstly
+    // we start the leader last
75-101: Function name contains a typo.

The function name has a typographical error that should be fixed for consistency.
-func leaderAppersFirstTest(t *testing.T, msgID string, peers []string, pcs []*PartyCoordinator) {
+func leaderAppearsFirstTest(t *testing.T, msgID string, peers []string, pcs []*PartyCoordinator) {
Don't forget to update the function call in TestNewPartyCoordinator as well.

138-139: Function call needs to be updated to match the corrected function name.

The typo in the function name needs to be fixed in the function call as well.
-    leaderAppersFirstTest(t, msgID, peers, pcs)
+    leaderAppearsFirstTest(t, msgID, peers, pcs)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 735dbc5 and 269c679.

📒 Files selected for processing (8)

keygen/keygen.go (1 hunks)
keygen/request.go (0 hunks)
keygen/response.go (0 hunks)
keysign/signature_notifier_test.go (5 hunks)
monitor/metric_test.go (2 hunks)
p2p/party_coordinator_test.go (3 hunks)
tss/keygen.go (9 hunks)
tss/tss_4nodes_test.go (9 hunks)

💤 Files with no reviewable changes (2)

keygen/response.go
keygen/request.go

🚧 Files skipped from review as they are similar to previous changes (1)

tss/keygen.go

🧰 Additional context used

🧬 Code Definitions (4)

monitor/metric_test.go (1)

monitor/metric.go (6) (6)

m (21-28)

m (30-37)

m (39-47)

m (49-57)

m (59-66)

Metric (11-19)

keygen/keygen.go (4)

common/types.go (1) (1)

Algo (20-20)

tss/keysign.go (1) (1)

algo (200-200)

common/tss_helper.go (1) (1)

MsgToHashString (54-64)

blame/types.go (1) (1)

Blame (41-47)

tss/tss_4nodes_test.go (7)

messages/version.go (1) (1)

VersionJoinPartyWithLeader (3-3)

tss/tss_4nodes_zeta_test.go (9) (9)

s (43-86)

s (88-95)

s (101-107)

s (110-134)

s (157-180)

s (182-221)

s (223-230)

s (232-267)

_ (40-40)

tss/tss.go (1) (1)

Server (40-54)

common/types.go (1) (1)

TssConfig (7-18)

tss/keysign.go (1) (1)

algo (200-200)

keysign/response.go (1) (1)

Response (17-21)

blame/types.go (1) (1)

Blame (41-47)

p2p/party_coordinator_test.go (5)

p2p/stream_helper.go (2) (2)

init (28-30)

ApplyDeadline (26-26)

p2p/party_coordinator.go (18) (18)

PartyCoordinator (30-38)

wg (212-212)

wg (298-298)

NewPartyCoordinator (41-59)

pc (62-66)

pc (68-108)

pc (110-132)

pc (135-172)

pc (174-178)

pc (180-191)

pc (193-203)

pc (205-226)

pc (228-242)

pc (244-286)

pc (288-376)

pc (378-426)

pc (428-465)

pc (467-469)

p2p/communication.go (2) (2)

wg (229-229)

timeout (165-165)

p2p/communication_test.go (1) (1)

_ (19-19)

p2p/leader_provider.go (1) (1)

LeaderNode (12-26)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: test go 1.22

🔇 Additional comments (23)

monitor/metric_test.go (1)

34-34: Improved test diagnostics with assert.NoError

The change from assert.Nil(t, err) to assert.NoError(t, err) across all error assertions improves test clarity and diagnostics. The NoError assertion provides more descriptive failure messages that explicitly indicate when an error occurs, making test failures easier to diagnose.

This change also aligns with Go testing best practices, where semantically clear assertions are preferred over generic nil checks for errors.

Also applies to: 37-37, 42-42, 61-61, 64-64

keysign/signature_notifier_test.go (1)

31-33: Enhanced error assertion clarity

Converting assert.Nil(t, err) to assert.NoError(t, err) across all error assertions improves test output quality. The NoError assertion produces more informative error messages that include the actual error content when failures occur, facilitating faster debugging.

This consistent change throughout the test file maintains coherent testing patterns and aligns with modern Go testing conventions.

Also applies to: 74-78, 85-89, 92-92, 104-106, 147-151, 156-160, 172-172

keygen/keygen.go (5)

14-19: Well-designed Service interface

The new Service interface provides a clear contract for key generation functionality with better type safety by returning *crypto.ECPoint instead of the previous return type. This change aligns with the PR objective to enforce a more structured party protocol.

21-27: Clean Request structure definition

The Request struct encapsulates all necessary parameters for key generation in a well-organized manner. The inclusion of the Algo field provides flexibility for supporting different cryptographic algorithms.

29-37: Constructor follows best practices

The NewRequest constructor function properly initializes all fields of the Request struct, making client code more readable and less error-prone.

52-59: Well-structured Response type

The Response struct effectively captures all necessary information about the result of a key generation operation, including algorithm used, public key, pool address, status, and blame information for fault attribution.

61-70: Clean Response constructor

The NewResponse constructor function properly initializes all fields of the Response struct, maintaining consistency with the Request constructor pattern.
tss/tss_4nodes_test.go (10)

21-21: Good - Added import for messages package

This change appropriately adds the import for the messages package which is needed for accessing the version constants used in the tests.

36-36: Simplified version management

Good refactoring to consolidate the version management by using a single version constant that points to the standardized leader-based party protocol version. This eliminates redundancy and aligns with the PR's objective of enforcing a party structure with a leader.

87-87: Standardized function naming convention

The function name has been correctly updated from getPreparams to getPreParams to maintain consistent camel case capitalization throughout the codebase.

101-101: Improved server initialization

The server initialization has been simplified by removing conditional logic, resulting in cleaner code.

125-128: Streamlined test implementation

The test implementation has been streamlined to use the single partyVersion constant instead of iterating over multiple versions, aligning with the PR's objective of dropping support for leaderless party protocol.

235-239: Enhanced test readability for failure scenario

The loop has been improved to use range syntax as suggested in the previous review, and the commented explanation enhances readability by making it explicit that the first node is being skipped.

257-260: Improved test clarity with named constants

Good improvement to define expected failure node and count as constants rather than embedding these values directly in the assertions. This enhances readability and maintainability.

264-265: Updated assertions to match new requirements

The assertions have been appropriately updated to expect the specific number of blame nodes defined by the constant. This aligns with the changes in the party protocol implementation.

390-390: Standardized function naming convention

The function name has been correctly updated from getPreparams to getPreParams to maintain consistent camel case capitalization throughout the codebase, matching the reference at line 87.

91-91:
❓ Verification inconclusive

Optimized key signing timeout

The KeySignTimeout has been reduced from 90 seconds to 20 seconds, which is appropriate for test efficiency. Please verify that 20 seconds is still sufficient for reliable test execution in all environments.

🏁 Script executed:
#!/bin/bash
# Check if there are any test failures related to timeouts in the test logs or CI history

git log -n 20 --grep="timeout" --grep="test fail" --grep="KeySignTimeout" --pretty=format:"%h %s"
grep -r "timeout.*key.*sign" --include="*.log" .
Length of output: 50
Optimized Key Signing Timeout – Verification Needed

The updated KeySignTimeout value of 20 seconds appears to enhance test efficiency. Preliminary checks (using grep on recent logs) did not reveal any timeout-related failures; however, the lack of output from the automated log search means the evidence is inconclusive. Please manually verify in your CI and testing environments—especially on slower systems—that a 20-second timeout is indeed sufficient.
p2p/party_coordinator_test.go (6)

20-22: Appropriate disabling of deadlines for testing.

Setting ApplyDeadline to false in the init function ensures that network timeouts don't interfere with test execution, which is a good practice for unit testing network components.

24-43: Improved test setup with better error handling.

The renamed setupHosts function now uses require.NoError for critical assertions that should halt the test on failure, and it has better readability with the range-based loop. The function also properly connects the peers for testing.

103-140: Test refactored for better organization with dedicated test helper functions.

The test now properly tests both scenarios (leader first and leader last) using dedicated helper functions, which improves readability and maintainability. The leader detection and sorting logic is well implemented.

142-215: Improved timeout testing with specific error assertions.

The test now properly uses assert.ErrorIs for specific error type checking, which is more robust than general error assertions. The timeout has been increased to 3 seconds, which should provide more stability in CI environments.

189-189: Proper error type assertion.

Using assert.ErrorIs is the correct approach for checking specific error types.

217-243: Good unit test for getPeerIDs with proper edge cases.

The test properly covers various input scenarios for the getPeerIDs method:

Empty list of peer IDs

Valid peer ID

Invalid peer ID format

This comprehensive testing ensures the method works correctly under all expected conditions.

swift1337 · 2025-03-20T12:41:49Z

Localnet E2E

setup        | ✅ e2e tests completed in 6m11.288911369s
setup        |  ---📈 Network Report ---
setup        | Block Height:           212
setup        | CCTX Processed:         83
setup        | Emissions Pool Balance: 19997989ZETA
e2e passed

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tss/keysign.go (1)
260-260: Error handling is inconsistent.

The error handling uses fmt.Errorf instead of errors.Wrap which is used elsewhere. Consider using errors.Wrap for consistency across the codebase.
-		return keysign.Response{}, fmt.Errorf("fail to get local keygen state: %w", err)
+		return keysign.Response{}, errors.Wrap(err, "fail to get local keygen state")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 269c679 and 49b0d56.

📒 Files selected for processing (1)

tss/keysign.go (7 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

tss/keysign.go (4)

keysign/response.go (1) (1)

Response (17-21)

messages/version.go (1) (1)

VersionJoinPartyWithLeader (3-3)

keysign/ecdsa/tss_keysign.go (1) (1)

TssKeySign (31-39)

keysign/keysign.go (1) (1)

TssKeySign (11-19)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: test go 1.22

🔇 Additional comments (10)

tss/keysign.go (10)

16-16: Enhancement: Using pkg/errors for improved error handling.

Using the errors package from pkg/errors provides better error handling capabilities with contextual information wrapping.

69-76: Implementation aligns with PR objectives by enforcing leader-based party protocol.

The code has been simplified by removing the leaderless party logic, aligning with the PR objective to drop the leaderless party protocol (v13) and enforce a party structure with a leader (v14).

78-121: Error handling improved for join party failures.

The error handling and logging for join party failures has been streamlined while maintaining the important functionality of blaming the leader and broadcasting failures.

116-121: Enhanced logging for party joining process.

The logging now includes the message ID, leader information, and latency metrics, which provides better observability for debugging and monitoring the key signing process.

154-158: Simplified signature data handling.

The code has been streamlined to focus only on error handling for the sign message process, removing unnecessary complexity.

187-190: Enforcing party with a leader is a security enhancement.

This explicit version check enforces the use of messages.VersionJoinPartyWithLeader, providing a clear error message if an incompatible version is used. This enhances protocol security and compatibility management.

192-194: Error wrapping enhances debuggability.

Using errors.Wrap provides additional context to errors, which improves the ability to debug issues when they occur.

197-197: Enhanced logging improves observability.

Including the message ID and embedding the request object in the log provides more context for debugging and monitoring the keysign operation.

203-203: Error wrapping improves error context.

Using errors.Wrap to add context to the error about unmarshalling the pool pub key. This provides better error information for debugging.

293-293: Error handling simplified.

The error handling for threshold calculation failure has been simplified while maintaining the essential error information.

swift1337 added 8 commits March 19, 2025 14:08

Refactor request msg_id

f6dece2

Drop leaderless parties

7c7aeeb

Fix naming

d8c0023

Remove sorting

91879ea

Enforce version, improve logging

0a63b4d

Fix tests

8e18094

Remove dial to self

2607a94

Drop legacy proto

4060a10

swift1337 requested a review from gartnera March 19, 2025 19:31

swift1337 self-assigned this Mar 19, 2025

swift1337 requested a review from a team as a code owner March 19, 2025 19:31

Lint

735dbc5

gartnera reviewed Mar 19, 2025

View reviewed changes

tss/keygen.go Show resolved Hide resolved

tss/keygen.go Show resolved Hide resolved

tss/keysign.go Show resolved Hide resolved

tss/keysign.go Show resolved Hide resolved

tss/tss_4nodes_test.go Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 19, 2025

View reviewed changes

lumtis reviewed Mar 20, 2025

View reviewed changes

swift1337 added 3 commits March 20, 2025 12:40

Merge files

0a99e2b

Address PR comments [1]

f572fc7

Address PR comments [2]

269c679

coderabbitai bot reviewed Mar 20, 2025

View reviewed changes

Fix log

49b0d56

lumtis approved these changes Mar 20, 2025

View reviewed changes

coderabbitai bot reviewed Mar 20, 2025

View reviewed changes

swift1337 merged commit 4fc8855 into master Mar 20, 2025
6 checks passed

swift1337 deleted the feat/drop-leaderless-party branch March 20, 2025 12:51

feat: drop leaderless party #52

feat: drop leaderless party #52

Uh oh!

Conversation

swift1337 commented Mar 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Assessment against linked issues

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

lumtis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

swift1337 commented Mar 20, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

swift1337 commented Mar 19, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 19, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)