(minor improvement): small refactoring in make_token_replica_map() #513
base: master
Conversation
@mykaul, let me know when you're done; it looks good as is.
I'm not done yet. I'd like to understand the code better; I think it can be slightly improved further.
While trying to look at some random (flaky?) test (scylladb#510 (comment)) I saw some (minor) improvements that can be made to make_token_replica_map():
1. Hoist some redundant len() calls out of the loop(s).
2. Align some variable names: start them with num_.
3. Move the token_offset and host assignments inside the loop closer to where they are used.
4. Only add DCs and hosts that are in the map.
All of these are probably very minor improvements; perhaps in a large cluster it will be noticeable.
Signed-off-by: Yaniv Kaul <[email protected]>
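As a rough illustration of points 1-3 above (not the driver's actual code; pick_hosts, ring, and replicas_wanted are made-up names for the example), hoisting loop-invariant len() calls and using num_-prefixed names looks roughly like this:

    # Illustrative sketch only, a simplified stand-in for the kind of change
    # described above, not the real make_token_replica_map() implementation.
    def pick_hosts(ring, hosts, replicas_wanted):
        num_tokens = len(ring)    # computed once, outside the loop
        num_hosts = len(hosts)    # likewise hoisted out of the loop
        picked = []
        for index in range(num_tokens):
            host = hosts[index % num_hosts]   # lookup kept close to where it is used
            if host not in picked:
                picked.append(host)
            if len(picked) == replicas_wanted:
                break
        return picked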
cd57c4d to 64c0e38
if host.datacenter and host.rack:
    dc_racks[host.datacenter].add(host.rack)
    hosts_per_dc[host.datacenter].add(host)
host_dc = host.datacenter
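For readers outside the driver codebase, the grouping in the snippet above amounts to building per-datacenter sets. A self-contained sketch of that pattern, using a namedtuple as a stand-in for the driver's Host class:

    from collections import defaultdict, namedtuple

    # Stand-in Host type for illustration; the real driver defines its own Host class.
    Host = namedtuple("Host", ["address", "datacenter", "rack"])

    hosts = [Host("10.0.0.1", "dc1", "r1"),
             Host("10.0.0.2", "dc1", "r2"),
             Host("10.0.0.3", "dc2", "r1")]

    dc_racks = defaultdict(set)
    hosts_per_dc = defaultdict(set)
    for host in hosts:
        if host.datacenter and host.rack:
            dc_racks[host.datacenter].add(host.rack)
            hosts_per_dc[host.datacenter].add(host)

    # dc_racks now maps 'dc1' to {'r1', 'r2'} and 'dc2' to {'r1'};
    # hosts_per_dc holds two hosts under 'dc1' and one under 'dc2'.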
@dkropachev - I've added this small change.
Pull Request Overview
This PR refactors the make_token_replica_map() method in cassandra/metadata.py to improve performance and code clarity through several optimizations:
- Pre-filter hosts to only process those in datacenters with replication factors
- Move length calculations outside loops to avoid redundant computations
- Improve variable naming consistency with num_ prefixes
- Reorganize code structure for better locality of use
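A minimal sketch of the pre-filtering idea in the first bullet, assuming host objects with a datacenter attribute (the helper function and host list here are illustrative; dc_rf_map mirrors the datacenter-to-replication-factor mapping that appears in the diff below):

    # Illustration only: keep just the hosts whose DC has a replication factor
    # configured, so later replica-selection loops never touch irrelevant DCs.
    def hosts_in_replicated_dcs(hosts, dc_rf_map):
        return [host for host in hosts if host.datacenter in dc_rf_map]

    # e.g. with dc_rf_map = {"dc1": 3, "dc2": 2}, any host in "dc3" is dropped.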
if token_offset_index >= len(token_offsets):
    token_offset_index = token_offset_index - len(token_offsets)
if replicas_remaining == 0 or num_replicas_this_dc == num_hosts_this_dc:
    break
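For readers unfamiliar with the pattern, the wrap-around in the snippet above is a standard ring walk. A generic, standalone sketch (not the driver's code; names are illustrative):

    # Manual wrap-around, equivalent to taking the index modulo num_items each step.
    def walk_ring(ring, start, steps):
        num_items = len(ring)
        index = start
        visited = []
        for _ in range(steps):
            visited.append(ring[index])
            index += 1
            if index >= num_items:   # wrap back to the start of the ring
                index -= num_items
        return visited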
Moving this check before the token_offset calculation creates a logic error. The original code checked this condition after getting the host, which ensures we don't break the loop prematurely when we haven't actually found enough unique hosts yet. This change could cause the algorithm to stop searching for replicas too early.
Huh?
I don't understand this. The whole point is to optionally avoid a useless assignment to token_offset_index when it isn't needed in a given cycle - and I assume in some cycle it won't be.
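To make the disagreement concrete, here is a toy version of the two orderings being debated (purely illustrative names; this is not the driver's code):

    # Toy illustration of the ordering question discussed above.
    def take_then_check(candidates, needed):
        picked = []
        for item in candidates:
            picked.append(item)
            if len(picked) == needed:    # condition checked after taking the item
                break
        return picked

    def check_then_take(candidates, needed):
        picked = []
        for item in candidates:
            if len(picked) == needed:    # condition checked before taking the item
                break
            picked.append(item)
        return picked

    # Both return the same result for a plain list of candidates; the difference is
    # only where the stop condition is evaluated relative to consuming a candidate,
    # which is exactly the subtlety the review comment is pointing at.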
    dc_racks[host.datacenter].add(host.rack)
    hosts_per_dc[host.datacenter].add(host)
host_dc = host.datacenter
if host_dc in dc_rf_map:
This filtering logic is good for performance, but it changes the behavior of the original code. The original code populated dc_to_token_offset for all datacenters, then filtered later. This early filtering means dc_to_token_offset will only contain DCs that are in dc_rf_map, which could affect downstream logic that expects all DCs to be present.
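A small, generic illustration of the filter-early versus filter-late trade-off described here (the dictionaries are made up for the example):

    # Illustration only: filtering while building a map vs. filtering at the point of use.
    all_dcs = {"dc1": [10, 20], "dc2": [30], "dc3": [40]}
    wanted = {"dc1", "dc2"}

    # Filter late: build the full map, skip unwanted DCs when consuming it.
    full_map = dict(all_dcs)
    late = {dc: offsets for dc, offsets in full_map.items() if dc in wanted}

    # Filter early: the map never contains the unwanted DCs at all.
    early = {dc: offsets for dc, offsets in all_dcs.items() if dc in wanted}

    # Same end result for this consumer, but any other code reading full_map would
    # still see "dc3", whereas the early-filtered map omits it, which is the concern
    # raised above.
    assert late == early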
Huh?
Copilot means that your change may affect other places that assume dc_to_token_offset isn't filtered.
I guess it can't verify that by itself as part of the review.
The whole point is to skip adding what would be skipped later anyway (see the deleted lines 647-648 below).
@scylladb/python-driver-maint - please re-review. I hope I did not mess things up; Copilot is being silly here, I suspect. I'm not sure why the whole CI is broken - it doesn't seem to be caused by my changes.
Pre-review checklist
- I have adjusted the documentation in ./docs/source/.
- I added appropriate Fixes: annotations to the PR description.