-
Notifications
You must be signed in to change notification settings - Fork 292
Improve other_operation_in_progress
error users, fix handling of its delays
#6588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Improve other_operation_in_progress
error users, fix handling of its delays
#6588
Conversation
…ion_in_progress error Compare before, where it's unclear which operation is precluding progress: ``` Caught exception while marking SR for VDI.clone in message forwarder: OTHER_OPERATION_IN_PROGRESS: [ SR; OpaqueRef:d019f26a-b2e8-529b-bb66-fd4f008d4f82; VDI.clone ] ``` And after, where the ID of the operation makes it easier to navigate the logs: ``` Caught exception while marking SR for VDI.clone in message forwarder: OTHER_OPERATION_IN_PROGRESS: [ SR; OpaqueRef:b0f54a40-485f-f48f-fdc7-6db28a64d3fd; scan; OpaqueRef:832fb22d-2ce5-0d26-8a00-a4165e78b34d ] ``` Also, avoid iterating over all the current_operations when looking for non-parallelisable ops, stop on the first one. Report the non-parallelisable op instead of the first current operation as the reason for the error as well. Signed-off-by: Andrii Sultanov <[email protected]>
b122c68
to
b332213
Compare
set_errors Api_errors.other_operation_in_progress | ||
["VBD"; _ref; vbd_operations_to_string (List.hd current_ops)] | ||
[`unpause] ; | ||
let set_difference a b = List.filter (fun (_, x) -> not (List.mem x b)) a in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listext.List.set_difference
is used further up ahead: you could replace all users, or keep using listext here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a reimplementation of set_difference
for this particular use case - I need to ignore the reference, only comparing operation types.
@@ -277,7 +282,15 @@ let check_operation_error ~__context ?sr_records:_ ?(pbd_records = []) | |||
mechanism of message forwarding and only use the event loop. *) | |||
my_has_current_operation_vbd_records <> [] && op <> `data_destroy | |||
then | |||
Error (Api_errors.other_operation_in_progress, ["VDI"; _ref]) | |||
let op_ref, op_type = | |||
List.hd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two more list.hd, a bit awkward that is has to be coded this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but sadly replacing this with match
over options as elsewhere makes the logic more inscrutable, so I've kept it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little big, I haven't gone through it.
other_operation_in_progress has separate fields for the main class of the object ("Cluster" or "Cluster_host") and the operation that's blocking progress, do not concatenate these into one string. Also report the task ref that's blocking progress. Signed-off-by: Andrii Sultanov <[email protected]>
other_operation_in_progress has separate fields for the main class of the object ("VM") and the operation that's blocking progress, do not concatenate these into one string. Also report the task ref that's blocking progress. Signed-off-by: Andrii Sultanov <[email protected]>
…n_in_progress Fix incorrect operations being reported - e.g. when other_operation_in_progress is reported only when there are non-parallelisable operations currently happening, we can't just report the first current op, we need to report the first *non-parallelisable* op. This required some refactoring, moving to options instead of bools, etc. Additionally report the blocking operation's reference when raising the error. Signed-off-by: Andrii Sultanov <[email protected]>
…_in_progress Signed-off-by: Andrii Sultanov <[email protected]>
…other_operation_in_progress Instead of logging all the blocking operations with 'debug', just report them with the error. Signed-off-by: Andrii Sultanov <[email protected]>
…progress Signed-off-by: Andrii Sultanov <[email protected]>
…rogress error Signed-off-by: Andrii Sultanov <[email protected]>
…rror Signed-off-by: Andrii Sultanov <[email protected]>
…ation_in_progress error Signed-off-by: Andrii Sultanov <[email protected]>
…gress error Signed-off-by: Andrii Sultanov <[email protected]>
…on_in_progress error Signed-off-by: Andrii Sultanov <[email protected]>
Not all of other_operation_in_progress error contain two objects in the list ([cls; objref]), some contain three or four, but the match would miss them and they would be put into a non-interruptible Thread.sleep instead of the interruptible Delay.sleep that provides a mechanism for the other task to wake us up on its way out. Signed-off-by: Andrii Sultanov <[email protected]>
…progress Most of the users report these when appropriate, so document them here. Signed-off-by: Andrii Sultanov <[email protected]>
quality-gate: Reduce expected List.hd count test_clustering: test for the blocking operation to be reported Signed-off-by: Andrii Sultanov <[email protected]>
b332213
to
9be1b7d
Compare
other_operation_in_progress
is reported inconsistently, with some users only specifying[class; class_ref]
, and others alsoop_type
. Make it consistent, changing all users to report[class; class_ref; op_type; op_ref]
. This makes it much easier to navigate the logs, since the operation blocking progress is clearly referenced.Compare before, where it's unclear which operation is precluding progress:
And after, with the ID of the operation present:
In addition, fix incorrect operation reporting, so that when progress is blocked only when particular types of operations are present in
current_operations
, these specific operations are reported, and not just any current operation.Also fix handling of
other_operation_in_progress
delays - some of the users would miss thematch
arm and use the non-interruptibleThread.sleep
instead of the interruptibleDelay.sleep
that provides a mechanism for the blocking task to wake us up on its way out.