Skip to content

FE HA validation framework #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

FE HA validation framework #22

wants to merge 1 commit into from

Conversation

dandrushko
Copy link
Contributor

No description provided.

@dandrushko dandrushko requested review from tinova and balazsbme July 14, 2025 07:40
@@ -0,0 +1,6 @@
---
#################### TEST CONFIGURATION VARIABLES ####################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this parameter and follow the decision we made after Bruno's feedback to create a hierarchy of config params.
See the docs here: https://github.com/OpenNebula/one-infra/wiki/Development-and-Testing-Post%E2%80%90deployment-Validation-tools#variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss to which extend this is applicable for validation framework and where we should store role specific variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, lets discuss and conclude in chat.

I think the role specific variables can also follow the same hierarchy. And a control flag for skip/execute. As I understood we agreed on this decision with this PR #20

- name: Wait for leader failover
pause:
seconds: 20
prompt: "Waiting for OpenNebula to elect a new leader. Press Enter to continue after 1 minutes, or Ctrl+C to abort."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not wait for user input here. Rather look for some log message or zone state (or check leader selection state directly somehow) and wait with a specified timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, make sense. Will remove prompt from here and will wait for 20 seconds

pause:
seconds: 20
prompt: "Waiting for OpenNebula to elect a new leader. Press Enter to continue after 1 minutes, or Ctrl+C to abort."
run_once: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_once should not be used, as recommended by Michal in one-deploy coding style: https://github.com/OpenNebula/one-deploy/wiki/code_style#4-be-careful-with-run_once

Due to this I am using also in the connectivity matrix a logic to make sure it only runs once in the first frontend. In this case maybe we have to find another option, because we might have different behaviour if we happen to run the leader status checking on the "initial_leader" vs. any other one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit another case, i.e. we don't have parallel operation here, but have to get output of the onezone command from one of the nodes. Let's discuss if you see any risk here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, as I understand the run_once will just run on random node of the group. And the test would anyway fail earlier if we do not run these on the FE nodes... So I guess it is fine, I would prefer if we have the same approach for run_once, but we can live with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we include this in the main "validation.yml" so it will be run together with the other tests? In that case we can use also the same HTML report, no need for a new file just for this testcase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Majority of the latest deployment is non-FA, so IMO this roles should be conditionally included only for HA deployments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then we should figure out what should be the condition for it. I think still the easiest would be to also include this test casse under "validation.fe_ha" for the params and control it with "validation.run_fe_ha" that would be pretty consistent, and we can disable it by default.

#when: hostvars[groups[frontend_group | d('frontend')][1]]['ansible_host'] == ansible_host
run_once: true

- name: Save a new Leader node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use leader with downcase for consistency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants