FE HA validation framework #22
@@ -0,0 +1,51 @@
---
all:
  vars:
    ansible_user: root
    one_version: '7.0'
    one_pass: opennebula
    ee_token: 'ci:Pantufl4.'
    one_vip: 172.20.0.100
    one_vip_cidr: 24
    one_vip_if: eth0
    ds:
      mode: ssh

infra:
  vars:
    os_image_url: https://d24fmfybwxpuhu.cloudfront.net/ubuntu2404-6.10.0-2-20240710.qcow2
    os_image_size: 30G
    vcpu_static: 2
    memory_MiB: 4096
    ansible_user: root
    infra_bridge: brpub
  hosts:
    node1:
      ansible_host: 172.20.0.1

Review comment: What's the purpose of this section, and of the vars under the frontend? Won't it interfere with testing in a particular microenv locally? For example, if I used this same inventory file to test the VM HA, it would overwrite many things in that environment. In that case maybe we could create a new folder under "inventory" for this test case (to contain an example of how to configure it and also how to test it locally with a microenv).

Reply: This is a local inventory used for testing. It should be omitted from the PR.

frontend:
  hosts:
    fe01:
      ansible_host: 172.20.0.10
      infra_hostname: 'node1'
    fe02:
      ansible_host: 172.20.0.11
      infra_hostname: 'node1'
    fe03:
      ansible_host: 172.20.0.12
      infra_hostname: 'node1'
  vars:
    context:
      ETH0_IP: "{{ ansible_host }}"
      PASSWORD: 'OpenNebula'
      ETH0_GATEWAY: 172.20.0.1
      ETH0_NETWORK: 172.20.0.0
      ETH0_MASK: 255.255.255.0
      ETH0_DNS: 172.20.0.1
      SSH_PUBLIC_KEY: |
        ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCkKbQKRj3FC++IBl9U1ZuLXaMBPRWr7HDY6kyHyMgQKaRZ0QTFkA9ADMwcrNu4H2xILN626r6qFTrc4zYpti0U/ps7cyodt50kqjFiaueB1gVbpPvP9mUjVk8NNXNmZDwgtYXaQDMCx14JfHn8mWgxKlwBCMlSRfOyJQx5EGpfzX/FoozRYm+mrUPt8LP+QFPVQMJj45q4Jnv2qWMwuJw7ZNjwnkFEaBLtPjpJTbxRVFeiBxVEwWcsqhKyRdvSDcZAMKoVQETKOw9bBY91sdycl+R+OoljQEa0WyBNO4WcDTc7mosohpj6o5mwybyp91PP88ZxJ4LUA1SYCXn3qBa9

node:
  hosts:
    node1: { ansible_host: 172.20.0.1 }
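
The review comment on the infra section suggests keeping a dedicated inventory example per test case instead of this local file. A minimal sketch of what that could look like; the folder name, file name, and contents below are assumptions, not part of this PR:

```yaml
# Hypothetical layout: inventory/fe_ha/example.yml
# (copied and adapted per microenv, so it never overwrites another test's inventory)
all:
  vars:
    one_vip: 172.20.0.100
    one_vip_cidr: 24
    one_vip_if: eth0

frontend:
  hosts:
    fe01: { ansible_host: 172.20.0.10 }
    fe02: { ansible_host: 172.20.0.11 }
    fe03: { ansible_host: 172.20.0.12 }
```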
@@ -0,0 +1,7 @@
---

- hosts: "{{ frontend_group | d('frontend') }}"
  roles:
    - role: fe_ha
      when: validation.run_ha_verifications == true
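
For reference, a hedged example of how this play could be enabled and invoked; the playbook and inventory file names are assumptions, only the validation.run_ha_verifications flag comes from the diff above:

```yaml
# Assumed invocation (file names are illustrative only):
#
#   ansible-playbook -i inventory/fe_ha/example.yml fe_ha_validation.yml \
#     -e '{"validation": {"run_ha_verifications": true}}'
#
# Equivalent group_vars entry that enables the fe_ha role permanently:
validation:
  run_ha_verifications: true
```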
@@ -0,0 +1,6 @@
---
#################### TEST CONFIGURATION VARIABLES ####################

# Zone name
one_zone_name: OpenNebula

Review comment: Let's move this parameter and follow the decision we made after Bruno's feedback to create a hierarchy of config params.

Reply: Let's discuss to what extent this is applicable to the validation framework and where we should store role-specific variables.

Reply: Okay, let's discuss and conclude in chat. I think the role-specific variables can also follow the same hierarchy, plus a control flag for skip/execute. As I understood it, we agreed on this decision with PR #20.
@@ -0,0 +1,173 @@
---

- setup:
    gather_subset:
      - min

- name: Install required task dependencies
  ansible.builtin.package:
    name:
      - jq
    state: present

# Check VIP reachability from all the FE nodes
- name: Check connection to API through a VIP from all FEs
  ansible.builtin.wait_for:
    host: "{{ one_vip }}"
    port: 2633
  register: vip_reachability

- name: Verify VIP reachability from all FEs
  debug:
    msg: "{{ vip_reachability.state }}"

- name: Save VIP reachability from all FEs
  set_fact:
    verification_result: "{{ (verification_result | default({})) | combine({'VIP connectivity from all FEs': 'ok'}) }}"

# Get zone details from the first FE, assuming state set to 3 is the leader state
- name: Get Zone
  ansible.builtin.shell:
    cmd: onezone show "{{ one_zone_name }}" -j | jq -r '.ZONE.SERVER_POOL.SERVER | .[] | select(.STATE == "3") | .NAME'
  register: zone_details
  ignore_errors: true
  when: hostvars[groups[frontend_group | d('frontend')][0]]['ansible_host'] == ansible_host
  run_once: true

# Test to verify that content of /etc/one directory is the same at all FE nodes
- name: Check content of the config directories
  find:
    path: "{{ item }}"
    file_type: file
    recurse: true
  register: found_files
  loop: "{{ validation.one_config_path }}"

- name: Set combined list
  set_fact:
    combined_file_list: "{{ combined_file_list | default([]) + item.files }}"
  loop: "{{ found_files.results }}"

- name: Calculate sha256 sum for each file in the directory
  stat:
    path: "{{ item.path }}"
    checksum_algorithm: sha256
  loop: "{{ combined_file_list }}"
  register: file_hashes

- name: Save hashes per host
  set_fact:
    file_checksums: "{{ file_checksums | default({}) | combine({ item.item.path: item.stat.checksum }) }}"
  loop: "{{ file_hashes.results }}"
  when:
    - item.stat is defined
    - item.stat.exists is defined and item.stat.exists
    - item.stat.isreg is defined and item.stat.isreg
    - item.stat.checksum is defined

# Compare files using fe1 as a reference
- name: Set fact for diff files
  set_fact:
    diff_files: |
      {% for fname in hostvars[groups['frontend'][0]]['file_checksums'] %}
      {% if hostvars[groups['frontend'][0]]['file_checksums'][fname] != hostvars[item]['file_checksums'][fname] %}
      {{ (diff_files_dict | default({})) | combine({ item: fname }) }}
      {% endif %}
      {% endfor %}
  loop: "{{ groups.frontend }}"
  run_once: true

- name: Save /etc/one content checks
  set_fact:
    verification_result: "{{ verification_result | combine({'Check content of /etc/one directory for file diffs at all FE nodes. Diff files': diff_files }) }}"
  run_once: true

### Leader failover tests
#
- name: Set initial leader node
  set_fact:
    initial_leader: "{{ zone_details.stdout }}"
  run_once: true

- name: Display current leader node
  debug:
    msg: "Current leader node is {{ initial_leader }}"

- name: Save Initial Leader node
  set_fact:
    verification_result: "{{ verification_result | combine({'Initial FE leader node': initial_leader }) }}"

- name: Stop OpenNebula oned service on the current leader to simulate failure
  systemd:
    name: opennebula
    state: stopped
  delegate_to: "{{ initial_leader }}"  # Stop service on the identified leader
  #when: ansible_host == initial_leader
  run_once: true

# Give OpenNebula's internal HA mechanism time to elect a new leader
- name: Wait for leader failover
  pause:
    seconds: 20
  run_once: true

Review comment: run_once should not be used, as recommended by Michal in the one-deploy coding style: https://github.com/OpenNebula/one-deploy/wiki/code_style#4-be-careful-with-run_once. Because of this, in the connectivity matrix I also use logic to make sure the check only runs once, on the first frontend. In this case maybe we have to find another option, because we might get different behaviour if we happen to run the leader status check on the "initial_leader" vs. any other node.

Reply: I think this is a somewhat different case, i.e. we don't have a parallel operation here; we just have to get the output of the onezone command from one of the nodes. Let's discuss if you see any risk here.

Reply: Yes, as I understand it, run_once will just run on a random node of the group, and the test would fail earlier anyway if we did not run these on the FE nodes. So I guess it is fine. I would prefer the same approach for run_once everywhere, but we can live with it.
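
A minimal sketch of the pattern the reviewer describes: pin the task to the first host of the frontend group instead of relying on run_once. The task name and options mirror the "Get Zone" task above; treat this as an assumption, not part of the PR:

```yaml
# Hypothetical alternative to run_once: execute only on the first frontend host
- name: Get Zone (first frontend only)
  ansible.builtin.shell:
    cmd: onezone show "{{ one_zone_name }}" -j | jq -r '.ZONE.SERVER_POOL.SERVER | .[] | select(.STATE == "3") | .NAME'
  register: zone_details
  changed_when: false
  when: inventory_hostname == groups[frontend_group | d('frontend')] | first
```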

- name: Check that one VIP was migrated and reachable from all FE nodes
  ansible.builtin.wait_for:
    host: "{{ one_vip }}"
    port: 2633
  register: vip_reachability

- name: Save VIP reachability post migration
  set_fact:
    verification_result: "{{ verification_result | combine({'VIP reachability after leader migration': 'ok' }) }}"

- name: Verify VIP reachability from all FEs
  debug:
    msg: "{{ vip_reachability.state }}"

- name: Get a new leader id
  ansible.builtin.shell:
    cmd: onezone show "{{ one_zone_name }}" -j | jq -r '.ZONE.SERVER_POOL.SERVER | .[] | select(.STATE == "3") | .NAME'
  register: migrated_leader
  ignore_errors: true
  when: initial_leader != ansible_host
  run_once: true

- name: Set a new leader node
  set_fact:
    migrated_leader: "{{ migrated_leader.stdout }}"
  #delegate_to:
  #when: hostvars[groups[frontend_group | d('frontend')][1]]['ansible_host'] == ansible_host
  run_once: true

- name: Save a new Leader node
  set_fact:
    verification_result: "{{ verification_result | combine({'A new FE leader node after the failover': migrated_leader }) }}"

Review comment: Let's use "leader" in lowercase for consistency.

- name: Check that the leader is indeed new
  debug:
    msg: "Original leader: {{ initial_leader }}, new leader: {{ migrated_leader }}"
  when: initial_leader != migrated_leader
  run_once: true

# Start OpenNebula service on the original leader
- name: Recover stopped leader
  systemd:
    name: opennebula
    state: started
  delegate_to: "{{ initial_leader }}"
  run_once: true

- name: Render results template
  delegate_to: localhost
  become: false
  vars:
    date: "{{ '%Y-%m-%d %H:%M:%S' | strftime(ansible_date_time.epoch) }}"
  template:
    src: report_template.j2
    dest: /tmp/fe_ha_report.html
  ignore_errors: True
@@ -0,0 +1,58 @@
<html>
<head>
  <title> Cloud verification report. Executed on {{ date }} </title>
</head>

<body>
<style>
  .table_component {
    overflow: auto;
    width: 100%;
  }

  .table_component table {
    border: 1px solid #dededf;
    width: 100%;
    table-layout: fixed;
    border-collapse: collapse;
    border-spacing: 1px;
    text-align: left;
  }

  .table_component caption {
    caption-side: top;
    text-align: left;
  }

  .table_component th {
    border: 1px solid #dededf;
    background-color: #eceff1;
    color: #000000;
    padding: 5px;
  }

  .table_component td {
    border: 1px solid #dededf;
    background-color: #ffffff;
    color: #000000;
    padding: 5px;
  }
</style>

<div class="table_component" role="region" tabindex="0">

  <!-- Table with the individual tests -->
  <table>
    <tr><th> HA validation scenario name</th><th>Test result status</th></tr>
    {% for k, v in verification_result.items() %}
    <tr><td> {{ k }}</td>
    <td>{{ v }}</td></tr>
    {% endfor %}
  </table>
</div>
<br><br><br>

</body>
</html>

Review comment: The correct structure to follow the guidelines would be validation.run_fe_ha and validation.fe_ha.<>.
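
Under the structure proposed in this review, the role gate and its role-specific variables would be nested as sketched below. Only validation.run_fe_ha and the validation.fe_ha prefix come from the comment; the keys beneath fe_ha are assumptions based on the role's current defaults:

```yaml
# Hypothetical hierarchy per the review comment
validation:
  run_fe_ha: true               # control flag: skip/execute the FE HA verifications
  fe_ha:
    one_zone_name: OpenNebula   # role-specific variables grouped under the role key
```

The play condition could then read, for example, `when: validation.run_fe_ha | d(false)`, and the role would reference `validation.fe_ha.one_zone_name` instead of a top-level default.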