Deploy an AIOps on Linux cluster in under 60 minutes.
- Terraform - An open-source infrastructure-as-code tool that provides a consistent CLI workflow for managing hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files (a minimal sketch follows this list).
- vSphere account - Access to vSphere with the proper authorization to create VMs
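For illustration, a declarative Terraform configuration describes the desired state and lets Terraform work out the API calls. Below is a minimal sketch of a vSphere provider block; all names and values are placeholders, not this module's actual settings:

```hcl
terraform {
  required_providers {
    vsphere = {
      source = "hashicorp/vsphere"
    }
  }
}

# Placeholder inputs; this module defines its own variables.
variable "vsphere_user" { type = string }
variable "vsphere_password" { type = string }
variable "vsphere_server" { type = string }

# Declarative: describe the desired connection rather than imperative API calls.
provider "vsphere" {
  user                 = var.vsphere_user
  password             = var.vsphere_password
  vsphere_server       = var.vsphere_server
  allow_unverified_ssl = true # accept self-signed vCenter certificates
}
```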
You will need an IBM entitlement key to install AIOps. This can be obtained here.
If you are an IBMer or Business Partner, you can request access to vSphere through IBM TechZone:
- Navigate to VMware on IBM Cloud Environments.
- Select Request vCenter access (OCP Gym).
To use this Terraform code to deploy virtual machines on vSphere, you first need a VM template. Here's how to create one using a RHEL 9 image.
You want to deploy VMs using Terraform, but Terraform needs a pre-existing VM template to clone from.
Red Hat provides a tool to generate OVA files for RHEL 9. This is a convenient way to create a VM image that can be imported into vSphere.
You can find the image builder on the Red Hat Customer Portal.
Once you have the OVA file:
- Open vSphere Client.
- Go to Deploy OVF Template.
- Upload the RHEL 9 OVA.
- Follow the wizard to deploy it as a VM or template.
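If you prefer the command line, the OVA can also be deployed with the `govc` CLI from the govmomi project. A sketch, where the connection details, file path, and template name are all placeholders:

```sh
# Illustrative govc sketch; adjust credentials, paths, and names to your setup.
export GOVC_URL='https://vcenter.example.com'       # placeholder vCenter URL
export GOVC_USERNAME='administrator@vsphere.local'  # placeholder user
export GOVC_PASSWORD='changeme'                     # placeholder password
export GOVC_INSECURE=1                              # allow self-signed certs

govc import.ova -name rhel9-template ./rhel9.ova    # upload the OVA as a VM
govc vm.markastemplate rhel9-template               # convert the VM to a template
```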
💡 Tip: If you're connecting to vSphere through a WireGuard VPN, you might experience timeouts or connectivity issues.
In such cases, consider running your Terraform commands from a bastion host that resides within the same network or environment as vSphere.
This can help avoid VPN-related latency or firewall restrictions that interfere with the connection.
To install Terraform on a RHEL 8 bastion host, follow these steps:
Open a terminal and run:
sudo dnf install -y yum-utils git bind-utils
Create a new repo file:
sudo tee /etc/yum.repos.d/hashicorp.repo <<EOF
[hashicorp]
name=HashiCorp Stable - RHEL 8
baseurl=https://rpm.releases.hashicorp.com/RHEL/8/\$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://rpm.releases.hashicorp.com/gpg
EOF
Now install Terraform:
sudo dnf install -y terraform
Check the installed version:
terraform -version
💡 Important: This Terraform module currently assumes that the network is 192.168.252.0/24; the value is hard-coded into the module.
Four static IP addresses are needed:
| Type | Hostname | IP | FQDN |
|---|---|---|---|
| haproxy | haproxy | 192.168.252.9 | haproxy.gym.lan |
| k3s server | k3s-server-0 | 192.168.252.10 | k3s-server-0.gym.lan |
| k3s server | k3s-server-1 | 192.168.252.11 | k3s-server-1.gym.lan |
| k3s server | k3s-server-2 | 192.168.252.12 | k3s-server-2.gym.lan |
The example table above assumes `base_domain` is set to `gym.lan`.
- Log in to pfSense via the web UI (usually at https://192.168.252.1).
- Navigate to Services → DNS Forwarder.
- Scroll down to Host Overrides.
- For each device:
  - Click Add.
  - Set the IP address (from the table above).
  - Set the Hostname (e.g., `haproxy`).
  - Set the Domain to `gym.lan` (or the appropriate base domain) to form the FQDN.
  - Click Save.
- Click Apply Changes at the top of the page.
To ensure the FQDNs resolve correctly:
- Test resolution using:
nslookup haproxy.gym.lan
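To check all four records in one pass, a simple shell loop over the hostnames from the table above also works:

```sh
# Resolve each static host record defined in pfSense.
for host in haproxy k3s-server-0 k3s-server-1 k3s-server-2; do
  nslookup "${host}.gym.lan"
done
```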
To ensure that your static DHCP mappings (like `k3s-agent-0.gym.lan`, etc.) are resolvable via DNS, you need to enable a specific setting in pfSense:
- Log in to the pfSense Web UI.
- Navigate to Services → DNS Forwarder.
- Scroll down to the General DNS Forwarder Options section.
- Check the box for Register DHCP static mappings in DNS forwarder.
- Click Save and then Apply Changes.
💡 This setting controls whether hostnames assigned to static DHCP clients are automatically added to the DNS forwarder or resolver so they can be resolved locally.
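Once applied, verify that a static DHCP client such as `k3s-agent-0` resolves:

```sh
nslookup k3s-agent-0.gym.lan
```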
Clone this repository to your local workstation. This will allow you to configure and run Terraform.
Open a terminal and run:
sudo dnf install -y git bind-utils
Now clone this repo:
git clone <repo>
If you want to do an offline installation, you can configure a private registry using Artifactory and follow the product instructions for mirroring the images.
There is a file called `terraform.tfvars.example`. Copy this file to `terraform.tfvars` and set the variables according to your needs.
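As an illustration of the format only: apart from `base_domain` (referenced by the DNS table above), the variable names and values below are hypothetical, so consult `terraform.tfvars.example` for the real ones.

```hcl
# Illustrative terraform.tfvars sketch -- see terraform.tfvars.example
# for the variable names this module actually uses.
base_domain    = "gym.lan"                     # base domain from the table above
vsphere_server = "vcenter.example.com"         # hypothetical name and value
vsphere_user   = "administrator@vsphere.local" # hypothetical name and value
```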
We are now ready to deploy our infrastructure. First, ask Terraform to plan the execution:
terraform plan
If everything is OK, the output should look something like this:
...skip
Plan: 14 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ aiops_etc_hosts = (known after apply)
+ haproxy_ip_address = (known after apply)
+ vm_ip_addresses = [
+ (known after apply),
+ (known after apply),
+ (known after apply),
]
Now we can deploy our resources:
terraform apply
Sample output:
...skip
Plan: 14 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ aiops_etc_hosts = (known after apply)
+ haproxy_ip_address = (known after apply)
+ vm_ip_addresses = [
+ (known after apply),
+ (known after apply),
+ (known after apply),
]
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
...skip
Apply complete! Resources: 14 added, 0 changed, 0 destroyed.
Outputs:
aiops_etc_hosts = <<EOT
192.168.252.9 aiops-cpd.haproxy.gym.lan
192.168.252.9 cp-console-aiops.haproxy.gym.lan
EOT
haproxy_ip_address = "192.168.252.9"
vm_ip_addresses = [
"192.168.252.10",
"192.168.252.11",
"192.168.252.12",
]
It takes about 5 minutes for the actual installation to start. You can SSH to any of the control plane nodes listed in the `vm_ip_addresses` output as `clouduser`. The following command opens an SSH session with `k3s-server-0`.
sed -i '/^k3s-/d' ~/.ssh/known_hosts && ssh -o StrictHostKeyChecking=no -i ./id_rsa clouduser@k3s-server-0
💡 Tip: The default password for `clouduser` is `mypassword`.
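The `sed` in the command above removes stale host keys for the k3s nodes before connecting. If you connect often, an optional entry in `~/.ssh/config` achieves the same effect; a sketch assuming the default `clouduser` and the key file from this repo:

```
# ~/.ssh/config -- optional convenience entry for the k3s nodes
Host k3s-server-* k3s-agent-*
    User clouduser
    IdentityFile ~/aiops/id_rsa   # adjust to where this repo's key lives
    StrictHostKeyChecking no      # lab use only: skips host key verification
    UserKnownHostsFile /dev/null  # avoid recording changing host keys
```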
Change to the root user on the control plane node.
sudo su -
Run the `aiopsctl` command to see the installation status.
aiopsctl status
Sample output:
o- [03 Jun 25 14:58 EDT] Getting cluster status
Control Plane Node(s):
k3s-server-0.gym.lan Ready
k3s-server-1.gym.lan Ready
k3s-server-2.gym.lan Ready
Worker Node(s):
k3s-agent-0.gym.lan Ready
k3s-agent-1.gym.lan Ready
k3s-agent-2.gym.lan Ready
k3s-agent-3.gym.lan Ready
k3s-agent-4.gym.lan Ready
k3s-agent-5.gym.lan Ready
o- [03 Jun 25 14:58 EDT] Checking AIOps installation status
15 Unready Components
aiopsui
asm
issueresolutioncore
baseui
cluster
aiopsedge
zenservice
aimanager
commonservice
aiopsanalyticsorchestrator
kafka
lifecycletrigger
lifecycleservice
elasticsearchcluster
rediscp
[WARN] AIOps installation unhealthy
The install can take up to 45 minutes to complete.
All commands below should be run as root from a control plane node.
List the nodes:
kubectl get nodes
Sample output:
NAME STATUS ROLES AGE VERSION
k3s-agent-0.gym.lan Ready worker 5m38s v1.31.7+k3s1
k3s-agent-1.gym.lan Ready worker 5m38s v1.31.7+k3s1
k3s-agent-2.gym.lan Ready worker 5m38s v1.31.7+k3s1
k3s-agent-3.gym.lan Ready worker 5m37s v1.31.7+k3s1
k3s-agent-4.gym.lan Ready worker 5m39s v1.31.7+k3s1
k3s-agent-5.gym.lan Ready worker 5m41s v1.31.7+k3s1
k3s-server-0.gym.lan Ready control-plane,etcd,master 5m56s v1.31.7+k3s1
k3s-server-1.gym.lan Ready control-plane,etcd,master 5m21s v1.31.7+k3s1
k3s-server-2.gym.lan Ready control-plane,etcd,master 5m10s v1.31.7+k3s1
List all pods:
kubectl get pods -A
Sample output (note that during install, some unhealthy pods are expected):
NAMESPACE NAME READY STATUS RESTARTS AGE
aiops aimanager-operator-controller-manager-6866676848-w2bbp 1/1 Running 0 7m6s
aiops aiops-entitlement-check-9vk99 0/1 Completed 0 11m
aiops aiops-ibm-elasticsearch-es-server-all-0 2/2 Running 0 7m58s
aiops aiops-ibm-elasticsearch-es-server-all-1 2/2 Running 0 7m58s
aiops aiops-ibm-elasticsearch-es-server-all-2 2/2 Running 0 7m58s
aiops aiops-installation-edb-postgres-1 1/1 Running 0 7m11s
aiops aiops-installation-edb-postgres-2 1/1 Running 0 6m28s
aiops aiops-installation-edb-postgres-3 1/1 Running 0 2m40s
...skip
List all pods that are in an unhealthy state (the `grep` filters out Completed pods and any pod whose ready count equals its container count, matched via the `\1` backreference):
kubectl get pods -A | grep -vE 'Completed|([0-9]+)/\1'
Sample output (again, unhealthy pods are expected during install):
NAMESPACE NAME READY STATUS RESTARTS AGE
aiops aiops-ir-analytics-cassandra-setup-crfz7 0/1 CrashLoopBackOff 5 (43s ago) 7m2s
aiops aiops-ir-core-archiving-setup-cwl2s 0/1 Init:0/1 0 6m57s
aiops aiops-ir-lifecycle-create-policies-job-6xdkx 0/1 Init:0/2 0 5m46s
aiops aiops-ir-lifecycle-policy-registry-svc-c79f97567-lxtq8 0/1 Init:CrashLoopBackOff 5 (80s ago) 5m46s
aiops aiops-ir-lifecycle-policy-registry-svc-c79f97567-vrfzw 0/1 Init:CrashLoopBackOff 5 (91s ago) 5m46s
aiops aiops-topology-cassandra-1 0/1 Running 0 16s
aiops aiopsedge-generic-topology-integrator-5fd9b478cd-kh8xv 0/1 Init:0/1 0 5m13s
aiops aiopsedge-generic-topology-integrator-f9b677db5-lt9xp 0/1 Init:0/1 0 5m12s
aiops aiopsedge-im-topology-integrator-5bd84594b-w5q9s 0/1 Init:0/1 0 5m7s
aiops aiopsedge-im-topology-integrator-869dc6f6fc-n7st5 0/1 Init:0/1 0 5m9s
aiops aiopsedge-instana-topology-integrator-845fb497dd-5xg7z 0/1 Init:0/1 0 5m7s
aiops aiopsedge-instana-topology-integrator-8466585ffc-hwpj5 0/1 Init:0/1 0 5m3s
aiops cp4waiops-metricsprocessor-9b9864cf4-7fj2v 0/1 CreateContainerConfigError 0 7m25s
aiops usermgmt-57c56b4c4b-dsq4c 0/1 Running 0 24s
aiops usermgmt-57c56b4c4b-pf5jb 0/1 Running 0 24s
Follow the launch template script output:
tail -f /var/log/cloud-init-output.log
This can be run from any node; it shows the verbose output of the launch scripts found in this repo under `cloudinit` for the appropriate node or instance type.
Once the install is complete, the `aiopsctl status` command run from a control node will show the following.
o- [03 Jun 25 14:58 EDT] Getting cluster status
Control Plane Node(s):
k3s-server-0.gym.lan Ready
k3s-server-1.gym.lan Ready
k3s-server-2.gym.lan Ready
Worker Node(s):
k3s-agent-0.gym.lan Ready
k3s-agent-1.gym.lan Ready
k3s-agent-2.gym.lan Ready
k3s-agent-3.gym.lan Ready
k3s-agent-4.gym.lan Ready
k3s-agent-5.gym.lan Ready
o- [03 Jun 25 14:58 EDT] Checking AIOps installation status
15 Ready Components
aiopsui
asm
issueresolutioncore
baseui
cluster
aiopsedge
zenservice
aimanager
commonservice
aiopsanalyticsorchestrator
kafka
lifecycletrigger
lifecycleservice
elasticsearchcluster
rediscp
AIOps installation healthy
From a control node as the root user, run the following command to get the URL and login credentials.
aiopsctl server info --show-secrets
Sample output:
Cluster Access Details
URL: aiops-cpd.haproxy.gym.lan
Username: cpadmin
Password: 6oiKSZ6rStHoUW3V3oCBSen2AjVtxAhw
Store this information for future use.
The Terraform output includes an `/etc/hosts` mapping for the haproxy server running in vSphere.
If you need to view the terraform output again, run the following:
terraform output
Sample output:
aiops_etc_hosts = <<EOT
192.168.252.9 aiops-cpd.haproxy.gym.lan
192.168.252.9 cp-console-aiops.haproxy.gym.lan
EOT
...skip
Copy the two lines in the `aiops_etc_hosts` output and paste them into your local workstation's hosts file.
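On Linux or macOS you can append them directly (using the example values above; on Windows, edit C:\Windows\System32\drivers\etc\hosts instead):

```sh
# Append the haproxy entries from the terraform output to /etc/hosts.
sudo tee -a /etc/hosts <<EOF
192.168.252.9 aiops-cpd.haproxy.gym.lan
192.168.252.9 cp-console-aiops.haproxy.gym.lan
EOF
```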
Navigate in your browser to the URL beginning with `aiops-cpd`. In the example above this would be https://aiops-cpd.haproxy.gym.lan.
You will see warnings about self-signed certificates; accept all of them (there will be a few).
The console login page will load.
Use the credentials from `aiopsctl server info` to log in. Accept any further security warnings.
Congratulations! You have successfully installed AIOps.
To destroy all resources, run the following command.
terraform destroy -auto-approve