diff --git a/src/pages/docs/infrastructure/deployment-targets/kubernetes/kubernetes-agent/troubleshooting.md b/src/pages/docs/infrastructure/deployment-targets/kubernetes/kubernetes-agent/troubleshooting.md
index bc8d20ee2a..1e0c99afd8 100644
--- a/src/pages/docs/infrastructure/deployment-targets/kubernetes/kubernetes-agent/troubleshooting.md
+++ b/src/pages/docs/infrastructure/deployment-targets/kubernetes/kubernetes-agent/troubleshooting.md
@@ -1,7 +1,7 @@
 ---
 layout: src/layouts/Default.astro
 pubDate: 2024-05-08
-modDate: 2024-05-30
+modDate: 2024-08-05
 title: Troubleshooting
 description: How to troubleshoot common Kubernetes Agent issues
 navOrder: 40
@@ -44,6 +44,14 @@ If the Agent install command fails with a timeout error, it could be that:
 - (if using the NFS storage solution) The NFS CSI driver has not been installed
 - (if using a custom Storage Class) the Storage Class name doesn't match
 
+#### 404 error when setting up the NFS Pod.
+
+Check that your version of Helm is up to date. In older versions, the error message you might be experiencing is [not shown](https://github.com/helm/helm/blob/1ec0aacb8865d5b1f7ef1cb884bbf9b12579ecef/pkg/action/install.go#L753-L769).
+
+Once your version of Helm is up to date, run `helm repo update` and try again.
+
+If you're still having issues, note that Helm can fail to retrieve a remote chart when there are [local repos that are not cached](https://github.com/helm/helm/issues/11961); see the workarounds provided on that Helm issue page. If that doesn't help, please [get in touch](https://octopus.com/support).
+
 ## Script Execution Issues
 
 ### `Unexpected Script Pod log line number, expected: expected-line-no, actual: actual-line-no`
@@ -62,3 +70,43 @@ If you are using the default NFS storage however, then the script pod would be d
 
 - being evicted due to exceeding its storage quota
 - being moved or restarted as part of routine cluster operation
+
+## Frequently Asked Questions {#FAQ}
+
+### Can the agent work with Octopus running in an HA Cluster setup?
+Yes! See the [Kubernetes agent HA Cluster Support](/docs/infrastructure/deployment-targets/kubernetes/kubernetes-agent/ha-cluster-support) page.
+
+
+### Can a proxy be specified when setting up the Kubernetes agent?
+Yes! A proxy server can be specified for the polling connection between the agent and Octopus Server. It is supplied during setup via the `agent.pollingProxy.*` Helm values.
+
+Define the polling proxy server through the `agent.pollingProxy.host`, `agent.pollingProxy.port`, `agent.pollingProxy.username` and `agent.pollingProxy.password` values of the [octopusdeploy/kubernetes-agent](https://hub.docker.com/r/octopusdeploy/kubernetes-agent) Helm chart.
+
+### When trying to install the Kubernetes Agent on an existing cluster, I get a 401: Unauthorized error.
+
+```
+Error: GET "https://registry-1.docker.io/v2/octopusdeploy/kubernetes-agent/tags/list":
+GET "https://auth.docker.io/token?scope=repository%3Aoctopusdeploy%2Fkubernetes-agent%3Apull&service=registry.docker.io": unexpected status code 401: Unauthorized
+```
+1. If you are running this command locally, are you logged in?
+2. If this is running from another automation process, does that process have valid authentication and authorization?
+
+### Do I need to have the NFS CSI Driver?
+Not for all configurations. It depends on your storage setup, and the installation wizard will guide you.
+
+If you are using `azurefile`, you don't need the NFS CSI driver.
+
+If you're using a new or clean AKS instance, you will need to install the NFS CSI driver, because that AKS instance will not have it installed.
+
+### I have unexpected behavior with the polling endpoints in an HA configuration.
+This could be caused by a variety of issues. First, check that a different port and/or URL is used for each node.
+
+Check the values supplied to `agent.serverCommsAddresses`, as they must be unique for [each Octopus node being registered against](https://octopus.com/docs/administration/high-availability/maintain/polling-tentacles-with-ha#connecting-polling-tentacles).
+
+If that doesn't help, please [get in touch](https://octopus.com/support).
+
+### I'm having strange behavior relating to ingress in an HA configuration.
+Check carefully which property you have set: there is a `serverCommsAddress` property, kept for backwards compatibility, and a `serverCommsAddresses` property, which accepts an array. It is easy to mistype one for the other. This has presented itself as a variety of errors depending on the broader configuration, e.g. you may see "it failed to allocate the public ip" if using load balancers.
+
+### The Script Pod seems to hang during a deployment
+When this has been reported, the cause has been specific to the deployment process being executed. Run subsets of your process to narrow down the cause, or [get in touch](https://octopus.com/support) with details on how we can reproduce what you're seeing.