Troubleshooting

Why can't I SSH into my worker node?

What’s Happening

You can't access your worker node by using an SSH connection.

Why It’s Happening

SSH by password is unavailable on the worker nodes.

How to Fix It

To run actions on every worker node, use a Kubernetes DaemonSet, or use jobs for one-time actions.
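
As a minimal sketch of the DaemonSet approach, the shell function below prints a manifest that runs one command on every node. The function name render_daemonset, the name node-action, the busybox image, and the example command are illustrative assumptions, not values from this guide; adapt them before you apply anything.

```shell
# Print a DaemonSet manifest that runs ACTION on every node and then idles.
# NAME, ACTION, and the busybox image are placeholders (assumptions).
# ACTION must not contain double quotes, because it is embedded in a
# JSON-style string inside the manifest.
render_daemonset() {
  name="$1"
  action="$2"
  cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ${name}
  namespace: default
spec:
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: ${name}
        image: docker.io/library/busybox:latest
        command: ["/bin/sh", "-c", "${action}; while true; do sleep 3600; done"]
EOF
}

# Example (requires cluster access):
#   render_daemonset node-action "uname -a" | kubectl apply -f -
```

The idle loop keeps the container running after the action finishes, so the action is not repeated by container restarts. Delete the DaemonSet when the action is complete.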

To get host access to worker nodes for debugging and troubleshooting purposes, review the following options.

Debugging by Using kubectl debug

Use the kubectl debug node command to deploy a pod with a privileged securityContext to a worker node that you want to troubleshoot.

First, install a recent version of kubectl that supports the debug command. The examples in this topic use kubectl v1.25.

To install kubectl, follow the kubectl installation instructions. To download the kubeconfig.yaml file and connect to your cluster, see the cluster access documentation.
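
Before you continue, it can help to confirm the client version. The following sketch assumes version strings of the form v1.25.4. The helper name kubectl_minor_at_least is hypothetical and is not part of kubectl.

```shell
# Succeed when a version string such as "v1.25.4" is at least 1.<required_minor>.
kubectl_minor_at_least() {
  version="${1#v}"            # strip the leading "v"
  required_minor="$2"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # text up to the next dot
  minor="${minor%%[!0-9]*}"   # drop non-numeric suffixes such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge "$required_minor" ]; }
}

# Example (requires kubectl on the PATH):
#   client="$(kubectl version --client -o json \
#     | sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p' | head -n 1)"
#   kubectl_minor_at_least "$client" 25 || echo "kubectl is older than v1.25" >&2
```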

The debug pod is deployed with an interactive shell so that you can access the worker node immediately after the pod is created. For more information about how the kubectl debug node command works, see the debug command in the Kubernetes reference.

  1. Get the name of the worker node that you want to access. The worker node name is its private IP address.

$ kubectl get nodes -o wide
  2. Run the following command for the node that you want to access.

$ kubectl debug node/<NODE_NAME> --image=docker.io/library/alpine:latest -it

For example:

$ kubectl debug node/onekube-ip-10-5-12-6 --image=docker.io/library/alpine:latest -it

In this example, onekube-ip-10-5-12-6 is the master node.

  3. Run debug commands to help you gather information and troubleshoot issues. Commands that you might use to debug, such as ip, ifconfig, nc, ping, and ps, are already available in the shell. You can also install other tools, such as mtr, tcpdump, and curl, by running:

    apk add <tool>
Note

Most basic commands are available in the Alpine container. Depending on your requirements, you can use a different image, including a full distribution such as Ubuntu.
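
To decide which node to debug, one option is to filter the node list for nodes whose status is not Ready. The following sketch assumes the default column layout of kubectl get nodes, with NAME first and STATUS second. The function name not_ready_nodes is hypothetical.

```shell
# Read `kubectl get nodes` output on stdin and print the names of nodes
# whose STATUS column is anything other than exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are therefore listed too.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example (requires cluster access):
#   kubectl get nodes -o wide | not_ready_nodes
```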

Debugging by Using kubectl exec

If you are unable to use the kubectl debug node command, you can create an Alpine pod with a privileged securityContext and use the kubectl exec command to run debug commands from the pod's interactive shell.

  1. Get the name of the worker node that you want to access. The worker node name is its private IP address.

$ kubectl get nodes -o wide
  2. Export the name in an environment variable.

$ export NODE=<NODE_NAME>
  3. Create a debug pod on the worker node. The Alpine image is used here as an example. If the worker node doesn't have public network access, you can keep a copy of the image for debugging in your own ICR repository, or build a customized image that includes the tools that you need.

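$ kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: debug-${NODE}
  namespace: default
spec:
  # Schedule the pod on the node that you exported in the previous step.
  nodeName: ${NODE}
  containers:
  - args: ["-c", "apk add tcpdump mtr curl; sleep 1d"]
    command: ["/bin/sh"]
    image: icr.io/armada-master/alpine:latest
    imagePullPolicy: IfNotPresent
    name: debug
    resources: {}
    securityContext:
      # Privileged access, as described in the introduction to this option.
      privileged: true
    volumeMounts:
    # Host root file system, mounted at the /host path that kubectl cp reads from.
    - mountPath: /host
      name: host-volume
  volumes:
  - name: host-volume
    hostPath:
      path: /
EOF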
  4. Log in to the debug pod. The kubectl exec command opens an interactive shell in the pod. If the kubectl exec command fails, continue to option 3.

$ kubectl exec -it debug-${NODE} -- sh

You can use the kubectl cp command to get logs or other files from a worker node. The following example gets the /var/log/syslog file.


$ kubectl cp default/debug-${NODE}:/host/var/log/syslog ./syslog

Get the following logs to look for issues on the worker node.


/var/log/syslog
/var/log/containerd.log
/var/log/kubelet.log
/var/log/kern.log
  5. Run debug commands to help you gather information and troubleshoot issues. Commands that you might use to debug, such as ip, ifconfig, nc, ping, and ps, are already available in the shell. You can also install other tools, such as dig (provided by the bind-tools package), tcpdump, mtr, and curl, by running:

    apk add <tool>
    apk add bind-tools
Note

Before you can use the tcpdump command, you must move the binary to a location that does not conflict with the install path for the binary on the host. To relocate the binary, run the following command: mv /usr/sbin/tcpdump /usr/local/bin/

  6. Delete the host access pod that you created for debugging.

$ kubectl delete pod debug-${NODE}
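
The log collection step in this procedure can be scripted. Run it before you delete the debug pod. The following sketch only prints the kubectl cp commands so that you can review them first. It assumes the pod is named debug-<NODE> in the default namespace and mounts the host root at /host. The function name log_copy_commands is hypothetical.

```shell
# Print one `kubectl cp` command per log file for the debug pod on NODE.
log_copy_commands() {
  node="$1"
  for log in syslog containerd.log kubelet.log kern.log; do
    printf 'kubectl cp default/debug-%s:/host/var/log/%s ./%s\n' "$node" "$log" "$log"
  done
}

# Example: review the commands, then run them (requires cluster access).
#   log_copy_commands "$NODE"
#   log_copy_commands "$NODE" | sh
```

Printing the commands instead of running them directly keeps the helper safe to try without a cluster. Some of the log files might not exist on every worker node, in which case the corresponding kubectl cp command fails.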