
NodeNotReady

This document provides a standard operating procedure (SOP) to identify, diagnose, and resolve NodeNotReady issues in E2E Kubernetes clusters.

Overview

In Kubernetes, a node is marked NotReady when the control plane stops receiving healthy status updates (kubelet heartbeats) from the node.

When a node goes into NodeNotReady state:

  • New pods are no longer scheduled on that node
  • Existing pods may be evicted
  • Cluster capacity shrinks silently (especially dangerous in production)

Why It Breaks (Common Causes)

In real production incidents, the cause is almost always one of the following:

  • kubelet is down or stuck
  • Node lost network connectivity (to control plane)
  • DiskPressure / MemoryPressure
  • Container runtime failure (Docker / containerd)
  • Node clock skew (NTP issues)

Step 1: Identify Affected Nodes

kubectl get nodes
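On a large cluster it helps to filter the output down to unhealthy nodes. A minimal sketch — the sample output below is illustrative, not from a real cluster, so the filter can be shown without cluster access:

```shell
# Against a live cluster you would pipe the real command instead:
#   kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1, $2}'
# Illustrative sample of `kubectl get nodes --no-headers` output:
kubectl_get_nodes_sample() {
cat <<'EOF'
node-a   Ready      worker   12d   v1.28.4
node-b   NotReady   worker   12d   v1.28.4
node-c   Ready      worker   12d   v1.28.4
EOF
}
# Keep only rows whose STATUS column is not "Ready".
kubectl_get_nodes_sample | awk '$2 != "Ready" {print $1, $2}'
# → node-b NotReady
```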

Step 2: Describe the Node (Primary Diagnostic)

kubectl describe node <node-name>

Focus on Conditions:

Condition                       Meaning
Ready=False (KubeletNotReady)   kubelet unhealthy or not reporting
NetworkUnavailable              CNI / network connectivity issue
DiskPressure                    Disk space exhausted
MemoryPressure                  Node out of memory
PIDPressure                     Process limit reached
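Conditions can also be pulled out programmatically with jsonpath. The sketch below reproduces that output from a hard-coded sample (values are illustrative) so the abnormal-condition filter runs without a cluster:

```shell
# Live command printing "type<TAB>status" per condition:
#   kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}'
# Illustrative sample of that output:
conditions_sample() {
  printf 'MemoryPressure\tFalse\nDiskPressure\tTrue\nPIDPressure\tFalse\nReady\tFalse\n'
}
# Abnormal = a pressure condition that is True, or Ready that is not True.
conditions_sample | awk -F'\t' '($1=="Ready" && $2!="True") || ($1!="Ready" && $2=="True")'
```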

Common Causes & Resolutions

1. DiskPressure (Disk Space Exhaustion)

How It Happens

  • Excessive application logs
  • No log rotation
  • emptyDir volumes consuming space
  • Image cache growth

How to Identify

kubectl describe node <node-name>

Look for:

DiskPressure=True

Check ephemeral-storage under Capacity / Allocatable.
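If you have shell access to the node, a plain df check confirms what the condition is reporting. A sketch — the 85% threshold here is an illustrative default, not kubelet's actual eviction threshold (kubelet's default evicts when nodefs free space drops below 10%):

```shell
# Read the root filesystem usage percentage (column 5 of `df -P`, "%" stripped).
usage=$(df -P / | awk 'NR==2 {gsub("%","",$5); print $5}')
if [ "$usage" -ge 85 ]; then
  echo "WARN: root filesystem ${usage}% full"
else
  echo "OK: root filesystem ${usage}% full"
fi
```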


2. kubelet Not Running or Stuck

Symptoms

  • Ready=False
  • KubeletNotReady

Identification

kubectl describe node <node-name>

Resolution

  • In E2E-managed clusters, restart is handled by the platform
  • If persistent, raise a node-level support request

3. Network Connectivity Loss

Symptoms

  • NetworkUnavailable=True
  • Multiple nodes NotReady simultaneously

Common Causes

  • Firewall or security group change
  • Control plane connectivity loss
  • CNI failure

Resolution

  • Verify control-plane reachability
  • Roll back recent network changes
  • Escalate to network team if needed
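A quick TCP probe from the node verifies control-plane reachability before escalating. A sketch, assuming a bash environment; the host and port are placeholders — substitute your cluster's API server endpoint (typically port 6443). The example deliberately probes a closed local port so it runs anywhere:

```shell
# Probe a host:port with a 2-second timeout using bash's /dev/tcp.
check_apiserver() {
  local host="$1" port="$2"
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
# Example against a local port that is almost certainly closed:
check_apiserver 127.0.0.1 1
```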

4. MemoryPressure

Symptoms

  • Pods evicted
  • Node marked NotReady

Identification

kubectl describe node <node-name>

Look for:

MemoryPressure=True

Resolution

  • Scale down memory-heavy workloads
  • Fix memory leaks
  • Add memory requests and limits
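The last point looks like this in a pod spec (names, image, and values below are illustrative, not prescriptive):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                               # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      resources:
        requests:
          memory: "256Mi"   # scheduler reserves this much on the node
        limits:
          memory: "512Mi"   # container is OOM-killed if it exceeds this
```

Requests keep the scheduler from overpacking a node; limits keep one leaking container from starving the node and triggering MemoryPressure.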

5. Container Runtime Failure

Symptoms

  • Pods failing across the node
  • Runtime errors in events

Common Causes

  • containerd crash
  • Image filesystem corruption

Resolution

  • Platform-managed restart
  • Node replacement if persistent

Step 3: Check Events (Evidence)

kubectl get events -A --sort-by=.lastTimestamp

Useful events to look for:

  • NodeHasDiskPressure
  • EvictionThresholdMet
  • NodeNotReady
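Those event reasons can be grepped straight out of the event stream. The sketch below runs the filter against hard-coded sample lines (illustrative, not real cluster output):

```shell
# Live command:
#   kubectl get events -A --sort-by=.lastTimestamp | grep -E 'NodeNotReady|DiskPressure|EvictionThresholdMet'
# Illustrative sample event lines:
events_sample() {
cat <<'EOF'
5m    Warning   NodeHasDiskPressure   node/node-b   Node node-b status is now: NodeHasDiskPressure
4m    Normal    Pulled                node/node-a   Container image already present on machine
3m    Warning   EvictionThresholdMet  node/node-b   Attempting to reclaim ephemeral-storage
1m    Warning   NodeNotReady          node/node-b   Node node-b status is now: NodeNotReady
EOF
}
# Keep only the reasons relevant to this SOP.
events_sample | grep -E 'NodeNotReady|DiskPressure|EvictionThresholdMet'
```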

Step 4: Recovery Validation

kubectl get nodes

Expected:

<node-name>   Ready
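Instead of re-running `kubectl get nodes` by hand, `kubectl wait` blocks until the condition clears. The loop below shows the same polling idea offline, with a stub standing in for the status query (the stub is illustrative only):

```shell
# Live equivalent:
#   kubectl wait --for=condition=Ready "node/<node-name>" --timeout=300s
# Offline sketch: poll a status function until it reports Ready or we give up.
attempt=0
node_status() {                 # stub: flips to Ready after 3 polls
  [ "$attempt" -ge 3 ] && echo "Ready" || echo "NotReady"
}
until [ "$(node_status)" = "Ready" ]; do
  attempt=$((attempt + 1))
  [ "$attempt" -gt 10 ] && { echo "timed out"; break; }
  sleep 0.1                     # real polling would sleep far longer
done
echo "final status: $(node_status)"
# → final status: Ready
```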

Common Mistakes

  • Restarting kubelet blindly
  • Ignoring DiskPressure
  • Waiting for SSH access
  • Treating NodeNotReady as the root cause instead of a symptom

E2E Best Practices

  • Monitor node conditions continuously
  • Set resource requests & limits
  • Configure log rotation at app level
  • Use centralized logging
  • Alert on NodeNotReady and DiskPressure
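For the log-rotation point, the kubelet itself can cap container log growth via its KubeletConfiguration. A fragment with illustrative values — on E2E-managed clusters these settings may be controlled by the platform rather than editable by you:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "50Mi"   # rotate a container's log once it reaches this size
containerLogMaxFiles: 3       # keep at most this many rotated files per container
```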

Final Note

NodeNotReady is a signal, not a failure. Fix the cause, and the node will recover automatically.