# PodUnschedulable

This document explains how to identify, analyze, and resolve **PodUnschedulable** issues in E2E Kubernetes clusters.

---

## 1. What Is PodUnschedulable?

A pod enters the **Pending** state with the reason **Unschedulable** when the Kubernetes **scheduler cannot find any node** that satisfies **all scheduling requirements** of the pod.

- Kubernetes does not partially schedule a pod
- If **even one condition fails**, the pod will not be placed

Symptoms:

- Pods stuck in `Pending` state
- New replicas not starting during deployment
- HPA raises the desired replica count under high traffic, but the new pods stay `Pending`
- No container crashes or application errors
- Cluster appears healthy, but workloads do not start

---

## 2. Impact on Production

| Area | Impact |
|---|---|
| Application | Traffic served only by old pods |
| Autoscaling | HPA requests more replicas, but the new pods never start |
| Deployments | Rollouts hang |
| Reliability | Risk of outage during traffic spikes |

---

## 3. How Kubernetes Scheduling Works

1. The scheduler filters the nodes in the cluster
2. For each node, it checks **resources, labels, taints, and affinity rules**
3. If **no node passes every check**, the pod is marked **Unschedulable**
4. The pod stays `Pending`, and the scheduler retries when cluster state changes

> Kubernetes is strict by design. This is a safety mechanism.
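
The all-or-nothing behaviour described above can be illustrated with a toy filter. This is only a sketch and not the real kube-scheduler: the function name `feasible_nodes`, the node names, and the input format (node, free CPU in millicores, free memory in Mi, tainted yes/no) are assumptions made for illustration.

```bash
# Toy model of the scheduler's filter step (NOT the real kube-scheduler).
# Input lines on stdin: <node> <free-cpu-millicores> <free-memory-Mi> <tainted: yes|no>
# Arguments: <requested-cpu-millicores> <requested-memory-Mi>
feasible_nodes() {
  awk -v cpu="$1" -v mem="$2" \
    '$2 >= cpu && $3 >= mem && $4 == "no" { print $1 }'
}

# Hypothetical cluster state used only for this example.
nodes='node-a 4000 7200 no
node-b 8000 16000 yes
node-c 2000 32000 no'

# 6 CPUs requested: node-a and node-c lack CPU, node-b is tainted, so nothing is printed.
printf '%s\n' "$nodes" | feasible_nodes 6000 1024

# 2 CPUs requested: prints node-a and node-c; node-b still fails the taint check.
printf '%s\n' "$nodes" | feasible_nodes 2000 1024
```

A node is printed only when every condition holds, which mirrors the rule that a single failed requirement keeps the pod unscheduled.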

---

## 4. Primary Causes of PodUnschedulable

A pod becomes unschedulable if **ANY ONE** of the following fails:

### 4.1 Resource Constraints

- Insufficient CPU
- Insufficient Memory

**Example — Insufficient CPU** (requesting 6 CPUs on a 4-core node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "6"    # Insufficient CPU for a 4-core node
        memory: "1Gi"
```

**Example — Insufficient Memory** (requesting 8Gi on a 7.2GB node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"   # Insufficient memory for a 7.2GB node
```

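Both pods stay in `STATUS: Pending`, because the scheduler compares the requests against each node's allocatable capacity, which is lower than its raw capacity.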

---

### 4.2 Node Selector Mismatch

The pod's `nodeSelector` requires labels that no node has.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  nodeSelector:
    app: myapp    # Pod runs only on nodes labeled app=myapp
  containers:
  - name: nginx
    image: nginx
```

Since none of the nodes are labeled with `app=myapp`, Kubernetes cannot find a suitable node and keeps the pod in `Pending` state.

**How to Fix:**

```bash
kubectl label node <node-name> app=myapp
```

Once the label is added, the scheduler places the pod automatically on its next retry, provided its other constraints are also met.

---

### 4.3 Taints Without Tolerations

Example taints that block scheduling:

- `node.kubernetes.io/disk-pressure: NoSchedule`
- `node.kubernetes.io/memory-pressure: NoSchedule`

If a node has a taint and the pod has no matching toleration, the pod stays `Pending`.
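
As a hedged illustration of a matching toleration, the sketch below prints a pod manifest that tolerates a hypothetical taint `dedicated=batch:NoSchedule`. The function name `toleration_example` and the taint key and value are assumptions for this example and do not come from a real cluster.

```bash
# Print a pod manifest that tolerates the hypothetical taint dedicated=batch:NoSchedule.
# The taint key/value and the function name are illustrative assumptions.
toleration_example() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
EOF
}
```

To try it, taint a node with `kubectl taint nodes <node-name> dedicated=batch:NoSchedule` and create the pod with `toleration_example | kubectl apply -f -`. Pressure taints such as `disk-pressure` are set by the control plane when a node is unhealthy, so they are usually better resolved by fixing the node than by tolerating them.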

---

### 4.4 Affinity / Anti-Affinity Rules

- `podAntiAffinity` rules that exclude every candidate node
- Hard `requiredDuringSchedulingIgnoredDuringExecution` rules that cannot be satisfied
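
A minimal sketch of a hard anti-affinity rule that can leave replicas unschedulable is shown below. The function name `anti_affinity_example`, the Deployment name, and the `app: myapp` label are assumptions for illustration.

```bash
# Print a Deployment whose replicas must each land on a different node.
# With 3 replicas and fewer than 3 schedulable nodes, the extra replicas stay Pending.
anti_affinity_example() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx
EOF
}
```

Running `anti_affinity_example | kubectl apply -f -` on a cluster with only two schedulable nodes should leave the third replica `Pending`. A `preferredDuringSchedulingIgnoredDuringExecution` rule expresses the same intent as a soft preference instead.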

---

### 4.5 Max Pods per Node Reached

- Kubelet `maxPods` limit (110 pods per node by default) or CNI IP address limits (for example, ENI/IP limits) exceeded
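
One way to check whether a node is near its pod limit is to compare the number of pods assigned to it with the limit the node reports. The helper names `pods_per_node` and `node_pod_limits` below are hypothetical; the sketch assumes `kubectl` is configured for the cluster.

```bash
# Count pods assigned to each node (completed pods are excluded by the field selector).
pods_per_node() {
  kubectl get pods --all-namespaces \
    --field-selector='status.phase!=Succeeded,status.phase!=Failed' \
    -o custom-columns='NODE:.spec.nodeName' --no-headers \
    | awk '$1 != "<none>" { count[$1]++ } END { for (n in count) print n, count[n] }' \
    | sort
}

# Show each node's pod limit as reported in .status.allocatable.pods.
node_pod_limits() {
  kubectl get nodes --no-headers \
    -o custom-columns='NODE:.metadata.name,MAX_PODS:.status.allocatable.pods'
}
```

If a node's count from `pods_per_node` equals its `MAX_PODS` value, the scheduler will not place more pods there regardless of free CPU or memory.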

---

## 5. Step-by-Step Troubleshooting

### Step 1: Identify Pending Pods

```bash
kubectl get pods
```

Look for: `STATUS: Pending`
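
On a busy cluster it can help to list only the `Pending` pods across all namespaces. The helper name `list_pending_pods` is an assumption; the flags used are standard `kubectl` options.

```bash
# List Pending pods in every namespace as "namespace/name" lines.
list_pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name' \
    --no-headers \
    | awk '{ print $1 "/" $2 }'
}
```

Note that `Pending` also covers pods that are already scheduled but still pulling images, so confirm the reason in the pod's events.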

---

### Step 2: Describe the Pod (Most Important)

```bash
kubectl describe pod <pod-name>
```

Check the **Events** section for a `FailedScheduling` event. Example event output:

```
0/6 nodes are available:
6 Insufficient cpu
node(s) had taint {node.kubernetes.io/disk-pressure: NoSchedule}
```

> Events never lie. Always start here.
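
The same information can be pulled without reading the full `describe` output. The sketch below uses hypothetical helper names (`failed_scheduling_messages`, `split_reasons`) and assumes the scheduler message follows the `0/N nodes are available: reason, reason` layout; newer Kubernetes releases may append extra text.

```bash
# Print the FailedScheduling event messages for one pod.
# Usage: failed_scheduling_messages <pod-name> [namespace]
failed_scheduling_messages() {
  local pod="$1" ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "reason=FailedScheduling,involvedObject.name=${pod}" \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
}

# Split a scheduler message read from stdin into one reason per line.
# Assumes the "0/N nodes are available: reason, reason" layout.
split_reasons() {
  sed 's/^[^:]*: *//' | tr ',' '\n' | sed 's/^ *//'
}
```

For example, `failed_scheduling_messages <pod-name> | split_reasons` prints each blocking reason on its own line, which makes it easier to see which constraint to fix first.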

---

### Step 3: Check Node Capacity

```bash
kubectl describe node <node-name>
```

Verify:

- Allocatable CPU & Memory
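- Allocated resources (the sum of pod requests; the scheduler compares requests, not live usage, against allocatable)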
- Conditions (DiskPressure, MemoryPressure)
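
To focus on the numbers the scheduler uses, the sketch below lists allocatable capacity for every node and extracts the "Allocated resources" block for one node. The helper names are hypothetical, and the `awk` filter assumes the usual `kubectl describe node` layout where that block is followed by `Events:`.

```bash
# Show allocatable CPU and memory for every node.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Print only the "Allocated resources" block from `kubectl describe node`.
# Usage: node_allocated <node-name>
node_allocated() {
  kubectl describe node "$1" \
    | awk '/^Allocated resources:/ { show = 1 } /^Events:/ { show = 0 } show'
}
```

A pod fits on a node only if its requests are no larger than allocatable minus the requests already allocated there.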

---

### Step 4: Check Node Taints

```bash
# Print every taint; grepping `describe` output shows only the first taint line
kubectl get node <node-name> -o jsonpath='{.spec.taints}'
```

Common blocking taints:

- `node.kubernetes.io/disk-pressure`
- `node.kubernetes.io/memory-pressure`
- `node-role.kubernetes.io/control-plane`
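
To see taints on all nodes at once, a sketch along the following lines may help. The helper names `list_node_taints` and `tainted_nodes` are assumptions; `kubectl` prints `<none>` in a custom column when a node has no taints.

```bash
# List every node with its taint keys; untainted nodes show <none>.
list_node_taints() {
  kubectl get nodes --no-headers \
    -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Keep only the nodes that carry at least one taint.
tainted_nodes() {
  list_node_taints | awk '$2 != "<none>"'
}
```

If `tainted_nodes` lists every worker node, a pod without matching tolerations has nowhere to go.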

---

### Step 5: Review Pod Constraints

Inspect the pod/deployment YAML:

- `resources.requests`
- `nodeSelector`
- `affinity`
- `tolerations`
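
The four fields above can be printed for a running pod object in one call. This is a sketch with a hypothetical helper name, `pod_constraints`; for a Deployment, the same fields live under `.spec.template.spec` instead.

```bash
# Print the scheduling-related fields of one pod.
# Usage: pod_constraints <pod-name> [namespace]
pod_constraints() {
  local pod="$1" ns="${2:-default}"
  local template='requests: {.spec.containers[*].resources.requests}{"\n"}'
  template="${template}"'nodeSelector: {.spec.nodeSelector}{"\n"}'
  template="${template}"'affinity: {.spec.affinity}{"\n"}'
  template="${template}"'tolerations: {.spec.tolerations}{"\n"}'
  kubectl get pod "$pod" -n "$ns" -o jsonpath="$template"
}
```

Empty values after a field name mean the pod does not set that constraint, which narrows down which of the causes in section 4 can apply.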

---

## 6. Resolution Actions (Fix the Constraint)

| Issue | Resolution |
|---|---|
| Insufficient CPU/Memory | Reduce requests or scale nodes |
| DiskPressure taint | Free disk space on node |
| Label mismatch | Fix node labels or pod selector |
| Missing toleration | Add toleration |
| Pod limit reached | Add nodes or increase limits |
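
Two of these resolutions map directly to `kubectl` commands. The wrappers below are a sketch with hypothetical names (`lower_requests`, `label_node`); the request values passed in are for you to choose based on the workload's real needs.

```bash
# Lower the CPU and memory requests on every container of a Deployment.
# Usage: lower_requests <deployment> <namespace> <cpu> <memory>
lower_requests() {
  kubectl set resources deployment "$1" -n "$2" --requests="cpu=$3,memory=$4"
}

# Add the label that a nodeSelector expects to a node.
# Usage: label_node <node-name> <key=value>
label_node() {
  kubectl label node "$1" "$2"
}
```

For example, `lower_requests app default 500m 512Mi` changes the pod template and therefore triggers a new rollout of the Deployment.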

---

## 7. Fast Triage Checklist

- Pod is `Pending`
- Describe pod events
- Check node resources
- Check taints & tolerations
- Check affinity rules
- Check max pods per node

---

## 8. Best Practices

- Avoid over-requesting resources
- Monitor node disk usage proactively
- Alert on `DiskPressure` and `MemoryPressure`
- Validate affinity rules before production rollout
- Test HPA behavior under load

---

## 9. Final Note

`PodUnschedulable` is **not a Kubernetes failure**. It is Kubernetes **protecting cluster stability**.

If a pod cannot be placed safely, Kubernetes will not place it at all.


---