# CrashLoopBackOff

## 1. Introduction

If you run applications on Kubernetes in production, you will eventually see this status:

**CrashLoopBackOff**

Many engineers assume this is a Kubernetes failure. It is not.

`CrashLoopBackOff` means Kubernetes is behaving correctly, and your container cannot stay alive.

This document explains:

- What CrashLoopBackOff really means
- Why it happens in production
- How to **identify the root cause**
- How to **fix it correctly**

---

## 2. What CrashLoopBackOff Actually Means

CrashLoopBackOff is **not a crash**. It is a **restart pattern**.

### What Happens Internally

1. The container starts
2. The container exits, either on its own or because it was killed
3. The kubelet restarts the container
4. The delay before each restart doubles (10s, 20s, 40s, ...) up to a cap of 5 minutes
5. The growing delay protects the node and the cluster from a tight restart loop
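
The growing delay in step 4 can be made concrete with a little arithmetic. The sketch below assumes the kubelet defaults described in the Kubernetes documentation: a 10-second initial delay that doubles after each crash and is capped at 300 seconds. `backoff_delay` is an illustrative helper, not part of any Kubernetes tool, and a cluster that tunes the cap will behave differently.

```bash
# Sketch only: models the default restart delay, it does not query a cluster.
# Assumed defaults: 10s initial delay, doubling per crash, 300s cap.
backoff_delay() {
  local restarts=$1 delay=10 cap=300 i
  for ((i = 1; i < restarts; i++)); do
    delay=$((delay * 2))
    if ((delay > cap)); then
      delay=$cap
    fi
  done
  echo "$delay"
}

# Print the delay before each of the first seven restarts.
for n in 1 2 3 4 5 6 7; do
  echo "restart $n: wait $(backoff_delay "$n")s"
done
```

Under these assumptions the delays are 10, 20, 40, 80, 160, 300 and 300 seconds, which is why a pod that has been crashing for a while appears to sit idle for minutes between attempts.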

> Kubernetes is healthy. Your workload is broken.

---

## 3. Start Simple (Never Guess)

```bash
kubectl get pods
```

Look for:

- **STATUS: CrashLoopBackOff**
- Rapidly increasing **RESTARTS**

For all namespaces:

```bash
kubectl get pods -A | grep CrashLoopBackOff
```

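A restart count that climbs quickly means the container fails soon after it starts. Do not delete or restart the pod blindly: the replacement runs the same spec and fails the same way.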
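
A plain `grep` also matches pods whose name happens to contain the search string. As a minimal sketch, the filter below matches on the STATUS column instead and prints the restart count next to each pod. `crashlooping_pods` is a hypothetical helper name, and the column positions assume the default `kubectl get pods -A` layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Sketch: list pods whose STATUS column contains CrashLoopBackOff.
# Assumes the default column order of `kubectl get pods -A`.
crashlooping_pods() {
  kubectl get pods -A --no-headers |
    awk '$4 ~ /CrashLoopBackOff/ { print $1 "/" $2 " restarts=" $5 }'
}

# Usage:
#   crashlooping_pods
```

The regular-expression match also catches `Init:CrashLoopBackOff`, which is what the STATUS column shows when an init container is the one that keeps failing.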

---

## 4. Identify the Broken Pod (Most Important Step)

```bash
kubectl describe pod <pod-name> -n <namespace>
```

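Focus on the **Events** section and on each container's **Last State** (`Reason` and `Exit Code`).

### Common Clues

| Clue in `describe` output | Meaning |
|---|---|
| `OOMKilled` (Last State reason) | The container exceeded its memory limit |
| `Liveness probe failed` | The kubelet restarted the container after repeated probe failures |
| `Back-off restarting failed container` | The container keeps exiting and restarts are being delayed |
| `Permission denied` | File permission or user (UID) issue |
| `secret "<name>" not found` | A referenced Secret is missing |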
If you don't read events, you are debugging blind.
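
The `Exit Code` shown under `Last State` in the `describe` output is often the fastest clue. The helper below is a rough sketch that maps common codes to a likely cause, based on the usual shell convention that a code above 128 means the process was killed by signal `code - 128`. `explain_exit_code` is a made-up name, and the hints are starting points rather than a diagnosis.

```bash
# Sketch: translate a container exit code into a first hypothesis.
explain_exit_code() {
  local code=$1
  case "$code" in
    0)   echo "exited cleanly; a long-running service should not exit at all" ;;
    1)   echo "application error; read the previous logs" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check the image and the command field" ;;
    *)
      if ((code > 128)); then
        # 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM)
        echo "killed by signal $((code - 128))"
      else
        echo "application-specific exit code $code"
      fi
      ;;
  esac
}

# Usage:
#   explain_exit_code 137
```

For example, `explain_exit_code 137` reports signal 9 (SIGKILL), which is what an OOM kill looks like from the container's side.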

---

## 5. Logs: Current & Previous (Mandatory)

Many engineers miss the most important command:

```bash
kubectl logs <pod-name> --previous
```

Use both:

```bash
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
```

For multi-container pods:

```bash
kubectl logs <pod-name> -c <container-name> -n <namespace>
```

Previous logs often contain the **real failure**, because the current container may have just started and logged nothing yet.
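
For pods with several containers, the commands above can be combined into one loop. This is a minimal sketch: `dump_pod_logs` is a hypothetical helper, the 50-line tail is an arbitrary choice, and init containers are not covered because only `.spec.containers` is read.

```bash
# Sketch: print current and previous logs for every container in a pod.
dump_pod_logs() {
  local pod=$1 namespace=$2 container
  for container in $(kubectl get pod "$pod" -n "$namespace" \
      -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $container (current) ==="
    kubectl logs "$pod" -n "$namespace" -c "$container" --tail=50 || true
    echo "=== $container (previous) ==="
    # --previous fails if the container has not restarted yet
    kubectl logs "$pod" -n "$namespace" -c "$container" --previous --tail=50 \
      || echo "(no previous container instance)"
  done
}

# Usage:
#   dump_pod_logs <pod-name> <namespace>
```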

---

## Hands-On Examples

The following examples are **for demonstration and learning purposes only**. They are intentionally designed to show common reasons why a pod enters CrashLoopBackOff, such as:

- Application startup failures
- Incorrect health probes
- Insufficient resource limits
- Missing configuration or secrets

These examples deliberately introduce failures to help you understand how Kubernetes behaves when an application cannot start or stay healthy.

---

### Example 1: Container That Always Exits

**Broken YAML (CrashLoopBackOff)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crashloop
  template:
    metadata:
      labels:
        app: crashloop
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "echo App started; exit 1"]
```

**Apply** (assuming the manifest above is saved as `crashloop-exit.yaml`):

```bash
kubectl apply -f crashloop-exit.yaml
kubectl get pods
```

**Result:** STATUS alternates between `Error` and `CrashLoopBackOff`, and RESTARTS keeps climbing.

---

### Example 2: Liveness Probe Killing a Healthy App

**Broken YAML (Probe Misconfiguration)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop-liveness-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: liveness-demo
  template:
    metadata:
      labels:
        app: liveness-demo
    spec:
      containers:
      - name: app
        image: nginx
        livenessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 1
          periodSeconds: 5
```

#### Why This Causes CrashLoopBackOff

**1. Invalid Health Check Path**

NGINX does not serve a `/health` endpoint by default, so it answers the probe with `404 Not Found`. As a result:

- The liveness probe fails (an HTTP probe succeeds only on a status code from 200 to 399)
- Kubernetes treats the container as unhealthy

**2. Probe Starts Too Early**

`initialDelaySeconds: 1` means Kubernetes starts health checks 1 second after the container starts. NGINX is usually listening by then, but a slower application may not be ready yet, so even a healthy container can fail its first probes.

**3. Kubernetes Forcefully Restarts the Container**

When the liveness probe fails:

- Kubernetes kills the container
- The container is restarted
- The same probe fails again
- This loop repeats

Each restart requires `failureThreshold` consecutive probe failures (3 by default). After a few such restarts, Kubernetes applies a restart backoff and the pod enters `CrashLoopBackOff`.
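
How quickly the first restart arrives follows from the probe settings. The sketch below is a back-of-the-envelope estimate only: it assumes the default `failureThreshold` of 3 when none is set, and it ignores probe timeouts, kubelet scheduling jitter, and the termination grace period. `first_restart_after` is an illustrative name.

```bash
# Sketch: approximate seconds from container start until the kubelet
# decides to restart it, if every liveness probe fails.
first_restart_after() {
  local initial_delay=$1 period=$2 failure_threshold=${3:-3}
  echo $((initial_delay + (failure_threshold - 1) * period))
}

# Values from the manifest above: initialDelaySeconds=1, periodSeconds=5
first_restart_after 1 5
```

With the values from this example the estimate is 11 seconds, so the container is killed long before anyone would notice it was ever up.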

---

### Example 3: Out-of-Memory (OOMKilled)

**Broken YAML**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop-oom-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oom-demo
  template:
    metadata:
      labels:
        app: oom-demo
    spec:
      containers:
      - name: app
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "200M", "--vm-hang", "1"]
        resources:
          limits:
            memory: "64Mi"
```

**Result:**

- Container exceeds memory
- Kernel kills it
- Pod enters CrashLoopBackOff

#### Why CrashLoopBackOff Happens in This Case

With this manifest, the container:

- Is limited to **64Mi** of memory
- Tries to allocate about **200 MiB** (`--vm-bytes 200M`)
- Is terminated by the kernel's OOM killer as soon as it reaches the limit
- Is restarted by Kubernetes, and then fails the same way

After several restarts: `STATUS: CrashLoopBackOff`

> Kubernetes is working correctly. Memory limits are enforced. This is expected behavior.
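
To confirm an OOM kill without scrolling through the full `describe` output, the last termination state can be read directly from the pod status. This is a minimal sketch: `last_termination` is a hypothetical helper name, while the fields it reads (`lastState.terminated.reason` and `exitCode`) are part of the standard pod status.

```bash
# Sketch: print the reason and exit code of each container's last termination.
last_termination() {
  local pod=$1 namespace=$2
  kubectl get pod "$pod" -n "$namespace" -o \
    jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{" (exit "}{.lastState.terminated.exitCode}{")\n"}{end}'
}

# Usage:
#   last_termination <pod-name> <namespace>
```

For the demo above, the reason field should read `OOMKilled`, typically together with exit code 137.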

---

### Example 4: CrashLoopBackOff Due to Missing Secret or ConfigMap

This section demonstrates how a pod can enter CrashLoopBackOff when required configuration objects (Secret or ConfigMap) are missing.

#### Case 1: Missing Secret

The application expects a Secret at startup. Since the Secret does not exist, the container fails immediately, and Kubernetes repeatedly restarts it.

**Deployment YAML (Missing Secret)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-crashloop-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql-demo
  template:
    metadata:
      labels:
        app: mysql-demo
    spec:
      containers:
      - name: app
        image: busybox
        command:
          - sh
          - -c
          - |
            echo "Starting app"
            if [ -z "$MYSQL_PASSWORD" ]; then
              echo "MySQL password missing"
              exit 1
            fi
            echo "Connected to MySQL"
        env:
        - name: MYSQL_PASSWORD
          valueFrom:
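            secretKeyRef:
              name: mysql-secret
              key: MYSQL_PASSWORD
              # optional lets the container start with the variable unset
              optional: true
```

> **Note:** The Secret `mysql-secret` does not exist. Because the reference is marked `optional: true`, the container still starts with `MYSQL_PASSWORD` unset and the script exits with code 1. Without `optional: true`, the kubelet does not start the container at all and the pod reports `CreateContainerConfigError` instead of `CrashLoopBackOff`.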

**What happens:**

1. Pod starts
2. Script runs
3. Password is empty
4. App exits
5. Kubernetes restarts
6. Status becomes `CrashLoopBackOff`

#### Case 2: Missing ConfigMap

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-crashloop-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo-demo
  template:
    metadata:
      labels:
        app: mongo-demo
    spec:
      containers:
      - name: app
        image: busybox
        command:
          - sh
          - -c
          - |
            echo "Starting app"
            if [ -z "$MONGO_URL" ]; then
              echo "MongoDB URL missing"
              exit 1
            fi
            echo "Connected to MongoDB"
        env:
        - name: MONGO_URL
          valueFrom:
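            configMapKeyRef:
              name: mongo-config
              key: MONGO_URL
              # optional lets the container start with the variable unset
              optional: true
```

> **Note:** The ConfigMap `mongo-config` does not exist. Without `optional: true`, the kubelet does not start the container at all and the pod reports `CreateContainerConfigError` instead of `CrashLoopBackOff`.

**What Happens:**

1. Pod is scheduled
2. Container starts with `MONGO_URL` unset, because the optional ConfigMap reference cannot be resolved
3. The script detects the empty variable and exits with code 1
4. Kubernetes restarts the container
5. Pod enters `CrashLoopBackOff`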

> CrashLoopBackOff can occur when an application depends on a Secret or ConfigMap that does not exist. Kubernetes retries starting the container, but since the required configuration is missing, the container fails repeatedly.
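
Before digging into application logs, it is worth checking that the referenced objects exist in the pod's namespace. The helper below is a minimal sketch; `require_object` is a hypothetical name, and the object names in the usage line are the ones from the two manifests above.

```bash
# Sketch: fail loudly if a Secret or ConfigMap is missing from a namespace.
require_object() {
  local kind=$1 name=$2 namespace=$3
  if kubectl get "$kind" "$name" -n "$namespace" >/dev/null 2>&1; then
    echo "$kind/$name exists"
  else
    echo "$kind/$name is missing in namespace $namespace" >&2
    return 1
  fi
}

# Usage:
#   require_object secret mysql-secret <namespace>
#   require_object configmap mongo-config <namespace>
```

A check like this only proves that the object exists. It does not verify that the expected key is present inside it.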

---

## Freeze Pod for Live Debugging

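When the pod crashes too fast to inspect, temporarily override the container's command so it stays alive. This requires an image that ships a shell, and any liveness probe should be removed while the override is in place, because the real application is not running to answer it.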

**Debug Override**

```yaml
command: ["/bin/sh"]
args: ["-c", "while true; do sleep 3600; done"]
```

**Exec Into Pod**

```bash
kubectl exec -it <pod-name> -n <namespace> -- sh
```

You can now:

- Inspect environment variables
- Test DB connectivity
- Run startup commands manually
- Check file permissions

---

## 6. Fix → Verify → Rollback

**Verify**

```bash
kubectl rollout status deployment/<deployment-name>
```

**Rollback**

```bash
kubectl rollout undo deployment/<deployment-name>
```
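
The two commands can be chained so that a rollout that does not become healthy in time is undone automatically. This is a sketch under assumptions: `verify_or_rollback` is a hypothetical helper, the 120-second default is arbitrary, and an automatic undo only helps if the previous revision was healthy.

```bash
# Sketch: wait for a rollout and undo it if it does not complete in time.
verify_or_rollback() {
  local deployment=$1 namespace=$2 timeout=${3:-120s}
  if kubectl rollout status "deployment/$deployment" -n "$namespace" \
      --timeout="$timeout"; then
    echo "rollout of $deployment succeeded"
  else
    echo "rollout of $deployment failed; rolling back" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}

# Usage:
#   verify_or_rollback <deployment-name> <namespace> 180s
```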

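Stability first. Debug second. If the fix does not hold in production, roll back to the last working revision and continue debugging afterwards.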

---

## 7. Final Truth

CrashLoopBackOff is not a Kubernetes problem. It is almost always caused by:

- Bad configuration
- Bad probes
- Bad resource limits
- Bad assumptions

Kubernetes does not break applications. It **exposes mistakes early and aggressively**.
