# CrashLoopBackOff

## 1. Introduction

If you run applications on Kubernetes in production, you will eventually see this status:

**CrashLoopBackOff**

Many engineers assume this is a Kubernetes failure. It is not. `CrashLoopBackOff` means Kubernetes is behaving correctly, and your container cannot stay alive.

This document explains:

- What CrashLoopBackOff really means
- Why it happens in production
- How to **identify the root cause**
- How to **fix it correctly**

---

## 2. What CrashLoopBackOff Actually Means

CrashLoopBackOff is **not a crash**. It is a **restart pattern**.

### What Happens Internally

1. Container starts
2. Container exits unexpectedly
3. Kubernetes restarts the container
4. Restart delay doubles after each failure (10s, 20s, 40s, and so on, capped at 5 minutes)
5. Kubernetes protects the cluster from endless restarts

> Kubernetes is healthy. Your workload is broken.

---

## 3. Start Simple (Never Guess)

```bash
kubectl get pods
```

Look for:

- **STATUS: CrashLoopBackOff**
- Rapidly increasing **RESTARTS**

For all namespaces:

```bash
kubectl get pods -A | grep CrashLoopBackOff
```

A restart count that climbs quickly means the container fails soon after it starts. Do not restart blindly.

---

## 4. Identify the Broken Pod (Most Important Step)

```bash
kubectl describe pod <pod-name> -n <namespace>
```

Focus on **Events**, and check each container's **Last State** for the exit reason (this is where `OOMKilled` is reported).

### Common Event Clues

| Event Message | Meaning |
|---|---|
| OOMKilled | Memory limit exceeded |
| Liveness probe failed | Kubelet killed and restarted the container |
| Back-off restarting failed container | Repeated crash |
| Permission denied | File / user issue |
| Secret not found | Missing configuration |

If you don't read events, you are debugging blind.

---

## 5. Logs: Current & Previous (Mandatory)

Many engineers miss the most important command:

```bash
kubectl logs <pod-name> --previous
```

Use both:

```bash
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
```

For multi-container pods:

```bash
kubectl logs <pod-name> -c <container-name> -n <namespace>
```

Previous logs often contain the **real failure**, because they come from the container instance that just crashed.

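The commands from sections 4 and 5 can be chained into one helper so that events and both sets of logs are captured in a single pass. This is a minimal sketch, not a standard tool: the function name `triage_pod` and the `--tail=50` limit are assumptions chosen for illustration. Call it as `triage_pod <namespace> <pod-name>`.

```bash
#!/usr/bin/env bash
# triage_pod NAMESPACE POD
# Hypothetical helper (not part of kubectl): prints the pod description,
# then the current logs, then the logs of the previous container instance.
triage_pod() {
  local ns="$1" pod="$2"
  if [ -z "$ns" ] || [ -z "$pod" ]; then
    echo "usage: triage_pod NAMESPACE POD" >&2
    return 2
  fi

  echo "=== describe ==="
  kubectl describe pod "$pod" -n "$ns"

  echo "=== current logs ==="
  kubectl logs "$pod" -n "$ns" --tail=50

  echo "=== previous logs ==="
  # --previous fails when no earlier container instance exists; keep going.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 \
    || echo "(no previous logs available)"
}
```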

---

## Hands-On Examples

The following examples are **for demonstration and learning purposes only**. They are intentionally designed to show common reasons why a pod enters CrashLoopBackOff, such as:

- Application startup failures
- Incorrect health probes
- Insufficient resource limits
- Missing configuration or secrets

These examples deliberately introduce failures to help you understand how Kubernetes behaves when an application cannot start or stay healthy.

---

### Example 1: Container That Always Exits

**Broken YAML (CrashLoopBackOff)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crashloop
  template:
    metadata:
      labels:
        app: crashloop
    spec:
      containers:
        - name: app
          image: busybox
          # Prints one line, then exits non-zero so the container is restarted
          command: ["sh", "-c", "echo App started; exit 1"]
```

**Apply:**

```bash
kubectl apply -f crashloop-exit.yaml
kubectl get pods
```

**Result:** STATUS shows `CrashLoopBackOff` with rapidly increasing RESTARTS.

---

### Example 2: Liveness Probe Killing a Healthy App

**Broken YAML (Probe Misconfiguration)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop-liveness-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: liveness-demo
  template:
    metadata:
      labels:
        app: liveness-demo
    spec:
      containers:
        - name: app
          image: nginx
          livenessProbe:
            httpGet:
              path: /health   # stock NGINX does not serve this path
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 5
```

#### Why This Causes CrashLoopBackOff

**1. Invalid Health Check Path**

NGINX does not expose a `/health` endpoint by default. As a result:

- The liveness probe receives a `404`, which counts as a failure (only status codes from 200 to 399 pass)
- Kubernetes assumes the container is unhealthy

**2. Probe Starts Too Early**

`initialDelaySeconds: 1` means Kubernetes starts health checks 1 second after container startup. The application may not be fully ready yet, so even a healthy container can fail at this stage.

**3.
Kubernetes Forcefully Restarts the Container**

When the liveness probe fails `failureThreshold` times in a row (3 by default):

- Kubernetes kills the container
- The container is restarted
- The same probe fails again
- This loop repeats

After several failures, Kubernetes applies a restart backoff and the pod enters `CrashLoopBackOff`.

---

### Example 3: Out-of-Memory (OOMKilled)

**Broken YAML**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashloop-oom-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oom-demo
  template:
    metadata:
      labels:
        app: oom-demo
    spec:
      containers:
        - name: app
          image: polinux/stress
          command: ["stress"]
          # One worker that allocates 200M, far above the 64Mi limit below
          args: ["--vm", "1", "--vm-bytes", "200M", "--vm-hang", "1"]
          resources:
            limits:
              memory: "64Mi"
```

**Result:**

- Container exceeds its memory limit
- Kernel kills it
- Pod enters CrashLoopBackOff

#### Why CrashLoopBackOff Happens in This Case

What happens, step by step:

- The container is limited to **64Mi** of memory
- It tries to allocate **200MB**
- The Linux kernel kills it to protect the node
- Kubernetes restarts it
- The same thing happens again

After several restarts: `STATUS: CrashLoopBackOff`

> Kubernetes is working correctly. Memory limits are enforced. This is expected behavior.

---

### Example 4: CrashLoopBackOff Due to Missing Secret or ConfigMap

This section demonstrates how a pod can enter CrashLoopBackOff when required configuration objects (Secret or ConfigMap) are missing.

#### Case 1: Missing Secret

The application expects a value from a Secret at startup. Since the Secret does not exist, the application exits immediately, and Kubernetes repeatedly restarts it.


**Deployment YAML (Missing Secret)**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-crashloop-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql-demo
  template:
    metadata:
      labels:
        app: mysql-demo
    spec:
      containers:
        - name: app
          image: busybox
          command:
            - sh
            - -c
            - |
              echo "Starting app"
              if [ -z "$MYSQL_PASSWORD" ]; then
                echo "MySQL password missing"
                exit 1
              fi
              echo "Connected to MySQL"
          env:
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: MYSQL_PASSWORD
                  # Lets the container start with the variable unset
                  optional: true
```

> **Note:** The secret `mysql-secret` does not exist. The reference is marked `optional: true` so the container still starts with `MYSQL_PASSWORD` unset. Without `optional: true`, the kubelet refuses to create the container and the pod shows `CreateContainerConfigError` instead of `CrashLoopBackOff`.

**What happens:**

1. Pod starts
2. Script runs
3. Password is empty
4. App exits with code 1
5. Kubernetes restarts the container
6. Status becomes `CrashLoopBackOff`

#### Case 2: Missing ConfigMap

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo-crashloop-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo-demo
  template:
    metadata:
      labels:
        app: mongo-demo
    spec:
      containers:
        - name: app
          image: busybox
          command:
            - sh
            - -c
            - |
              echo "Starting app"
              if [ -z "$MONGO_URL" ]; then
                echo "MongoDB URL missing"
                exit 1
              fi
              echo "Connected to MongoDB"
          env:
            - name: MONGO_URL
              valueFrom:
                configMapKeyRef:
                  name: mongo-config
                  key: MONGO_URL
                  # Lets the container start with the variable unset
                  optional: true
```

**What Happens:**

1. Pod is scheduled
2. Kubernetes looks up `MONGO_URL` in the ConfigMap `mongo-config`
3. The ConfigMap is missing, so the variable is left unset
4. Container starts and the script runs
5. Script finds `MONGO_URL` empty and exits with code 1
6. Kubernetes restarts the container
7. Pod enters `CrashLoopBackOff`

> CrashLoopBackOff can occur when an application depends on a value from a Secret or ConfigMap that does not exist. Kubernetes keeps restarting the container, but since the required configuration is still missing, the application exits every time.
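Both demos above recover once the referenced objects exist. The sketch below creates a Secret or ConfigMap only when it is missing. The helper names `ensure_secret` and `ensure_configmap`, the `default` namespace fallback, and the literal values in the usage comments are placeholders invented for this example, so substitute your own.

```bash
#!/usr/bin/env bash
# ensure_secret NAME KEY VALUE [NAMESPACE]
# Hypothetical helper: creates a generic Secret only if it does not exist yet.
ensure_secret() {
  local name="$1" key="$2" value="$3" ns="${4:-default}"
  if kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
    echo "secret/$name already exists"
  else
    kubectl create secret generic "$name" -n "$ns" --from-literal="$key=$value"
  fi
}

# ensure_configmap NAME KEY VALUE [NAMESPACE]
# Hypothetical helper: creates a ConfigMap only if it does not exist yet.
ensure_configmap() {
  local name="$1" key="$2" value="$3" ns="${4:-default}"
  if kubectl get configmap "$name" -n "$ns" >/dev/null 2>&1; then
    echo "configmap/$name already exists"
  else
    kubectl create configmap "$name" -n "$ns" --from-literal="$key=$value"
  fi
}

# Usage with placeholder values (assumptions, not real credentials):
#   ensure_secret mysql-secret MYSQL_PASSWORD 'change-me'
#   ensure_configmap mongo-config MONGO_URL 'mongodb://mongo:27017/demo'
```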

---

## Freeze Pod for Live Debugging

When the pod crashes too fast to inspect, override the command to keep the container alive:

**Debug Override**

```yaml
# Replaces the image's normal entrypoint with an idle loop
command: ["/bin/sh"]
args: ["-c", "while true; do sleep 3600; done"]
```

**Exec Into Pod**

```bash
kubectl exec -it <pod-name> -n <namespace> -- sh
```

You can now:

- Inspect environment variables
- Test DB connectivity
- Run startup commands manually
- Check file permissions

---

## 6. Fix → Verify → Rollback

After applying a fix, confirm that the rollout completes. If it does not, roll back to the last working revision and continue debugging from there.

**Verify**

```bash
kubectl rollout status deployment/<deployment-name>
```

**Rollback**

```bash
kubectl rollout undo deployment/<deployment-name>
```

Stability first. Debug second.

---

## 7. Final Truth

CrashLoopBackOff is not a Kubernetes problem. It is almost always caused by:

- Bad configuration
- Bad probes
- Bad resource limits
- Bad assumptions

Kubernetes does not break applications. It **exposes mistakes early and aggressively**.

---