OOMKilled
This document explains the OOMKilled error in Kubernetes: how to identify, confirm, troubleshoot, and prevent it in production environments.
1. What Is OOMKilled?
OOMKilled occurs when a container uses more memory than its configured limit and is forcibly terminated by the Linux kernel.
- Kubernetes does not kill the container
- The kernel kills it to protect the node
- Kubernetes only reports what happened
2. What OOMKilled Does NOT Mean
OOMKilled does not mean:
- Kubernetes crashed
- The application exited normally
- There was a code exception
It strictly means one thing: the container's memory limit was exceeded.
3. Where OOMKilled Happens (Flow)
- Pod starts normally
- Memory usage increases
- Memory limit is crossed
- Linux kernel kills the process
- Container exits with code 137
- Pod shows OOMKilled
4. Common Production Causes
| Cause | Description |
|---|---|
| Memory leak | Application never releases memory |
| Low memory limits | Limits set lower than real usage |
| Incorrect requests | Requests much lower than actual usage |
| JVM / Node / Python | No heap or memory tuning |
| Traffic spikes | Sudden increase in load |
5. How to Confirm OOMKilled (Always Verify)
Step 1: Describe the Pod
kubectl describe pod <pod-name>
Look for:
Reason: OOMKilled
Exit Code: 137
This confirms OOMKilled.
Step 2: Check Pod Status
kubectl get pod <pod-name>
Status may show: CrashLoopBackOff
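The same check can be scripted against the pod object, for example the output of `kubectl get pod <pod-name> -o json`. The sketch below is a minimal illustration, not a definitive tool: the helper name `oom_killed_containers` and the sample pod are hypothetical, and only the `status.containerStatuses` fields it reads are taken from the Pod API.

```python
import json

def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last termination reason is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A restarted container reports the kill under lastState; a container
        # that has not restarted yet reports it under state.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names

# Hypothetical, trimmed-down pod object for illustration only.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "app", "restartCount": 3,
       "state": {"waiting": {"reason": "CrashLoopBackOff"}},
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0,
       "state": {"running": {}},
       "lastState": {}}
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))  # ['app']
```

Checking both `state` and `lastState` matters because a pod in CrashLoopBackOff shows the kill only in its last state.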
6. Understand Requests vs Limits (Critical)
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
- Requests → used for scheduling
- Limits → enforced by the kernel
Exceed the limit → container is killed
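To make the two numbers concrete, here is a minimal sketch that converts memory quantities to bytes. The helper name `memory_to_bytes` is hypothetical, and it handles only a subset of the suffixes Kubernetes accepts.

```python
# Binary (power-of-two) and decimal suffixes used in Kubernetes memory quantities.
# Assumption: this subset is enough for illustration; the full syntax has more forms.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "512Mi" or "1Gi" to bytes."""
    # Check two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # a plain number is already in bytes

request = memory_to_bytes("512Mi")
limit = memory_to_bytes("1Gi")
print(request)          # 536870912
print(limit)            # 1073741824
print(limit - request)  # 536870912
```

With the values above, the container can grow by another 512Mi beyond what the scheduler reserved for it before the limit is hit.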
7. Reproduce OOMKilled
Step 1: Create a Deployment with a Low Memory Limit
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oom-example
  labels:
    app: oom
spec:
  replicas: 1
  selector:
    matchLabels:
      app: outofmemory
  template:
    metadata:
      labels:
        app: outofmemory
    spec:
      containers:
        - name: outofmemory
          image: polinux/stress
          command: ["stress"]
          args: ["--vm", "1", "--vm-bytes", "200M", "--vm-hang", "1"]
          resources:
            limits:
              memory: "64Mi"
The container tries to allocate 200 MB but is capped by a 64Mi limit, so the kernel terminates it.
Save the manifest as oom-test.yaml, then apply it:
kubectl apply -f oom-test.yaml
Step 2: Observe Pod Behavior
kubectl get pods
The pod restarts repeatedly and settles into CrashLoopBackOff.
Step 3: Confirm OOMKilled
kubectl describe pod <pod-name>
Look at the Last State block:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Exit Code 137 is the key signal here: it is 128 + 9, meaning the process was terminated with SIGKILL, which is the signal the kernel's OOM killer sends. The Reason field shows OOMKilled when the container's main process is the one the kernel kills. If the kernel kills a child process instead (as some workloads, including this stress example, can trigger), the Reason may instead read Error with the same exit code 137. In both cases the container was terminated for exceeding its memory limit.
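The 128 + signal arithmetic can be sketched in a few lines. The function name `describe_exit_code` and the small signal table are hypothetical helpers for illustration; only the 128 + signal-number convention comes from the text above.

```python
# Small lookup of common signal numbers on Linux; extend as needed.
SIGNAL_NAMES = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}

def describe_exit_code(exit_code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if exit_code > 128:
        signum = exit_code - 128
        name = SIGNAL_NAMES.get(signum, "unknown signal")
        return f"killed by signal {signum} ({name})"
    return f"exited on its own with status {exit_code}"

print(describe_exit_code(137))  # killed by signal 9 (SIGKILL)
print(describe_exit_code(1))    # exited on its own with status 1
```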
8. How to Troubleshoot
Step 1: Check Real Memory Usage
kubectl top pod <pod-name>
Compare:
- Actual usage
- Memory request
- Memory limit
Step 2: Identify the Offending Pod
- Look for pods near memory limit
- Check restart count
- Review application logs
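The checks in Steps 1 and 2 can be combined into a small triage helper. This is a sketch under stated assumptions: the `PodMemory` record, the `near_limit` function, the 80% threshold, and the sample readings are all hypothetical. In practice the numbers would come from `kubectl top pod`, the pod spec, and the RESTARTS column of `kubectl get pods`.

```python
from dataclasses import dataclass

@dataclass
class PodMemory:
    name: str
    usage_mib: float  # current usage, e.g. from `kubectl top pod`
    limit_mib: float  # the container's resources.limits.memory
    restarts: int     # restart count reported for the pod

def near_limit(pods, threshold=0.8):
    """Return (name, usage/limit ratio, restarts) for pods at or above the threshold, highest ratio first."""
    flagged = [
        (p.name, round(p.usage_mib / p.limit_mib, 2), p.restarts)
        for p in pods
        if p.usage_mib / p.limit_mib >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical readings, for illustration only.
pods = [
    PodMemory("api", usage_mib=950, limit_mib=1024, restarts=4),
    PodMemory("worker", usage_mib=300, limit_mib=1024, restarts=0),
    PodMemory("cache", usage_mib=210, limit_mib=256, restarts=1),
]
print(near_limit(pods))  # [('api', 0.93, 4), ('cache', 0.82, 1)]
```

A pod that is both close to its limit and restarting often is the first candidate to inspect.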
Step 3: Fix the Root Cause
| Scenario | Fix |
|---|---|
| Memory leak | Fix application |
| Low limit | Increase memory limit |
| Bursty usage | Increase limit + headroom |
| JVM apps | Set heap size |
| High load | Scale replicas |
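For the "increase limit + headroom" fix, one simple way to pick a number is peak observed usage plus a percentage. The sketch below assumes a 25% default headroom, which is an illustrative value rather than a recommendation from this document, and `suggested_limit_mib` is a hypothetical helper name.

```python
import math

def suggested_limit_mib(peak_usage_mib: float, headroom: float = 0.25) -> int:
    """Peak observed usage plus a fractional headroom, rounded up to a whole MiB."""
    return math.ceil(peak_usage_mib * (1 + headroom))

print(suggested_limit_mib(800))       # 1000
print(suggested_limit_mib(800, 0.5))  # 1200
```

The peak should come from measurements taken under realistic load, not from idle usage.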
9. What NOT to Do
- Disable memory limits
- Ignore OOMKilled events
- Set very low requests
- Remove node pressure protections
These practices can lead to node-level memory exhaustion and node crashes.
Golden Rules
- Memory requests must reflect real application usage, not assumptions or best-case estimates. Incorrect requests lead to poor scheduling and instability.
- Memory limits must always include headroom to handle traffic spikes, GC cycles, and temporary memory bursts. Limits set too tight will inevitably cause OOMKilled.
- Memory-intensive applications must be explicitly tuned (Java, Node.js, Python). Default runtime settings are not always container-aware and can exceed Kubernetes memory limits.
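As an illustration of runtime tuning, the heap can be sized as a fraction of the container limit so that non-heap memory still fits. The 75% fraction and the `heap_mib` helper are assumptions for this sketch; `-Xmx` (JVM) and `--max-old-space-size` (Node.js, in MiB) are the standard heap flags of those runtimes.

```python
def heap_mib(limit_mib: int, heap_fraction: float = 0.75) -> int:
    """Heap budget in MiB, leaving the rest for stacks, native buffers, and the runtime itself."""
    return int(limit_mib * heap_fraction)

limit = 1024  # MiB, matching a 1Gi container limit
print(f"-Xmx{heap_mib(limit)}m")                  # -Xmx768m
print(f"--max-old-space-size={heap_mib(limit)}")  # --max-old-space-size=768
```

The right fraction depends on how much non-heap memory the application uses, so it is worth validating under load.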
10. Prevention Best Practices
- Always set memory requests & limits
- Monitor pod memory usage
- Alert on OOMKilled events
- Test workloads under load
- Tune language runtimes
11. Final Statement
OOMKilled is not a Kubernetes bug. It is Kubernetes and Linux protecting node stability.
If memory limits are ignored, the kernel enforces them.