Troubleshooting High Processor Load on Your Node
A consistently high processor load slows the node down, makes hosted applications unresponsive, and — when severe — can crash the node entirely. Sometimes high load is good news (genuine traffic growth that warrants a bigger plan); sometimes it points to a misconfigured service or runaway process. This guide explains how to identify which case applies.
Impacts of High CPU Usage
When CPU load rises, the node and everything hosted on it slow down. When it stays high, services start to return errors and the node can crash. To get ahead of this, configure CPU Load and CPU Utilization alerts on every production node. You will receive an email when load crosses a threshold and can act before the node becomes unusable.
Causes of High CPU Utilization
Apart from genuine traffic growth, the most common causes are:
Incorrect Service-Level Configuration
Services need to be tuned for the environment they run in. A web server configured to spawn far more worker processes than the node has CPUs can saturate the CPU on its own. Other typical culprits are duplicate instances of the same service running at once, untuned thread pools, and verbose logging on hot code paths.
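As a rough first check, you can compare how many processes a service runs with the number of CPUs the node has. This sketch assumes procps pgrep and coreutils nproc are installed. check_workers is a hypothetical helper name, and its count includes any parent (master) process as well as the workers. A count above the CPU count is not a problem in itself if most workers sit idle, so treat the result as a prompt to look closer rather than a verdict.

```shell
#!/usr/bin/env bash
# Rough check: does a service run more processes than the node has CPUs?
# check_workers is a hypothetical helper; the count includes the parent process.
check_workers() {
  local name=$1 cpus workers
  cpus=$(nproc)                                  # CPUs available to this process
  # -c prints only the number of matches; -x requires an exact name match.
  # pgrep exits 1 when nothing matches, which is not an error here.
  workers=$(pgrep -c -x -- "$name" || true)
  workers=${workers:-0}
  echo "$name: $workers processes on $cpus CPUs"
  # Return non-zero when the process count exceeds the CPU count.
  [ "$workers" -le "$cpus" ]
}

# Example (nginx is only an illustration):
# check_workers nginx
```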
Inadequate Server Resources
Resource allocation should be proportional to the expected workload. A 2 vCPU node hosting a database, an application server, and a queue worker is likely to be CPU-bound even under normal load. Upgrade the plan or split the workload across multiple nodes.
Poorly Configured Applications
Backup jobs that run during peak traffic, cron jobs that fire simultaneously instead of being staggered, and long-running batch operations on a transactional database all create periodic CPU spikes. Move heavy jobs to off-peak windows and stagger cron schedules.
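In a cron job, you can apply both suggestions by wrapping the heavy command so it starts after a random delay and runs at the lowest CPU priority. This is a sketch, not a scheduler. run_staggered is a hypothetical helper, and the random delay relies on Bash's RANDOM variable. nice only changes scheduling priority: the job still uses idle CPU freely, but it yields to busier services when the CPU is contended.

```shell
#!/usr/bin/env bash
# Start a batch job after a random delay, at the lowest CPU priority.
# run_staggered is a hypothetical helper, not a system command.
run_staggered() {
  local max_delay=$1
  shift
  if [ "$max_delay" -le 0 ]; then
    echo "max_delay must be a positive number of seconds" >&2
    return 1
  fi
  # Bash's RANDOM is 0..32767, so for max_delay up to 32768 this
  # waits between 0 and max_delay-1 seconds.
  sleep $(( RANDOM % max_delay ))
  # Niceness 19 is the lowest priority: the job yields CPU to other
  # runnable processes but still uses idle CPU freely.
  nice -n 19 "$@"
}

# Example (the backup script path is hypothetical):
# run_staggered 600 /usr/local/bin/nightly-backup.sh
```

Called from a cron entry, run_staggered 600 spreads the start of jobs that share the same schedule across ten minutes instead of launching them all in the same second.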
How to Diagnose High CPU on the Node
uptime — Quick System Load Overview
uptime
uptime shows how long the system has been running, the number of logged-in users, and the load averages over the last 1, 5, and 15 minutes.
Read the three numbers left to right (1 min, 5 min, 15 min). For example, if uptime reports load averages of 0.18, 0.07, 0.02:
- the load average over the last 1 minute is 0.18
- the load average over the last 5 minutes is 0.07
- the load average over the last 15 minutes is 0.02
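If the load averages stay higher than the number of CPUs (nproc prints this count), the node is overloaded: more processes want to run than there are CPUs to run them. On Linux, the load average also counts processes blocked in uninterruptible sleep, usually waiting on disk I/O. A high load alongside modest CPU usage in top therefore points to an I/O bottleneck rather than a CPU one. Rising values (1-minute higher than 15-minute) indicate that load is increasing; falling values indicate that it is recovering.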
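The rule of thumb above can be turned into a small check. This sketch reads the same three numbers that uptime prints from /proc/loadavg and compares them with nproc. Two simplifications are assumptions, not part of any standard: the 5-minute average stands in for "consistently high", and the trend compares only the 1-minute and 15-minute values. For example, a 5-minute load of 6.0 on 4 CPUs works out to 1.5 runnable (or I/O-blocked) processes per CPU, so the node is reported as overloaded.

```shell
#!/usr/bin/env bash
# Interpret load averages against the CPU count.
# Assumptions: the 5-minute value stands in for "consistently high";
# the trend compares the 1-minute value with the 15-minute value.
load_status() {
  awk -v one="$1" -v five="$2" -v fifteen="$3" -v cpus="$4" 'BEGIN {
    state = (five + 0 > cpus + 0) ? "overloaded" : "ok"
    if (one + 0 > fifteen + 0)      trend = "rising"
    else if (one + 0 < fifteen + 0) trend = "falling"
    else                            trend = "steady"
    print state, trend
  }'
}

# Live check: /proc/loadavg starts with the same three numbers uptime prints.
read -r one five fifteen _ < /proc/loadavg
echo "load $one $five $fifteen on $(nproc) CPUs: $(load_status "$one" "$five" "$fifteen" "$(nproc)")"
```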
top — Live Process Listing
top
top shows a continuously updated list of running processes, sorted by CPU usage by default. The entries at the top of the list are the processes responsible for most of the load.
Press M to sort by memory or P to sort by CPU. Use q to exit.
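Because top is interactive, it is awkward to capture during an incident. Batch mode (-b) prints plain text instead, and -n 1 stops after a single refresh, so you can save the output and compare it with the logs later. cpu_snapshot is a hypothetical wrapper, and the /tmp path in the comment is only an example.

```shell
#!/usr/bin/env bash
# Capture one screen of top as plain text, e.g. to keep alongside logs.
# cpu_snapshot is a hypothetical wrapper around procps top.
cpu_snapshot() {
  # -b: batch mode (no interactive screen); -n 1: exit after one refresh.
  # head keeps the summary lines plus the busiest processes.
  top -b -n 1 | head -n "${1:-20}"
}

# Example: save a timestamped copy (the /tmp path is only an example).
# cpu_snapshot 30 > "/tmp/cpu-$(date +%Y%m%dT%H%M%S).txt"
```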
ps — Snapshot a Specific Process
ps reports a snapshot of currently running processes. To find a specific service, pipe through grep:
ps -ef | grep mysql
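The output lists every matching process along with its PID (second column) and its parent's PID (third column, PPID). This is useful when you need to kill a runaway worker but leave the parent service running. The grep command itself also appears in the output; ignore that line.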
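pgrep can replace the grep pipeline. It selects processes by name, and you can pass its PIDs to ps to show the columns that matter here. The sketch assumes the procps versions of pgrep and ps found on most Linux distributions; show_procs is a hypothetical helper. Note that the %CPU column from ps is the process's CPU time divided by its running time, a lifetime average. A long-running process that has only just started spinning may therefore look modest here while top shows it near the top.

```shell
#!/usr/bin/env bash
# List processes matching a name pattern with their parent PID, so
# workers can be told apart from the parent (master) process.
# show_procs is a hypothetical helper built on procps pgrep and ps.
show_procs() {
  local pids
  # -d, joins the matching PIDs with commas, the form ps -p accepts.
  pids=$(pgrep -d, -- "$1") || { echo "no process matches '$1'" >&2; return 1; }
  # %CPU here is a lifetime average (CPU time / elapsed time), not a live rate.
  ps -o pid,ppid,%cpu,etime,cmd -p "$pids"
}

# Example (mysql is only an illustration):
# show_procs mysql
```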
How to Stop a Runaway Process
Graceful Stop with pkill
pkill processname
For example:
pkill mysql
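pkill sends SIGTERM by default, giving the process a chance to clean up before it exits. The pattern is matched against any part of the process name, so pkill mysql also signals mysqld (the MySQL server) and any other process whose name contains mysql.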
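Before signalling anything, it helps to preview exactly which processes a pattern selects. pgrep accepts the same matching options as pkill, and -x restricts the match to processes whose name is exactly the given string. The sketch assumes procps pgrep and pkill. stop_by_name is a hypothetical helper, and mysqld in the usage comment is only an example name.

```shell
#!/usr/bin/env bash
# Preview, then gracefully stop, every process whose name is exactly $1.
# stop_by_name is a hypothetical helper built on procps pgrep/pkill.
stop_by_name() {
  local name=$1
  # -a prints each matching PID with its full command line;
  # -x matches the whole process name rather than any part of it.
  if ! pgrep -a -x -- "$name"; then
    echo "no process is named '$name'" >&2
    return 1
  fi
  # With no signal option, pkill sends SIGTERM.
  pkill -x -- "$name"
}

# Example (mysqld is only an illustration):
# stop_by_name mysqld
```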
Force Kill by PID
If the process does not respond to SIGTERM:
kill -9 pid
-9 sends SIGKILL, which the process cannot catch or ignore. Use this only when graceful termination has failed. A process killed this way gets no chance to clean up and may leave temporary files or stale locks behind.
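You can combine the two steps so that SIGKILL is sent only after SIGTERM has had a fair chance. This sketch signals a single PID and polls once per second up to a grace period (10 seconds by default, an arbitrary choice). It escalates only if the process is still alive at the end. terminate_pid and is_running are hypothetical helpers. A zombie (a process that has already exited but has not yet been reaped by its parent) is treated as stopped, because no signal can affect it.

```shell
#!/usr/bin/env bash
# Send SIGTERM, wait for a grace period, and only then fall back to SIGKILL.
# terminate_pid and is_running are hypothetical helpers.

# Succeeds while the PID exists and is not a zombie. A zombie has already
# exited and is only waiting for its parent to reap it; signals cannot affect it.
is_running() {
  local state
  state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  case $state in
    *Z*) return 1 ;;
    *)   return 0 ;;
  esac
}

# Usage: terminate_pid PID [GRACE_SECONDS]
# Returns 0 if the process stopped after SIGTERM (or was already gone),
# 1 if SIGTERM could not be sent, and 2 if SIGKILL was needed.
terminate_pid() {
  local pid=$1 grace=${2:-10} waited=0
  is_running "$pid" || return 0
  kill -TERM "$pid" || return 1          # e.g. permission denied
  while is_running "$pid"; do
    if [ "$waited" -ge "$grace" ]; then
      echo "PID $pid still running after ${grace}s; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      return 2
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Example (replace 12345 with the PID found via ps or pgrep):
# terminate_pid 12345 15
```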
After You Have Mitigated the Spike
Once the immediate load is under control:
- Re-check the alert configuration to be sure you will be notified next time.
- Review the application logs around the spike — application bugs or DoS-style traffic patterns often leave a trail.
- If load is consistently high under normal traffic, upgrade the node plan.