
What is the best practice for monitoring a system: should CPU alerts be based on regular CPU usage or on load average? I'm wondering what approach is used in big cloud environments.



Reaching 100% CPU utilisation is not what should trigger an alert; remaining at 100% CPU utilisation for a sustained period might be something to worry about.
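A minimal Python sketch of the "sustained, not spiky" rule, assuming you already collect one CPU utilisation sample per interval (for example from `psutil.cpu_percent(interval=1)`). `SustainedThresholdAlert` is a hypothetical helper, and the threshold and window length are illustrative assumptions. It fires only when every sample in the window is at or above the threshold, which is roughly what the `for:` duration does in a Prometheus alerting rule.

```python
from collections import deque


class SustainedThresholdAlert:
    """Hypothetical helper: fires only when CPU stays at or above a threshold.

    The threshold and window length are illustrative assumptions.
    """

    def __init__(self, threshold: float = 100.0, window: int = 10) -> None:
        self.threshold = threshold
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def observe(self, cpu_percent: float) -> bool:
        """Record one utilisation sample; return True if the alert fires."""
        self.samples.append(cpu_percent)
        # A short spike cannot fire the alert: the window must be full
        # and every sample in it must be at or above the threshold.
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s >= self.threshold for s in self.samples)


alert = SustainedThresholdAlert(threshold=100.0, window=3)
print([alert.observe(x) for x in [100, 40, 100, 100, 100]])
# [False, False, False, False, True]
```

A window of 3 keeps the example short; in practice the window would more likely cover several minutes of samples.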

Sharp spikes are usually good

Load fluctuates, but your system does not stay at the limit of available resources and does not experience continuous resource starvation.

When CPU load spikes occasionally reach 100%, your system is correctly sized; when they never reach 100%, your system might be (somewhat) oversized.
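The sizing rule of thumb above can be sketched as a small Python function, assuming a window of utilisation samples in percent. `classify_sizing` and its `full` cut-off are hypothetical; the first check keeps a flat line at 100%, which the answer treats separately below, from being mistaken for healthy spikes.

```python
from typing import Sequence


def classify_sizing(samples: Sequence[float], full: float = 100.0) -> str:
    """Rough sizing verdict from a window of CPU utilisation samples (percent).

    `full` is a hypothetical cut-off for "at the limit"; collected metrics may
    never read exactly 100.0, so something like 95.0 may work better.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if min(samples) >= full:
        return "undersized"          # flat line at the limit, not a spike
    if max(samples) >= full:
        return "correctly sized"     # spikes occasionally reach the limit
    return "possibly oversized"      # spikes never reach the limit


print(classify_sizing([20, 35, 100, 30]))   # correctly sized
print(classify_sizing([20, 35, 60, 30]))    # possibly oversized
```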

Nothing to worry about.

A flat line is usually bad

When CPU utilisation remains at 100% for a long time, your system does not have all the resources it needs.

You may need to scale up or scale out further. Intervention, and sending a pager alert, might be appropriate.

At the other end of the spectrum, when CPU utilisation remains at 0% consistently, either your system is grossly oversized and you might want to downsize, or something else is wrong (and missed by your monitoring). You probably don't want a pager alert after hours, but you should still follow up during business hours if it is a long-term trend.
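The two flat-line cases can be mapped to two notification severities. This is a minimal sketch: `route_flat_line`, the `"page"`/`"ticket"` labels and the exact `high`/`low` thresholds are assumptions, and in practice you would probably loosen them (for example to 95% and 1%) because sampled metrics rarely sit at exactly 100% or 0%. A "ticket" here stands for any non-paging notification that someone reviews during business hours.

```python
from typing import Optional, Sequence


def route_flat_line(samples: Sequence[float], high: float = 100.0,
                    low: float = 0.0) -> Optional[str]:
    """Hypothetical routing policy for a window of CPU utilisation samples."""
    if not samples:
        return None
    if all(s >= high for s in samples):
        return "page"    # pinned at the limit: scale up/out, wake someone up
    if all(s <= low for s in samples):
        return "ticket"  # idle throughout: oversized or broken, review in business hours
    return None          # fluctuating load: nothing to report


print(route_flat_line([100, 100, 100]))  # page
print(route_flat_line([0, 0, 0]))        # ticket
print(route_flat_line([5, 100, 0]))      # None
```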

Liberated from https://www.searchstax.com/docs/hc/cpu-usage/

  • Great answer. The only thing I would add is to alert on what is important to your application. If response time is important, alert on that regardless of CPU utilization. In our application, high CPU utilization results in increased response times, and the response-time alert usually fires before the CPU alert does.
    – Tim P
    Commented Dec 11, 2023 at 17:55
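The commenter's suggestion can be sketched as an alert on a latency percentile rather than on CPU. This assumes you record per-request response times in milliseconds; `latency_alert`, the 95th percentile and the 500 ms target are illustrative assumptions, not values from the comment.

```python
import statistics
from typing import Sequence


def latency_alert(response_ms: Sequence[float], slo_ms: float = 500.0) -> bool:
    """Return True when the 95th-percentile response time exceeds the target.

    Hypothetical: `slo_ms` and the choice of p95 are illustrative assumptions;
    pick the percentile and target your application actually cares about.
    """
    if len(response_ms) < 2:
        return False  # older Python versions need at least two points for quantiles
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(response_ms, n=100)[94]
    return p95 > slo_ms


print(latency_alert([100.0] * 99 + [5000.0]))        # False: one slow outlier
print(latency_alert([100.0] * 90 + [2000.0] * 10))   # True: 10% of requests slow
```

Using a percentile rather than the mean keeps a single slow request from firing the alert while still catching a slow tail.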

