Re: [PLUG] LOAD AVERAGES & NAGIOS

Andy,

Will I need separate checks in Nagios, since not all servers are running the same workloads? The system I'm looking at has one Nagios load check for all of the servers; I believe that is how it is set up. So I may need to create a Nagios check for each system for the checks to be reasonably accurate. Right now a lot of the alerts are not serious, and I worry that if I adjust the one rule to quiet those alerts, something else may get missed. I don't know that much about Nagios, so as I'm writing this I'm realizing this may be more of a Nagios question: do I make 50-some rules rather than one for all?
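One way to think about "50-some rules versus one": hosts that share a CPU count and workload type could plausibly share one rule. Below is a minimal sketch of that grouping, assuming a hypothetical inventory (`HOSTS`) and placeholder CPU multiples (`MULTIPLIERS`) loosely based on the rule of thumb Andy gives below. Using the same value for the 1/5/15-minute averages is a simplification.

```python
"""Sketch: one load rule per (CPU count, workload) group rather than per host.

Assumptions: HOSTS is a hypothetical inventory (in practice the CPU count might
come from `nproc` on each host), and MULTIPLIERS are placeholder multiples of
the CPU count. Using one value for all three averages is a simplification.
"""
from collections import defaultdict

# Hypothetical inventory: host name -> (CPU count, workload type).
HOSTS = {
    "web01": (4, "latency"),
    "web02": (4, "latency"),
    "db01": (8, "latency"),
    "batch01": (16, "throughput"),
    "batch02": (16, "throughput"),
}

# Placeholder (warning, critical) multiples of the CPU count per workload.
MULTIPLIERS = {"latency": (1, 2), "throughput": (3, 4)}

PLUGIN = "/usr/lib/nagios/plugins/check_load"


def group_hosts(hosts):
    """Map each (cpus, workload) pair to the sorted host names that share it."""
    groups = defaultdict(list)
    for name, key in sorted(hosts.items()):
        groups[key].append(name)
    return dict(groups)


def check_command(cpus, workload):
    """Build a check_load command line with thresholds scaled to the CPU count."""
    warn_mult, crit_mult = MULTIPLIERS[workload]
    warn = ",".join([str(warn_mult * cpus)] * 3)
    crit = ",".join([str(crit_mult * cpus)] * 3)
    return f"{PLUGIN} -w {warn} -c {crit}"


if __name__ == "__main__":
    for (cpus, workload), names in group_hosts(HOSTS).items():
        print(f"{', '.join(names)}: {check_command(cpus, workload)}")
```

Here five hosts collapse into three rules; with 50-some servers, the rule count would track the number of distinct hardware/workload combinations rather than the number of hosts.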

--------------------------------

Ron Guilmet

ronald.guilmet@phillydatasolutions.com

Cloud Solutions Architect

www.phillydatasolutions.com

---- On Tue, 26 Sep 2017 16:00:07 -0700 Andy Wojnarek <andy.wojnarek@theatsgroup.com> wrote ----

Hey Ron,

What kind of workload is this? As a rule of thumb, the appropriate load average depends on the type of workload you're running. If response time is most important, you may want it no higher than 1 to 2 times the number of CPUs allocated to the host. If job throughput is important, 3 to 4 times the number of CPUs may be sufficient to keep the processors busy, especially with hyper-threading.
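The rule of thumb above can be turned into numbers per host. This is a minimal sketch, assuming the lower multiple is used as the warning level and the upper one as the critical level; that split and the workload labels are hypothetical choices, not Nagios defaults.

```python
"""Sketch: derive load-average alert thresholds from a host's CPU count.

Assumption: multipliers follow the rule of thumb (1-2x CPUs when response time
matters, 3-4x when throughput matters); the warn/crit split is hypothetical.
"""

# Hypothetical (warning multiple, critical multiple) per workload type.
MULTIPLIERS = {
    "latency": (1.0, 2.0),     # response time is most important
    "throughput": (3.0, 4.0),  # job throughput is most important
}


def load_thresholds(cpus: int, workload: str) -> tuple[float, float]:
    """Return (warning, critical) load-average thresholds for one host."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    warn_mult, crit_mult = MULTIPLIERS[workload]
    return warn_mult * cpus, crit_mult * cpus


if __name__ == "__main__":
    print(load_thresholds(4, "latency"))     # (4.0, 8.0)
    print(load_thresholds(4, "throughput"))  # (12.0, 16.0)
```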

Baseline your load average against some other metric at the application level (HTTP response time, for example) to determine what is appropriate.
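A crude version of that baselining: collect load and response time at the same moments, then find the highest load at which the application still met its target. This sketch assumes paired (1-minute load, response time in ms) samples and a 500 ms target, both hypothetical; a single outlier sample can skew the result, so a real baseline would want many samples.

```python
"""Sketch: baseline load average against an application-level metric.

Assumption: samples are hypothetical (1-minute load, HTTP response ms) pairs
taken at the same moments, and target_ms is a hypothetical response-time goal.
"""


def max_acceptable_load(samples, target_ms):
    """Highest load observed while response time still met the target.

    Returns None if no sample met the target.
    """
    ok_loads = [load for load, resp_ms in samples if resp_ms <= target_ms]
    return max(ok_loads) if ok_loads else None


if __name__ == "__main__":
    # Hypothetical (load, response_ms) pairs from one host.
    samples = [(0.8, 120), (2.1, 180), (3.9, 350), (5.2, 900), (6.0, 1400)]
    print(max_acceptable_load(samples, target_ms=500))  # 3.9
```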

So to answer your question, the number to alert on may be different for each type of workload you're looking to monitor.

--
Andy Wojnarek

From: plug <plug-bounces@lists.phillylinux.org> on behalf of Ron Guilmet <ronald.guilmet@phillydatasolutions.com>
Reply-To: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
Date: Tuesday, September 26, 2017 at 3:01 PM
To: plug <plug@lists.phillylinux.org>
Subject: [PLUG] LOAD AVERAGES & NAGIOS

I had some questions on load averages and Nagios, and the documentation is not very intuitive. My understanding of load averages is that they are relative to the number of CPUs a server has. For example, a load of 1.5 on a single-core server means processes are beginning to queue up, which can lead to slow performance, while a 1.5 load average on a 2-core machine means processes are not queuing up.
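The per-core reasoning in the two examples can be checked by dividing the load by the CPU count. A minimal sketch using Python's `os.getloadavg()` and `os.cpu_count()` for the local host; treating a per-core load above 1.0 as "queuing" is the simplification used in the message. Note that on Linux the load average also counts tasks in uninterruptible sleep (usually waiting on disk or network I/O), so a high load does not always mean CPU contention.

```python
import os


def per_core_load(load: float, cpus: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    return load / cpus


def is_queuing(load: float, cpus: int) -> bool:
    """True when load exceeds the CPU count, i.e. runnable work is likely waiting."""
    return per_core_load(load, cpus) > 1.0


if __name__ == "__main__":
    # The two examples from the message.
    print(per_core_load(1.5, 1), is_queuing(1.5, 1))  # 1.5 True
    print(per_core_load(1.5, 2), is_queuing(1.5, 2))  # 0.75 False

    # The local host (os.getloadavg is only available on Unix-like systems).
    if hasattr(os, "getloadavg"):
        one, five, fifteen = os.getloadavg()
        cpus = os.cpu_count() or 1
        print(f"1-min load per core: {per_core_load(one, cpus):.2f}")
```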

So with that understanding, I need specific Nagios checks for different systems, correct?
For example, the check below may alert me to a problem on one server, while other servers could go down without notice, right?

Load:
Command: "/usr/lib/nagios/plugins/check_load -w 6,5,4 -c 10,8,6"
Type: NRPE
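To see why one fixed set of thresholds fits hosts of different sizes poorly, here is a sketch that evaluates `-w 6,5,4 -c 10,8,6` against one hypothetical set of 1/5/15-minute loads. The comparison only approximates the plugin (any average above its threshold raises that state); it is not the `check_load` source. Separately, recent monitoring-plugins builds document a `-r` (`--percpu`) option that divides the load by the CPU count before comparing; confirm with `check_load --help` on your version before relying on it.

```python
"""Sketch: how one fixed set of check_load thresholds behaves on hosts of
different sizes. Approximates the comparison only; sample loads are hypothetical.
"""

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def parse_triple(spec: str) -> tuple[float, float, float]:
    """Parse a "1min,5min,15min" threshold string such as "6,5,4"."""
    one, five, fifteen = (float(x) for x in spec.split(","))
    return one, five, fifteen


def evaluate(loads, warn, crit) -> int:
    """Return a plugin-style state for (1, 5, 15)-minute load averages."""
    if any(load > limit for load, limit in zip(loads, crit)):
        return CRITICAL
    if any(load > limit for load, limit in zip(loads, warn)):
        return WARNING
    return OK


if __name__ == "__main__":
    warn = parse_triple("6,5,4")
    crit = parse_triple("10,8,6")
    loads = (5.0, 4.5, 4.2)  # hypothetical 1/5/15-minute averages
    print(evaluate(loads, warn, crit))  # 1 (WARNING)
    # The same absolute load means very different things per core.
    for cpus in (2, 16):
        per_core = ", ".join(f"{load / cpus:.2f}" for load in loads)
        print(f"{cpus:>2} CPUs -> per-core load {per_core}")
```

With these numbers the 15-minute load of 4.2 is about 2.1 per core on a 2-CPU host (sustained queuing) but about 0.26 per core on a 16-CPU host (mostly idle), yet both get the same WARNING.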

Thanks, Ron


___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug