Stephen Gran on 25 Mar 2005 17:46:12 -0000



[PLUG] Nagios, anyone?


Hello all,

Are there any Nagios users here?  I am having some conceptual problems
getting it set up the way I would like.  Basically, we are monitoring
about 50 machines, each with several services, spread across several
netblocks.  It runs great, and does the right thing most of the time.
However, we still have a few quirks we want to work out.

If a network goes down, we want to suppress alerts for every host and
service on that netblock.  We have managed to do this successfully using
the 'parents' option, so that Nagios sees the path to a host as 

Nagios -> remote gateway -> remote machine a -> remote service a
                         \> remote machine b -> remote service b

This seems to be correct, and at least works as expected.  In this
setup, Nagios doesn't check service a if machine a is unreachable, and
doesn't check machines a and b or services a and b if the gateway is down.
However, the other day we had a problem where bind blew up on the Nagios
server, making all machines unresolvable, and hence unreachable.  We got
a million alerts (once bind was restarted and email started working again,
of course :)
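The suppression rule described above can be sketched as a toy model in Python. This is not how Nagios is implemented. It only encodes the rule as stated: a failed node alerts only if nothing upstream of it has also failed. The node names are placeholders taken from the diagram.

```python
"""Toy model of parent-based alert suppression (not Nagios code).

A failed node alerts only if no ancestor on its path back to the
Nagios server has also failed. Node names are hypothetical labels
taken from the diagram.
"""
from typing import Dict, List, Optional, Set

# child -> parent; None means "directly reachable from the Nagios server"
PARENTS: Dict[str, Optional[str]] = {
    "remote gateway": None,
    "remote machine a": "remote gateway",
    "remote machine b": "remote gateway",
    "remote service a": "remote machine a",
    "remote service b": "remote machine b",
}


def path_is_up(node: str, failed: Set[str],
               parents: Dict[str, Optional[str]]) -> bool:
    """Return True if no ancestor of `node` is in `failed`."""
    parent = parents[node]
    while parent is not None:
        if parent in failed:
            return False
        parent = parents[parent]
    return True


def alerts(failed: Set[str],
           parents: Dict[str, Optional[str]] = PARENTS) -> List[str]:
    """Return the failed nodes that should alert, i.e. those whose upstream path is intact."""
    return sorted(n for n in failed if path_is_up(n, failed, parents))


if __name__ == "__main__":
    # Gateway down: everything behind it fails too, but only the gateway alerts.
    print(alerts(set(PARENTS)))  # ['remote gateway']
    # Machine a down (so service a fails as well): one alert, for machine a.
    print(alerts({"remote machine a", "remote service a"}))  # ['remote machine a']
```

The Nagios server itself is not a node in this tree, so a failure on the server side, such as the bind outage, falls outside what the model describes.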

So I came up with the idea of a 'canary in the coal mine' test to run
before checking other hosts.  The Nagios server is multi-homed, and
several of the IPs it serves have hostnames that are not in /etc/hosts.
These are perfect candidates to test for DNS failure: they will never be
down if the Nagios machine is up, but resolving their hostnames requires
bind to be running.
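Such a canary could be written as a small Nagios-style plugin. The sketch below assumes the usual plugin convention: exit status 0 for OK, 2 for CRITICAL, and a one-line status message on stdout. The hostname is passed on the command line. The name canary.example.net used below is a hypothetical placeholder for one of the server's own names that is not in /etc/hosts.

```python
"""Sketch of a 'canary' DNS check in the style of a Nagios plugin.

Assumes the common plugin convention (exit 0 = OK, 2 = CRITICAL,
one-line status on stdout). The hostname to resolve is supplied as
the first command-line argument.
"""
import socket
import sys
from typing import Callable, Tuple

OK, CRITICAL = 0, 2


def check_canary(host: str,
                 resolve: Callable[[str], str] = socket.gethostbyname
                 ) -> Tuple[int, str]:
    """Resolve `host`; failure suggests the local resolver (bind) is broken."""
    try:
        addr = resolve(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CRITICAL, f"DNS CRITICAL - cannot resolve {host}: {exc}"
    return OK, f"DNS OK - {host} resolves to {addr}"


if __name__ == "__main__" and len(sys.argv) > 1:
    code, message = check_canary(sys.argv[1])
    print(message)
    sys.exit(code)
```

For example: `python3 check_canary.py canary.example.net` (the script name is also hypothetical). The resolver is a parameter so the check can be exercised without touching the network.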

So we picked one, and now we are trying to figure out how to make
Nagios abort other checks if this one fails.  We tried this with the
'dependent_host_name' option (IIRC) and then stopped bind, but all
services and hosts quickly went into alert status, rather than just the
one.  So that first guess was clearly wrong :)
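The behaviour being asked for can be stated as a toy model: when the canary fails, skip the remaining checks for that pass and raise only the canary's alert. This is a hypothetical illustration of the goal, not a Nagios feature or configuration, and the check names are made up.

```python
"""Toy model of 'abort other checks if the canary fails'.

Hypothetical illustration of the desired behaviour only; this is not
Nagios code or configuration.
"""
from typing import Callable, Dict, List


def run_cycle(canary_ok: bool,
              checks: Dict[str, Callable[[], bool]],
              canary: str = "dns-canary") -> List[str]:
    """Run one monitoring pass and return the names of the checks that alert.

    Each check is a zero-argument callable that returns True when healthy.
    """
    if not canary_ok:
        # Local DNS is broken: run nothing else and alert on the canary alone.
        return [canary]
    return sorted(name for name, check in checks.items() if not check())


if __name__ == "__main__":
    checks = {"web": lambda: False, "mail": lambda: False}
    print(run_cycle(False, checks))  # ['dns-canary']
    print(run_cycle(True, checks))   # ['mail', 'web']
```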

Has anyone done this, or does anyone have an idea how to set things up this way?
It's Nagios 1.3, if that makes any difference.
-- 
 --------------------------------------------------------------------------
|  Stephen Gran                  | "The medium is the message." --         |
|  steve@lobefin.net             | Marshall McLuhan                        |
|  http://www.lobefin.net/~steve |                                         |
 --------------------------------------------------------------------------


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug