Stephen Gran on 25 Mar 2005 17:46:12 -0000
Hello all,

Are there any Nagios users here? I am having some conceptual problems getting it set up the way I would like.

Basically, we are monitoring about 50 machines, each with several services, spread across several netblocks. It runs great, and does the right thing most of the time. However, we still have a few quirks we want to work out.

If a network goes down, we want to suppress alerts for every host and service on that netblock. We have managed to do this successfully using the 'parents' option, so that Nagios sees the path to a host as

    Nagios -> remote gateway -> remote machine a -> remote service a
                             \> remote machine b -> remote service b

This seems to be correct, and at least works as expected. In this setup, Nagios doesn't check service a if machine a is unreachable, and doesn't check machine a and b or service a and b if the gateway is down.

However, the other day we had a problem where bind blew up on the Nagios server, making all machines unresolvable, and hence unreachable. We got a million alerts (once bind was restarted and email started working again, of course :)

So, now I came up with the idea of using a 'canary in the coal mine' kind of test before checking other hosts. The Nagios server is a multi-homed host, and several of the IPs it serves have hostnames not in /etc/hosts. These are perfect candidates to test DNS failure: they will never be down if the Nagios machine is up, but resolving the hostname requires bind to be running. So, we picked one, and now we are trying to figure out how to make Nagios abort other checks if this one fails.

We tried this with the 'dependant_host' option (IIRC), and then stopped bind, but all services and hosts quickly went into alert status, rather than just the one. So that first guess was wrong, clearly :)

Has anyone done this, or does anyone have an idea how to set things up this way? It's Nagios 1.3, if that makes any difference.
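For illustration only, the canary test itself (resolve one hostname that is not in /etc/hosts and report the result) could be sketched as a small plugin-style script. This is a sketch under assumptions, not something from the setup described above: the function names `check_dns` and `main` and the hostname `canary.example.net` are hypothetical placeholders. It only covers detecting the DNS failure, not how to make Nagios skip the other checks when it fails. The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket

# Conventional Nagios plugin exit codes.
OK = 0
CRITICAL = 2


def check_dns(hostname, resolver=socket.gethostbyname):
    """Try to resolve hostname; return (exit_code, status_message).

    The resolver argument exists only so the lookup can be swapped out
    (for example in a test); by default the system resolver is used.
    """
    try:
        address = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CRITICAL, f"DNS CRITICAL - cannot resolve {hostname}: {exc}"
    return OK, f"DNS OK - {hostname} resolves to {address}"


def main(argv):
    """Print one status line and return the exit code for the check."""
    # "canary.example.net" is a placeholder, not a real host from this setup.
    hostname = argv[1] if len(argv) > 1 else "canary.example.net"
    code, message = check_dns(hostname)
    print(message)
    return code
```

To run it as a standalone check, one would call `sys.exit(main(sys.argv))` at the bottom of the script, so that the exit status carries the result back to whatever invoked it.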
-- 
Stephen Gran | steve@lobefin.net | http://www.lobefin.net/~steve
"The medium is the message." -- Marshall McLuhan

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug