James Barrett on 14 Oct 2007 20:26:30 -0000



Re: [PLUG] network/server troubleshoot


What is the model of your NIC?  Try blacklisting the tulip kernel module and
using the dmfe module instead.  Some Google results for 'tulip_stop_rxtx eth'
point to the tulip driver being the problem, particularly these two:

http://www.suseforums.net/index.php?showtopic=18730
http://www.linuxquestions.org/questions/slackware-14/can-ping-router-but-cant-connect-to-it-nor-internet-451918/
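
A minimal sketch of the blacklist step, assuming a module-init-tools style
setup where a "blacklist" line in a modprobe config file keeps a module from
being autoloaded.  The function name blacklist_module and the file path in
the usage note are illustrative assumptions, not something from your system;
check where your distribution keeps its blacklist, and confirm what chip the
card uses (for example with "lspci | grep -i ethernet") before switching
drivers.

```shell
#!/bin/bash
# blacklist_module MODULE CONF_FILE
# Append "blacklist MODULE" to CONF_FILE unless that exact line is already
# present, so the function is safe to run more than once.
blacklist_module() {
    local module="$1" conf="$2"
    # grep -x matches whole lines only; -q suppresses output.
    # A missing file counts as "not present" and is created by the append.
    grep -qx "blacklist $module" "$conf" 2>/dev/null \
        || echo "blacklist $module" >> "$conf"
}
```

As root, something like "blacklist_module tulip /etc/modprobe.d/blacklist"
(path assumed) followed by "modprobe -r tulip" and "modprobe dmfe" would
apply it.  Unloading the driver takes the interface down, so do this from
the console rather than over the network.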

HTH

On Sunday 14 October 2007 15:25, Eric wrote:
> I've been having an intermittent problem with my firewall server and/or
> Internet connection.  Unfortunately, I don't have the time to spare to
> "tinker" with it and I'm not a network expert either.  I'm hoping someone
> here has some insight because my current favorite solution involves
> blasting caps and some mixtures better left unmentioned :-) [that's a joke
> to express my frustration BTW]
>
> Background:  Firewall is a SME server/CentOS based system with 2 nics. 
> eth0 is the Internet and eth1 is the LAN.
>
> The system is running djbdns tools (dnscache and tinydns) but they appear
> blameless AFAIK.  I did set it up to use opendns.com rather than my ISP
> (Cavalier DSL) but this changed nothing - the problem persisted.
>
> Frequently the Internet connection just ceases to work properly.  It may
> fix itself after some indeterminate time.  Here is what I observe:
>
> ( for all of the following I am logged in as root on the firewall )
>
> 1.  When it does not work (no traffic appears to go in or out) and I type
>     ping www.google.com I get the message:
>     ping: unknown host www.google.com
>
> 2.  Fetchmail complains like this:
>
>       fetchmail: awakened at Sun Oct 14 09:32:39 2007
>       fetchmail: Query status=2 (SOCKET)
>       fetchmail: timeout after 300 seconds waiting to connect
>                     to server pop.gmail.com.
>       fetchmail: socket error while fetching from pop.gmail.com
>
> 3.  I can "fix" this situation by entering the following commands (which I
>     have combined into a script called "toggle"):
>
>      #!/bin/bash
>      /sbin/ifdown eth0
>      sleep 3
>      /sbin/ifup eth0
>
> 4.  To log the problem and temporarily "deal" with it I created a script
>     called doody and put it in the root cron to run every minute.
>     (You can guess the reason for the name)
>
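>      #!/bin/bash
>      # Send one ping and wait up to 10 seconds for the reply.
>      if /bin/ping -W 10 -c 1 www.google.com >/dev/null
>      then
>          echo -n '.'            # one dot per minute without a problem
>      else
>          echo ''
>          echo -n 'trouble: '
>          date
>          /root/bin/toggle       # take eth0 down and bring it back up
>      fi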
>
>      Okay, it's stupid but it works temporarily and the outages don't last
>      more than a minute this way :-P
>
>      DESPERATION, not necessity, is the mother of invention.
>
> 5.   There are no relevant messages in /var/log/messages when it fails.
>
> 6.   When I "toggle" the eth0 interface I sometimes see this in
>      /var/log/messages:
>
>         Oct 14 13:41:18 polaris kernel: eth0: Setting full-duplex
>              based on MII#1 link partner capability of 01e1.
>
>      Less frequently, the above line is preceded by:
>
>         Oct 14 15:03:12 polaris kernel:
>               0000:01:01.0: tulip_stop_rxtx() failed
>
>      A Google search for "tulip_stop_rxtx" and "failed" yields a bunch of
>      useless comments from the kernel list.  Bad news IMHO but I don't
>      know what to do about it other than swap out the tulip-based nics.
>
>      Here, for example, are a few hours of doody.log, the output of the
>      doody script (every period represents a minute without a problem).
>      You can see the frequency of the interruptions:
>
>         trouble: Sun Oct 14 09:59:11 EDT 2007
>         .......................................................
>         trouble: Sun Oct 14 10:55:11 EDT 2007
>         ...................
>         trouble: Sun Oct 14 11:15:11 EDT 2007
>         ..............
>         trouble: Sun Oct 14 11:30:11 EDT 2007
>         .........
>         trouble: Sun Oct 14 11:40:11 EDT 2007
>         ............
>         trouble: Sun Oct 14 11:53:11 EDT 2007
>         .........................
>         trouble: Sun Oct 14 12:19:11 EDT 2007
>         .......................................................
>         trouble: Sun Oct 14 13:15:11 EDT 2007
>         .........................
>         trouble: Sun Oct 14 13:41:11 EDT 2007
>         ...
>         trouble: Sun Oct 14 13:45:11 EDT 2007
>         ....................
>         trouble: Sun Oct 14 14:06:11 EDT 2007
>
>
> My biggest problem is that I don't know how or where to get more
> information for troubleshooting this.  It's almost worth the trouble to
> just replace all the nics and reconfigure the system.  If I knew that would
> fix it I would do that ASAP.
>
> Advice appreciated!
>
> Eric
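
On the question of where to get more information: a rough sketch of two
helpers that doody could call just before it runs toggle.  The first tells a
name-resolution failure apart from a complete loss of connectivity, by
probing an IP address before probing a hostname.  The second pulls the
error counters for one interface out of /proc/net/dev.  The function names
and the gateway address placeholder are assumptions for illustration only.

```shell
#!/bin/bash
# classify_outage IP_PROBE_CMD NAME_PROBE_CMD
# Runs each probe command and prints one of: link, dns, ok.
#   link = even the probe by IP address failed
#   dns  = the IP probe worked but the probe by hostname failed
classify_outage() {
    local ip_probe="$1" name_probe="$2"
    # The probes are left unquoted on purpose so they split into words.
    if ! $ip_probe >/dev/null 2>&1; then
        echo "link"
    elif ! $name_probe >/dev/null 2>&1; then
        echo "dns"
    else
        echo "ok"
    fi
}

# nic_errors IFACE [FILE]
# Prints the receive and transmit error counters for IFACE.
# FILE defaults to /proc/net/dev; it is a parameter only to ease testing.
nic_errors() {
    local iface="$1" file="${2:-/proc/net/dev}"
    # Replace the colon after the interface name with a space, because the
    # name and the first counter are sometimes run together ("eth0:12345").
    awk -v ifc="$iface" '
        { sub(/:/, " ") }
        $1 == ifc { print "rx_errs=" $4 " tx_errs=" $12 }
    ' "$file"
}

# Possible use inside doody (GATEWAY_IP is a placeholder you must set):
#   classify_outage "/bin/ping -W 10 -c 1 $GATEWAY_IP" \
#                   "/bin/ping -W 10 -c 1 www.google.com"
#   nic_errors eth0
```

If the result is "link" and the error counters on eth0 climb between
outages, that points at the card or its driver rather than at dnscache.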
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug