Tim Allen on 11 Jul 2015 10:55:14 -0700



Re: [PLUG] Basic network monitoring and link quality software


Hey Eric, long time no see.

I don't know if this would help with monitoring, but when I was going through it with Comcast, I wrote a super simple Python script to log ping times to Google. I was having intermittent sluggishness (ping times consistently over 5000ms for extended periods), and logging these helped me make my case. This was in response to Comcast's solution of sending a technician over for everything, and of course, everything worked fine while the tech was here. Running it for a week was enlightening on just how often they weren't living up to their SLA.

If you're interested, it is really simple: https://github.com/FlipperPA/latency-tester

You can set the frequency (defaults to 5 seconds), threshold to log (defaults to 100ms) and the destination to ping (default is Google). Give it a try overnight some time and you might be surprised. While this only tests end-to-end, and not the intermediate hops like traceroute does, it still provides useful data.
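
For anyone who wants the idea without cloning the repo, here is a minimal sketch of that kind of logger. It is not the code from the linked repository: the function names, the log file name, and the "google.com" default are all assumptions for illustration, and it assumes a Linux/macOS style `ping` that accepts `-c 1` and prints a `time=... ms` field.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the "time=12.3 ms" field printed by common Linux/macOS ping builds.
_TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_ping_ms(output):
    """Return the round-trip time in ms from ping output, or None if no reply."""
    match = _TIME_RE.search(output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=10):
    """Send one echo request via the system ping; None means loss or timeout."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_ping_ms(result.stdout)


def format_entry(when, host, rtt_ms, threshold_ms):
    """Return a log line if the sample is slow or lost, else None."""
    if rtt_ms is None:
        return f"{when.isoformat()} {host} no reply"
    if rtt_ms >= threshold_ms:
        return f"{when.isoformat()} {host} {rtt_ms:.1f} ms"
    return None


def monitor(host="google.com", interval_s=5, threshold_ms=100,
            logfile="latency.log"):
    """Loop until interrupted, appending slow or lost samples to logfile."""
    while True:
        entry = format_entry(datetime.now(), host, ping_once(host), threshold_ms)
        if entry:
            with open(logfile, "a") as fh:
                fh.write(entry + "\n")
        time.sleep(interval_s)
```

Calling `monitor()` and leaving it running overnight would produce a file containing only the samples at or above the threshold, plus any lost pings, each with a timestamp.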

I wrote it very early in my Python days so it could definitely use improvements.

Regards,

-Tim

On Sat, Jul 11, 2015 at 1:15 PM, Eric Lucas <eric@lucii.org> wrote:
Excellent Kevin!  Thank you.

I have VOIP with voip.ms and while talking to a friend last night he said that there were infrequent, brief break-ups in my line.  I heard nothing abnormal - he sounded 100% all the time.  Of course, he's on a cell phone so my immediate question was: whose fault is it?   Based on your email I may use the voip server as an endpoint and run one of those 60-second tests to gather some more relevant data.

Thanks again,
Eric

On Sat, Jul 11, 2015 at 12:51 PM, Kevin McAllister <kevin@mcallister.ws> wrote:
On Jul 11, 2015, at 12:02 PM, Eric Lucas <eric@lucii.org> wrote:

Hey, this is very interesting!

Here's the output for the first run:

mtr --report -c3 www.google.com                                            
Start: Sat Jul 11 11:31:25 2015
HOST: saturn                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.10.10.1                 0.0%     3    1.2   2.0   1.2   3.8   1.4
  2.|-- L100.PHLAPA-VFTTP-71.veri  0.0%     3    6.3  33.9   6.3  87.2  46.2
  3.|-- G102-0-0-16.PHLAPA-LCR-21  0.0%     3    9.5  28.5   9.5  55.7  24.2
  4.|-- ae5-0.PHIL-BB-RTR1.verizo  0.0%     3   57.6  28.0   9.2  57.6  25.9
  5.|-- 0.xe-7-0-2.XL1.IAD8.ALTER  0.0%     3   47.9  25.9  12.5  47.9  19.2
  6.|-- 0.xe-8-2-0.GW9.IAD8.ALTER  0.0%     3   10.7  55.5  10.7 143.9  76.6
  7.|-- google-gw.customer.alter.  0.0%     3   51.3  81.0  47.0 144.7  55.2
  8.|-- 209.85.252.46              0.0%     3   69.1 110.3  69.1 182.8  63.0
  9.|-- 209.85.143.112             0.0%     3   17.2  45.1  16.5 101.5  48.9
 10.|-- 216.239.40.209             0.0%     3   23.0  23.9  16.6  32.1   7.7
 11.|-- 72.14.236.227              0.0%     3   98.2  89.1  25.7 143.5  59.4
 12.|-- 209.85.250.7               0.0%     3   26.3  33.1  26.3  45.6  10.8
 13.|-- yyz08s14-in-f19.1e100.net 33.3%     3  115.2  71.7  28.2 115.2  61.5

about 3 minutes later...

Start: Sat Jul 11 11:34:30 2015
HOST: saturn                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.10.10.1                 0.0%     3    1.4   1.4   1.2   1.6   0.0
  2.|-- L100.PHLAPA-VFTTP-71.veri  0.0%     3    4.8   5.8   4.8   6.4   0.7
  3.|-- G0-9-3-2.PHLAPA-LCR-22.ve  0.0%     3   10.3  11.1  10.2  12.9   1.4
  4.|-- ae6-0.PHIL-BB-RTR2.verizo  0.0%     3    5.9  33.5   5.9  85.5  45.1
  5.|-- 0.xe-11-1-1.XL2.IAD8.ALTE  0.0%     3   13.4  12.7  11.4  13.4   0.7
  6.|-- 0.xe-9-1-0.GW9.IAD8.ALTER  0.0%     3   13.8  11.5  10.0  13.8   1.9
  7.|-- google-gw.customer.alter.  0.0%     3   19.0  19.4  15.3  23.8   4.2
  8.|-- 209.85.252.80              0.0%     3   14.6  15.5  12.5  19.4   3.5
  9.|-- 72.14.236.152              0.0%     3   14.1  13.3  12.8  14.1   0.7
 10.|-- 216.239.40.159             0.0%     3   16.2  16.0  15.3  16.4   0.0
 11.|-- 72.14.236.225              0.0%     3   28.1  30.0  27.2  34.9   4.1
 12.|-- 72.14.239.19               0.0%     3   28.4  27.9  27.2  28.4   0.0
 13.|-- yyz08s09-in-f18.1e100.net  0.0%     3   31.5  29.7  27.0  31.5   2.2

So, in analyzing my Verizon FiOS, should I only be concerned about rows #2, #3, and #4?
Row 4 in the second run seems suspicious (85.5 for worst?)


A couple of things to be cautious about when looking at mtr, ping, and traceroute numbers.

First.  The mtr command you’re running is only doing 3 packets.  It may not give you a great idea of what’s going on with a sustained flow of data.  At work we like to beat the crap out of things using mtr by simulating something “like” a voice RTP flow.  e.g.

mtr -s 200 -i 0.020 -c 3000 --report $destination

That will send a ton of 200-byte test packets at 20ms intervals.  (That’s close to what RTP traffic looks like in a voice application.)  And 3000 test packets will take 60s to run (still a short sampling period).
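
The arithmetic behind those flags can be checked with a short sketch. The function name `probe_plan` is made up for illustration, and the calculation counts one probe per interval, as the text above does; it does not model the separate probes mtr sends toward each intermediate hop.

```python
def probe_plan(size_bytes, interval_s, count):
    """Return (duration in seconds, bits per second) for a stream of probes."""
    duration_s = count * interval_s
    bits_per_second = size_bytes * 8 / interval_s
    return duration_s, bits_per_second


# Values taken from the mtr command above: -s 200 -i 0.020 -c 3000
duration, rate = probe_plan(size_bytes=200, interval_s=0.020, count=3000)
print(f"{duration:.0f} s of probing at about {rate / 1000:.0f} kbit/s")
```

With those values it works out to 60 seconds at roughly 80 kbit/s, which is in the neighborhood of a single uncompressed voice stream.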

The second thing to be cautious of is that the average latency reported by a single router along the way is hard to use for reasoning about problems.  Many router vendors de-prioritize responding to ICMP packets.  I’ve had a set of brand new, very expensive Junipers in a lab with no traffic running across them except for a small amount of mtr/ping traffic, and I was seeing packet loss and bad latency on the Juniper devices, but my end-to-end numbers to the other test host on the same lab network were excellent.

Third, the main thing you can learn from mtr is where a problem is being introduced.  You’ll see that as follows: if your average latency on hop 4 jumped up to 150ms, and you saw a similar increase in every hop between there and the other end, including the device you are testing to, then you can reason that there is an increase in latency which starts between hops 3 and 4.

This same reasoning can be used for packet loss.  If you see 50% packet loss on a hop in the middle of your mtr trace but no significant loss further along the chain then there is no reason to worry about it.
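
That rule of thumb (a problem only counts if it carries through to the final hop) can be sketched in a few lines. The function name and the threshold are assumptions chosen for illustration; the input is just the per-hop Avg or Loss% column from an mtr report, in hop order.

```python
def first_persistent_hop(values, threshold):
    """Return the 1-based hop where values reach `threshold` and stay at or
    above it through the final hop, or None if the last hop is below it."""
    start = None
    for hop, value in enumerate(values, start=1):
        if value >= threshold:
            if start is None:
                start = hop      # candidate starting point
        else:
            start = None         # a later hop recovered, so discard it
    return start


# Hypothetical latency column: the jump at hop 4 persists to the end.
print(first_persistent_hop([2, 6, 10, 150, 152, 155, 160], threshold=100))

# Hypothetical loss column: 50% at hop 3 only, later hops are clean.
print(first_persistent_hop([0, 0, 50, 0, 0, 0], threshold=10))
```

The first call reports hop 4, meaning the increase starts between hops 3 and 4; the second reports None, which is the "no reason to worry about it" case.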

Fourth, keep in mind you are measuring round-trip time and packet loss on round trips.  It is not uncommon for routes on the Internet to be asymmetrical.  That means your packet loss and latency numbers are measuring not only the forward path you can see in the mtr output, but also the return path that you can’t see.  The only way to get an idea of the return path is to get a trace from the other side back toward you.

Fifth, flow-based load balancing across redundant links can screw up your return path trace, so that it shows a different path back to you than your forward trace is seeing.

With all those caveats, I think mtr is an excellent tool and use it daily, or at least daily when pulled into any sort of question about packet loss or network performance.  I especially like running it in non-report mode (interactive mode).  It’s curses based, so hit question mark and see what other modes you can use to look at live updating results while a test is being performed.  I like hitting ‘j’ to see the jitter numbers and a drop counter.

So, my assessment of the 85ms bad ping response from hop 4: I’d say no problem at all.  You’ll see hops 3 and 4 are different in your second trace.  So is your destination.  Remember www.google.com isn’t a single host but a DNS name that can resolve to many possible hosts.
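
To see the "many possible hosts" point for yourself, a small sketch like the following lists what your local resolver currently returns for a name. The helper name `resolve_all` is an assumption; `socket.getaddrinfo` is the standard library call doing the work, and the answers will vary by resolver, location, and time.

```python
import socket


def resolve_all(hostname):
    """Return the sorted, de-duplicated addresses the resolver gives for hostname."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for address in resolve_all("www.google.com"):
        print(address)
```

Running mtr against one of the printed addresses instead of the name at least keeps the destination fixed between runs, though the path to it can still change.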

And even if you trace to the same IP address, you could end up on radically different paths and even in different data centers/hosts, especially when dealing with a company like Google, which is excellent at service availability and is probably re-routing stuff algorithmically.

Hope all this is helpful.

- Kevin


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug