Amir Tahvildaran on 3 May 2012 14:53:51 -0700


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

[PLUG] interesting apache issue & problem solving case study


This one had me stumped for an hour (and I still don't have all of it solved).  I think it's good to reflect on your problem solving approach so you tweak it for the next problem.  I invite you to comment and reflect.  What would you have done differently?

So here is the problem & evidence.  The solution, reflection and remaining mystery are below.  

Problem:
Loading http://site/dirZ/FileA.html resulted in an abruptly closed connection.  

Evidence (pretty much in the order I tried things IIRC):
The website was up and running otherwise:
http://site/ works
http://site/dirA/ works

http://site/dirZ/FileA.html showed an error in chrome.  chrome inspector's network tab didn't show the request or response.
wget confirmed - there simple wasn't a response from the server.

http://site/dirZ/index.html loaded normally
http://site/dirZ/FileB.inf loaded normally
http://site/dirZ/FileC.xml broke same as FileA

I know dirZ is mounted (samba in this case) from another server and it had been working (and from above some files still work).

cp FileA.html Test1.html
vi Test1.html - hey - there is a junk first character!
=========
<U+FEFF><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
=========
<<feff>> 65279, Hex feff, Octal 177377
=========

delete that character

$ file FileA.html 
FileA.html: UTF-8 Unicode C++ program text, with CRLF line terminators
$ file Test1.html
Test1.html: ASCII text

check http://site/dirZ/Test1.html in chrome, still broken

clear the cache, still broken
check in wget, still broken

check apache error logs - no mention

generate error in apache - error show up in apache error logs

vi Test1.html #delete bottom half of the file (assuming there are junk characters somewhere else in the file)
still broken
vi Test1.html #revert, delete the top half of the file
still broken
vi Test1.html #delete all lines from the file
it works
vi Test1.html #delete as much of the file as I can while still keeping the error, I wind up with this:

==============
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>MyProj</title>

<script language="_javascript_">
        useGoogleTracker = "0";
==============

If I delete the last line it loads.  If I keep the last line and delete any other lines it loads.  CRAZY!!!

Ok, bring in a second pair of eyes.  I demo the problem to him.

We check the encoding for each of the characters on the last time (still thinking its an encoding problem).  I even retype the last line, still broken.

I show him the error log is empty.

My coworker suggests to check the access_log, it reports it as:
"GET /dirZ/Test1.html HTTP/1.1" 200 264 "-" "browser string"

What?! 200?

Ok, break out tcpdump: 
tcpdump -s0 -w /tmp/test1-4.tcpd -i any port 80
#make two requests, one good one, one bad one
#if anyone can tell me how to capture for (src="" OR dst=MYIP), I'd appreciate that.

first request for http://site/dirA/ looks normal
HTTP request in
HTTP request out
second request for http://site/dirZ/Test1.html is missing the HTTP response, I just get a "TCP segment of a reassembled PDU"

I ignore this even though it had a clue in it (pasted way down below)

I check the apache process IDs - no processes are dying between error requests.

I copy the file to another directory, it works there.



Solution (spoiler?):
I start to plan on rsyncing the files periodically since I know I can get it to working out of a normal directory.

Finally, I google "apache samba share": http://www.letushostu.com/tips/apache-and-documentroot-on-a-cifssamba-share/

That directory needs to have "EnableSendfile Off", which is a way apache hand the kernel work - this doesn't work for mounted directories.


Remaining mystery:
Why do certain files work and others not?


Reflection:

I basically convinced myself early on that this was a "bad character" problem/apache bug.

I tried a lot of my standard problem solving approaches: 
* reduce/solve a simpler problem
* second pair of eyes
* clear the cache :-)
* view the bits

But I didn't try my #1 problem solving technique early enough: google.  I think this is because:
* I didn't have an error message to search for
* I trusted myself to know how apache works (and I didn't in this case).  I'm sure some of you figured this out as soon as I mentioned samba.

Had I not googled it, I think this is what I would have done:
* reproduce on a test machine
* bring apache down to 1 child process
* strace a good request
* strace a bad request
* compare
* ....


Reflection on reflecting:

And, there is value in reflecting - I think I now have a decent theory on the "remaining mystery".

When I was searching for the access log to paste in, I remembered it was 264 bytes and that's what I searched for, only when I stopped chasing the bug did I think about the significance of 264.  It's only ~one-line~ over 255, a possible cutoff for apache to not use the Sendfile optimization.



-Amir




Here was the TCP packet I had ignored above (sorry, I don't know a better way to extract the text form wireshark)

0000  00 04 00 01 00 06 00 22  19 54 4d b9 00 00 08 00   ......." .TM.....
0010  45 00 01 31 6e 90 40 00  40 06 07 3f 90 76 44 63   E..1n.@. @..?.vDc
0020  90 76 5e a8 00 50 84 cd  d6 0d b2 3c b8 67 2a 45   .v^..P.. ...<.g*E
0030  50 18 1c 68 c5 1b 00 00  48 54 54 50 2f 31 2e 31   P..h.... HTTP/1.1
0040  20 32 30 30 20 4f 4b 0d  0a 44 61 74 65 3a 20 54    200 OK. .Date: T
0050  68 75 2c 20 30 33 20 4d  61 79 20 32 30 31 32 20   hu, 03 M ay 2012 
0060  31 39 3a 35 36 3a 32 31  20 47 4d 54 0d 0a 53 65   19:56:21  GMT..Se
0070  72 76 65 72 3a 20 41 70  61 63 68 65 2f 32 2e 32   rver: Ap ache/2.2
0080  2e 33 20 28 43 65 6e 74  4f 53 29 0d 0a 4c 61 73   .3 (Cent OS)..Las
0090  74 2d 4d 6f 64 69 66 69  65 64 3a 20 54 68 75 2c   t-Modifi ed: Thu,
00a0  20 30 33 20 4d 61 79 20  32 30 31 32 20 31 39 3a    03 May  2012 19:
00b0  33 39 3a 30 36 20 47 4d  54 0d 0a 45 54 61 67 3a   39:06 GM T..ETag:
00c0  20 22 32 65 34 65 66 35  61 2d 31 30 38 2d 66 34    "2e4ef5 a-108-f4
00d0  66 31 65 61 38 30 22 0d  0a 41 63 63 65 70 74 2d   f1ea80". .Accept-
00e0  52 61 6e 67 65 73 3a 20  62 79 74 65 73 0d 0a 43   Ranges:  bytes..C
00f0  6f 6e 74 65 6e 74 2d 4c  65 6e 67 74 68 3a 20 32   ontent-L ength: 2
0100  36 34 0d 0a 43 6f 6e 6e  65 63 74 69 6f 6e 3a 20   64..Conn ection: 
0110  63 6c 6f 73 65 0d 0a 43  6f 6e 74 65 6e 74 2d 54   close..C ontent-T
0120  79 70 65 3a 20 74 65 78  74 2f 68 74 6d 6c 3b 20   ype: tex t/html; 
0130  63 68 61 72 73 65 74 3d  55 54 46 2d 38 0d 0a 0d   charset= UTF-8...
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug