Paul . L . Snyder on Mon, 17 Jun 2002 13:40:59 -0400


Re: [PLUG] capture browser request


On 17 June, 2002, "Kyle Burton" <mortis@voicenet.com> wrote:

>> I've got a web service that I pay for.  They require me to log in on one web
>> page and then click a button on the next page to download a file.  Since I'd
>> like to put this download in a script, I want to use curl or some similar
>> program.  Unfortunately, I cannot see the request that the browser sends to the
>> site to initiate the download.  I've examined the source for the web page and
>> I see the following code in the source:
>>
>> -----code excerpt------
>>
>> <form action=http://www.site.com/cgi-bin/download.cgi method=post>
>> <input type=hidden name="action" value="download">
>> <input type=hidden name="file" value="20020610">
>> <input type=hidden name="area" value="1">
>> <input type=hidden name="login" value="userlogin">
>> <input type=hidden name="PIN" value="password">
>> <tr align=center><td><input type=submit value="20020610.zip"></td>
>> </form>
>>
>> -----end code excerpt------

Sometimes, CGI scripts will accept data from either a GET or a POST method.  You
might try curling the URL


http://www.site.com/cgi-bin/download.cgi?action=download&file=20020610&area=1&login=userlogin&PIN=password

This will send a request to the server using the GET method.
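
For scripting, the same GET URL can be assembled rather than typed by hand.
A minimal Python sketch, assuming the field values from the form excerpt;
build_get_url is a name made up for illustration, and nothing is sent over
the network:

```python
from urllib.parse import urlencode

# Field names and values copied from the hidden inputs in the quoted form.
FIELDS = {
    "action": "download",
    "file": "20020610",
    "area": "1",
    "login": "userlogin",
    "PIN": "password",
}


def build_get_url(base, fields):
    """Append the form fields to the CGI URL as a query string."""
    return base + "?" + urlencode(fields)


url = build_get_url("http://www.site.com/cgi-bin/download.cgi", FIELDS)
print(url)
```

When the URL is pasted onto a shell command line instead, it needs quotes
around it, since an unquoted '&' would be interpreted by the shell.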

Since the submit button (the <input type=submit...>) has a 'value' attribute but
no 'name' attribute, it won't send a name/value pair - the value is simply used
as the text for the button when the HTML is rendered.  (IIRC.)

If you refer to the 'Using cURL to automate HTTP jobs' tutorial on the curl web
site [ http://curl.haxx.se/docs/httpscripting.html ], section 4.2 describes the
process of sending a POST request.  For your case, you'd do something like:

  curl -d "action=download&file=20020610&area=1&login=userlogin&PIN=password" http://www.site.com/cgi-bin/download.cgi

The above is a long line, so mind the wrap.
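
If the download ends up in a script rather than a one-off command, the same
POST can be prepared with Python's standard library.  This is only a sketch
under the assumption that the CGI wants exactly the five fields shown in the
form; the request is built but not sent, because www.site.com is just the
placeholder host from the excerpt:

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {
    "action": "download",
    "file": "20020610",
    "area": "1",
    "login": "userlogin",
    "PIN": "password",
}

# The body has the same name=value&name=value shape that 'curl -d' sends.
body = urlencode(fields).encode("ascii")

# Supplying data makes urllib use POST instead of GET.
req = Request("http://www.site.com/cgi-bin/download.cgi", data=body)

print(req.get_method(), req.full_url)
print(body.decode("ascii"))

# urllib.request.urlopen(req) would actually send the request.
```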

>> I'm not exactly sure what I'm looking at but I believe that the input(s)
>> that are hidden are transmitted to the site along with the request.  The
>> input with type=submit actually initiates the request.

Exactly right.  Hidden INPUTs simply do not show up when the page is rendered,
but are passed along when the FORM is submitted.

>> Is there a way for me to capture the request that the browser sends?
>> I run SuSE 7.3 - KDE 2.2.1.  The browser is Konqueror.
>>
>> I believe that, eventually, I'll have to write a Perl program to simulate
>> the browser's actions (if it cannot be done simply using curl).
>>
>> Any help greatly appreciated.
>
> When doing my own web development, I've found netcat to be indispensable:
>
>   http://www.bgw.org/tutorials/utilities/nc.php
>
> The basic sequence of events I typically use is to save the form to disk,
> add a <base href="http://that.site/path/to/page/">, change the form action
> to point to localhost:8888, run "netcat -l -p 8888 < /dev/null", load
> the html page, from disk, into the browser and submit it.  Netcat should
> then capture the data that the browser would have sent to the remote system.

I've used programs like tcpdump and ngrep to monitor both ends of the
conversation - the browser request and the server's response.  The output
can be ugly, but you get the information that you need.
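
If netcat isn't installed, the capture trick quoted above can be approximated
with a few lines of Python.  This is a rough sketch, not a replacement for
netcat or tcpdump: capture_one_request and open_listener are made-up helper
names, and the demo at the bottom uses a hand-written request in place of a
real browser so that it runs on its own.

```python
import socket


def open_listener(port):
    """Listen on localhost only; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(1)
    return listener


def capture_one_request(listener):
    """Accept one connection and return the raw HTTP request bytes."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                return data
            data += chunk
        head, _sep, body = data.partition(b"\r\n\r\n")
        # A POST body follows the headers; Content-Length gives its size.
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _colon, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        while len(body) < length:
            chunk = conn.recv(4096)
            if not chunk:
                break
            body += chunk
        # Send a minimal reply so the client is not left waiting.
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n")
    return head + b"\r\n\r\n" + body


# Demo: a hand-written request stands in for the browser.
listener = open_listener(0)
port = listener.getsockname()[1]
form_body = b"action=download&file=20020610&area=1"
client = socket.create_connection(("127.0.0.1", port))
client.sendall(
    b"POST /cgi-bin/download.cgi HTTP/1.0\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: " + str(len(form_body)).encode("ascii") + b"\r\n"
    b"\r\n" + form_body
)
captured = capture_one_request(listener)
client.close()
listener.close()
print(captured.decode("latin-1"))
```

For a real capture, open_listener(8888) would match the port in the quoted
netcat command, with the saved form's action pointed at localhost:8888.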

> If the hidden variables change from request to request, you'll probably
> end up having to dynamically fetch the page and then parse out the form
> elements.
>
> If you're comfortable with Perl, have a look at LWP::UserAgent.  It's basically
> a web browser that you can control from your Perl code.

LWP works well enough, but it tends to require pretty verbose programs to
control it, IME.  If I'm just trying to hack together something quick and
dirty, I'll often end up using perl to parse the dynamic bits out of a
web page, and then use curl to execute the actual request.  For more
involved clients, LWP is a good solution.
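
The "parse out the dynamic bits, then hand them to curl" approach might look
something like the following in Python (swapped in here for Perl purely for
illustration).  It is a sketch that assumes the page looks like the excerpt
earlier in the thread; FormFields is a hypothetical helper class, and the
curl command is only printed, not run.

```python
import shlex
from html.parser import HTMLParser
from urllib.parse import urlencode

# Stand-in for the fetched page: the form excerpt from earlier in the thread.
PAGE = """
<form action=http://www.site.com/cgi-bin/download.cgi method=post>
<input type=hidden name="action" value="download">
<input type=hidden name="file" value="20020610">
<input type=hidden name="area" value="1">
<input type=hidden name="login" value="userlogin">
<input type=hidden name="PIN" value="password">
<tr align=center><td><input type=submit value="20020610.zip"></td>
</form>
"""


class FormFields(HTMLParser):
    """Collect the form's action URL and its named inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Inputs without a name (the submit button here) send nothing.
            self.fields.append((attrs["name"], attrs.get("value") or ""))


parser = FormFields()
parser.feed(PAGE)

# Argument list for curl; shlex.join quotes it for display in a shell.
command = ["curl", "-d", urlencode(parser.fields), parser.action]
print(shlex.join(command))
```

Passing the command list to subprocess.run would execute it without any
shell quoting worries.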

> There are other
> HTTP libraries for other languages as well (for Java, try HttpClient from
> the Apache Jakarta Commons project; for C/C++, try libghttp [comes with many
> Linux distributions]; for Python, httplib.py
> [http://www.lyra.org/greg/python/httplib.py]).

Sometimes, CGI scripts will include various protections to prevent them
from being called directly - most frequently, this is a check for a
'Referer:' header (use the '--referer' switch in curl).  If it doesn't seem
to be working, keep tweaking your curl command until its request looks
exactly like the request sent by the browser.  Watching sessions using
tcpdump or ngrep makes that comparison easier.

If you don't want your request to show up in the service's logs as coming
from curl, you can specify a 'User-Agent:' header using curl's '--user-agent'
switch.
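
The same two headers can be set when the request is built from a script.  A
small Python sketch follows; the Referer URL and the User-Agent string are
assumptions made up for illustration (the real values would come from a
captured browser request), and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Abbreviated field list; the full set is in the form excerpt above.
body = urlencode({"action": "download", "file": "20020610"}).encode("ascii")

req = Request(
    "http://www.site.com/cgi-bin/download.cgi",
    data=body,
    headers={
        # Assumed: the page that holds the download button.
        "Referer": "http://www.site.com/login.html",
        # Assumed browser identification; copy the real one from a capture.
        "User-Agent": "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)",
    },
)

# Show the headers that would accompany the request.
for name, value in req.header_items():
    print(f"{name}: {value}")
```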

HTH,
pls


______________________________________________________________________
Philadelphia Linux Users Group       -      http://www.phillylinux.org
Announcements-http://lists.phillylinux.org/mail/listinfo/plug-announce
General Discussion  -  http://lists.phillylinux.org/mail/listinfo/plug