Kyle R. Burton on Wed, 19 Jun 2002 10:53:49 +0200



Re: [PLUG] capture browser request


> I'm not a web programmer - I have just enough knowledge to be dangerous.
> Hopefully, somebody here can provide me with the clues to answer this
> question!
> 
> I've got a web service that I pay for.  They require me to log in on one web
> page and then click a button on the next page to download a file.  Since I'd
> like to put this download in a script, I want to use curl or some similar
> program.  Unfortunately, I cannot see the request that the browser sends to the
> site to initiate the download.  I've examined the source for the web page and
> I see the following code in the source:
> 
> -----code excerpt------
> 
> <form action=http://www.site.com/cgi-bin/download.cgi method=post>
> <input type=hidden name="action" value="download">
> <input type=hidden name="file" value="20020610">
> <input type=hidden name="area" value="1">
> <input type=hidden name="login" value="userlogin">
> <input type=hidden name="PIN" value="password">
> <tr align=center><td><input type=submit value="20020610.zip"></td>
> </form>
> 
> -----end code excerpt------
> 
> I'm not exactly sure what I'm looking at but I believe that the input(s)
> that are hidden are transmitted to the site along with the request.  The
> input with type=submit actually initiates the request.
> 
> Is there a way for me to capture the request that the browser sends?  
> I run SuSE 7.3 - KDE 2.2.1.  The browser is Konqueror.
> 
> I believe that, eventually, I'll have to write a Perl program to simulate
> the browser's actions (if it cannot be done simply using curl).
> 
> Any help greatly appreciated.

When doing my own web development, I've found netcat to be indispensable:

  http://www.bgw.org/tutorials/utilities/nc.php

The basic sequence of events I typically use is to save the form to disk,
add a <base href="http://that.site/path/to/page/">, change the form action
to point to localhost:8888, run "netcat -l -p 8888 < /dev/null", load
the HTML page from disk into the browser, and submit it.  Netcat should
then capture the data that the browser would have sent to the remote system.
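
If netcat isn't installed, the same capture can be sketched in Python.  This
is only a rough stand-in for "netcat -l -p 8888", not a replacement for it:
open_listener and capture_one_request are names made up for this example, and
the 2-second idle timeout is an assumption (the browser will usually keep the
connection open waiting for a reply that never comes).

```python
import socket


def open_listener(port=8888, host="127.0.0.1"):
    """Bind and listen on host:port (hypothetical helper for this sketch)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def capture_one_request(server, idle_timeout=2.0):
    """Accept one connection and return the raw bytes the client sent.

    No HTTP response is written back, so the browser will appear to hang;
    reading stops when the client closes the socket or goes quiet for
    idle_timeout seconds (an assumed value, adjust as needed).
    """
    conn, _addr = server.accept()
    chunks = []
    with conn:
        conn.settimeout(idle_timeout)
        try:
            while True:
                data = conn.recv(4096)
                if not data:
                    break  # client closed the connection
                chunks.append(data)
        except socket.timeout:
            pass  # client is idle, waiting for a reply we never send
    return b"".join(chunks)
```

Call open_listener() before submitting the form from the browser, then print
capture_one_request(server).decode("latin-1") to see the request line, the
headers, and the POST body.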

If the hidden variables change from request to request, you'll probably
end up having to dynamically fetch the page and then parse out the form
elements.
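
Parsing out the form elements might look roughly like this in Python, using
only the standard library's html.parser.  FormFieldParser and parse_form are
hypothetical names for this sketch, and SAMPLE_FORM is just the form quoted
above; a real page would have to be fetched first.  It assumes a single form
on the page: with several forms, the fields would all be merged together.

```python
from html.parser import HTMLParser

# The form from the original question, used here as test input.
SAMPLE_FORM = """
<form action=http://www.site.com/cgi-bin/download.cgi method=post>
<input type=hidden name="action" value="download">
<input type=hidden name="file" value="20020610">
<input type=hidden name="area" value="1">
<input type=hidden name="login" value="userlogin">
<input type=hidden name="PIN" value="password">
<tr align=center><td><input type=submit value="20020610.zip"></td>
</form>
"""


class FormFieldParser(HTMLParser):
    """Collect the form action and every named <input> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Inputs without a name (the submit button here) are skipped.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def parse_form(html_text):
    """Return (action_url, {field_name: value}) for the page's form."""
    parser = FormFieldParser()
    parser.feed(html_text)
    return parser.action, parser.fields
```

parse_form(SAMPLE_FORM) hands back the action URL and a dictionary of the
hidden fields, which is what you'd then send back to the server.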

If you're comfortable with Perl, have a look at LWP::UserAgent.  It's basically
a web browser that you can control from your Perl code.  There are HTTP
libraries for other languages as well: for Java, try HttpClient from the
Apache Jakarta Commons project; for C/C++, try libghttp (it comes with many
Linux distributions); for Python, try httplib.py
[http://www.lyra.org/greg/python/httplib.py].
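
As a rough idea of what driving the download from code could look like, here
is a sketch in Python.  Note that it uses urllib from the standard library
rather than LWP::UserAgent or the httplib.py linked above, and that
build_post_request and download are hypothetical names.  It assumes the
server accepts a plain form-encoded POST of the hidden fields; it does
nothing about the login page, cookies, or error handling.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(action_url, fields):
    """Build a form-encoded POST like the one the browser would submit."""
    body = urlencode(fields).encode("ascii")
    # Supplying data= is what makes urllib send a POST instead of a GET.
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def download(action_url, fields, dest_path):
    """Send the POST and save the response body to dest_path."""
    request = build_post_request(action_url, fields)
    with urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

With the values from the form in the question, the call would be something
like download("http://www.site.com/cgi-bin/download.cgi", fields,
"20020610.zip"), where fields holds the hidden inputs.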


HTH

Kyle



-- 

------------------------------------------------------------------------------
Wisdom and Compassion are inseparable.
        -- Christmas Humphreys
mortis@voicenet.com                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------

______________________________________________________________________
Philadelphia Linux Users Group       -      http://www.phillylinux.org
Announcements-http://lists.phillylinux.org/mail/listinfo/plug-announce
General Discussion  -  http://lists.phillylinux.org/mail/listinfo/plug