bergman on 25 Mar 2014 12:50:11 -0700


Re: [PLUG] check for a file being transmitted via ftp

In the message dated: Tue, 25 Mar 2014 14:05:12 -0400,
The pithy ruminations from "Eric at" on 
<[PLUG] check for a file being transmitted via ftp> were:
=> I expect a file to be routinely transmitted to me via ftp.  A bash script,
=> invoked as a cron job, will process and then archive that ftp file.
=> All of that is easy/routine.
=> My question is: how do I know that the file is complete?  I don't want to
=> start processing the file without being sure that the ftp process is done.
=> I thought about checking the size of the file in bytes and then comparing that
=> to its size 1 minute earlier... if it's not growing then it's probably done.
=> But, that's just *probably* as ftp could be stalled temporarily or worse.
=> I have also considered using inotify cron (incron), but I expect that it
=> would be triggered at the file's creation - not when it's time for me to

Did you check the man page for incron? It can be triggered on the event:

	IN_CLOSE_WRITE      File opened for writing was closed [0]

Inotify (and incrond) can be used to monitor a directory, triggering when a
[new] file in the directory has been closed after writing, so you don't even
need to know the name of the file in advance. This eliminates the overhead of
repeatedly polling a directory to see if a new file has been uploaded, then
polling to see if the upload seems to have finished.
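To show what incrond relies on underneath, here is a minimal Python sketch that calls the Linux inotify_init/inotify_add_watch system calls through ctypes and reports files in a directory that were closed after writing. It assumes Linux; the helper names open_close_write_watch and read_closed_files are hypothetical, and in practice incrond does this work for you.

```python
import ctypes
import ctypes.util
import os
import struct

# From <sys/inotify.h>: "File opened for writing was closed".
IN_CLOSE_WRITE = 0x00000008

# struct inotify_event { int wd; uint32_t mask; uint32_t cookie; uint32_t len; char name[]; }
_EVENT_HEADER = struct.Struct("iIII")

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def open_close_write_watch(directory):
    """Return an inotify fd watching `directory` for IN_CLOSE_WRITE (Linux only)."""
    fd = _libc.inotify_init()
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    wd = _libc.inotify_add_watch(fd, os.fsencode(directory), IN_CLOSE_WRITE)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err))
    return fd


def read_closed_files(fd):
    """Block until events arrive; return names of files closed after writing."""
    buf = os.read(fd, 4096)
    names = []
    offset = 0
    while offset < len(buf):
        _wd, mask, _cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        raw_name = buf[offset:offset + length]
        offset += length
        if mask & IN_CLOSE_WRITE:
            # The name field is NUL-padded up to an alignment boundary.
            names.append(os.fsdecode(raw_name.rstrip(b"\0")))
    return names
```

A long-running watcher would loop on read_closed_files() and hand each name to the processing script; no polling is involved, because the kernel queues the events.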

However, using incrond is still subject to the same weakness as the 'lsof'
approach (checking whether there's an open filehandle):

	Some FTP clients support an "append" syntax, and many clients will
	'resume' an interrupted transmission, so a closed filehandle doesn't
	necessarily mean that the entire file has been uploaded.

Regardless of the method for detecting that the upload finished,
you'll need some way to verify that the file you received is complete and
uncorrupted. I suggest that you specify that you will only accept compressed
files -- zip compression is multiplatform, commonly used, and provides an easy
way to verify that the file as received is consistent (each archive member
carries a CRC-32 checksum, which 'unzip -t' checks).
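For a script written in Python rather than bash, the standard zipfile module can do the same kind of check that 'unzip -t' does: a truncated upload usually lacks the end-of-central-directory record at the tail of the file and fails to open, while ZipFile.testzip() reads every member and checks its CRC. A minimal sketch; the helper name zip_is_complete is hypothetical.

```python
import zipfile
import zlib


def zip_is_complete(path):
    """Return True if `path` opens as a zip archive and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads each member and returns the first bad name, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Truncated or garbled uploads usually end up here, e.g. when the
        # end-of-central-directory record at the tail of the file is missing.
        return False
```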

So, I'd probably do something like:

	use incrond to trigger a script when a file has been uploaded (when
	the filehandle opened for writing has been closed)

	check for other invocations of incrond calling the same script with
	the same target file, and exit if another instance already exists[1]

	use 'unzip -t' to test the archive without extracting it

	if the verification fails
		record the file size
		start a counter
		while ( counter < limit )
			sleep for a short period (30 seconds?)
			if the file size has changed
				if the filehandle is open
					exit [2]
				else
					call the trigger script again with the same
					target file, then exit [3]
			increment the counter
		done [4]
	else
		move the file out of the ftp upload directory
		do your processing
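The outline above can be sketched in Python as follows, with the control flow following footnotes [2] to [4]. This is a sketch under stated assumptions, not a drop-in script: verification, re-triggering, processing and giving up are passed in as hypothetical callables, and the 30-second nap and limit of 10 checks are placeholders. The is_open_linux helper is also hypothetical: it scans /proc (Linux only), does not distinguish read from write handles, and without root it can only see your own processes.

```python
import os
import time


def is_open_linux(path):
    """Hypothetical helper: True if any visible process has `path` open (Linux /proc scan)."""
    target = os.path.realpath(path)
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or no permission to look
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return True
            except OSError:
                continue
    return False


def handle_upload(path, verify, is_open, retrigger, process, give_up,
                  limit=10, nap=30.0, sleep=time.sleep):
    """Decide what to do with an uploaded file; return a label for logging."""
    if verify(path):
        process(path)  # complete: move it out of the upload dir and process it
        return "processed"

    size = os.path.getsize(path)  # record the file size
    counter = 0                   # start a counter
    while counter < limit:
        sleep(nap)
        if os.path.getsize(path) != size:
            if is_open(path):
                # [2] an append is in progress; incrond fires again on close
                return "append in progress"
            # [3] the append finished during the nap; run the trigger again
            retrigger(path)
            return "retriggered"
        counter += 1

    # [4] waited long enough and nothing was appended: clean up, log, exit
    give_up(path)
    return "gave up"
```

Injecting the sleep function and the checks keeps the decision logic testable without waiting on a real FTP upload.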

[0] I don't know enough about the internals of FTP to be sure that it keeps a
    file handle open during an entire session, and the behavior might be
    FTP-server specific.

[1] after the first FTP session closes, another FTP session may be appending
    to the file; when that (N+1)th session closes, it will trigger incrond again

[2] if the file handle is open, an append is in progress, so exit from the
    monitoring script, trusting that incrond will trigger again when this
    filehandle is closed

[3] it's possible that the append finished during the 30-second nap, so call
    the trigger script again to check

[4] if you've waited long enough and nothing has been appended to the corrupt
    zip file, clean up, log the error, and exit
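For the "exit if another instance already exists" step and footnote [1], one common approach is a non-blocking advisory lock per target file using fcntl.flock. This technique is swapped in here, since the outline doesn't say how to detect other instances. The lock-file naming scheme and the helper name single_instance are hypothetical, and flock is Unix-only.

```python
import fcntl
import hashlib
import os
import tempfile


def single_instance(target, lock_dir=None):
    """Try to become the only handler for `target`.

    Returns an open lock file (keep it open while working), or None if
    another instance already holds the lock for the same target.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    # One lock file per target, named after a hash of its absolute path.
    digest = hashlib.sha1(os.path.abspath(target).encode()).hexdigest()
    lock_file = open(os.path.join(lock_dir, f"upload-{digest}.lock"), "w")
    try:
        # LOCK_NB: fail immediately instead of waiting for the other instance.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file
```

The handler would call single_instance(path) first and exit quietly on None; the kernel releases the lock when the holding process exits, so a crashed handler does not leave a stale lock.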


=> process it.
=> Thanks
=> Eric
=> -- 
=> #  Eric Lucas
=> #
=> #                "Oh, I have slipped the surly bond of earth
=> #                 And danced the skies on laughter-silvered wings...
=> #                                        -- John Gillespie Magee Jr
=> ___________________________________________________________________________
=> Philadelphia Linux Users Group         --
=> Announcements -
=> General Discussion  --

Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand
'94 Yamaha GTS1000A^2

I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
15+ So Far--Want to join? Check out: 