Re: [PLUG] Advice needed on collecting files from FTP site

JP Vossen on 20 Apr 2010 21:58:43 -0700


From: JP Vossen <jp@jpsdomain.org>

To: plug@lists.phillylinux.org

Subject: Re: [PLUG] Advice needed on collecting files from FTP site

Date: Wed, 21 Apr 2010 00:58:33 -0400

Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>

Sender: plug-bounces@lists.phillylinux.org

User-agent: Thunderbird 2.0.0.24 (X11/20100317)

> Date: Tue, 20 Apr 2010 15:38:00 -0400
> From: Mike Leone <turgon@mike-leone.com>
>
> So here's my situation - I have a Linux server set up in a DMZ, running
> VSFTP. Each FTP account is chrooted. We will be using this for vendors
> to send us invoices, etc.

FTP, yuck...  (Had to be said.)

> The FTPing part is working fine. The chrooting is working fine. What I
> need to do now, is to have a method of sweeping through all these home
> folders; collect any new files; zip them all together; and FTP them
> inbound to the trusted part of my LAN. And then delete the file, once
> it's been FTPed in.
>
> And there I am stuck. :-) I'm sure that's something simple to set up in
> a script, but I'm not a scripting guy. Not on Linux, and only very
> little in Windows (although I can figure out how to do this as a windows
> CMD file).

I'd encourage you to pick up some shell scripting, it will vastly
increase what you can do as a sysadmin.  The basics are nothing more
than a DOS batch file, except less buggy and arguably more quirky,
though there are some DOS batch constructs that boggle the mind.

For learning the bash shell, I recommend _Learning the bash Shell_
currently in 3rd edition, though 4th is being worked on.  And if you
like cookbooks, I may modestly suggest the _bash Cookbook_.  :-)

If you want free/on-line stuff, go to
http://www.bashcookbook.com/bashinfo/#RepoRefs and look for "Bash Guide
for Beginners", "Advanced Bash-Scripting Guide" and similar.

> So: if anybody knows of a program that already does that sort of pruning
> and collecting of files, that would be a start. Or a sample script that
> does something similar, I could maybe fumble my way through.
>
> This is running on Ubuntu at the moment; eventually it will go on a
> server running Red Hat Enterprise.

That's an interesting one, and there are a few points I haven't seen
addressed yet.

1) You need to consider what happens if you do your
collect/zip/move/delete part while someone is uploading a file.
Sure you can run your part in the middle of the night, and that'll work
fine until someone works late, in a different time zone, or automates
their part too.

2) A DMZ machine should have very strictly limited ability to connect
*in* to the LAN, else what's the point.  So having that machine initiate
the connection into the LAN is sub-optimal.

3) If you run from cron on the DMZ machine, you really need to allow
email from that machine for cases where the job messes up.  But per #2,
you don't want it going in to the LAN.  So you can send plain-text email
out to a mail relay, and then come back in, which possibly leaks info,
or you can do something else.

So, I'd do something like this.

First, I'd write a script that looks for files in the right place, then
for each one it finds waits a few secs to see if the file gets bigger.
If it does, it's still being written, so skip it.  That won't account
for network delays longer than our wait time, but you've gotta draw the
line somewhere.  I'd zip -9m the files, which will delete them *if* the
zip works (and the files are writable).  And I'd keep the last few ZIP
files, just in case.  Call this snag_files.sh (below).

I'd create a NON-ROOT user on the DMZ machine, make sure it had
read-write perms where it needed them, and give it snag_files.sh and an
SSH key.

Then, I'd put another trivial script on some trusted machine on the LAN,
that has working cron and email, and I'd set up a cron job for that
script.  Call this one download_files.sh.  Or, just do it all in-line in
cron, which'll work for a while, until you start adding more features.

Part 1 of download_files.sh is to SSH into the DMZ as the right user and
actually run snag_files.sh (using password-less keys or better yet
'keychain' & SSH agent).  That avoids allowing the DMZ machine in, since
you are already going out.  And it avoids having to deal with cron and
email on the DMZ server (though you probably really do want email to
work for log monitoring and other cron jobs).
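
As a side note on the "wait a few secs to see if the file gets bigger"
check described above, here is a minimal sketch of it as a stand-alone
function.  The name 'is_stable' and its arguments are made up for
illustration and are not part of snag_files.sh.  It assumes that
comparing byte counts from 'wc -c' is good enough (the posted script
compares 'ls -s' output instead, which counts allocated blocks), and it
quotes the file name so names with spaces survive.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only (not from the original post).
# Succeeds if the file is readable and its size in bytes has not changed
# after waiting the given number of seconds (default 5).
is_stable() {
    local file="$1" wait_secs="${2:-5}" before after

    # Unreadable or missing files are never reported as stable
    [ -r "$file" ] || return 1

    before=$(wc -c < "$file") || return 1   # size now, in bytes
    sleep "$wait_secs"                      # give an upload time to grow it
    after=$(wc -c < "$file") || return 1    # size after the wait

    [ "$before" -eq "$after" ]
}
```

A typical call might look like: is_stable "$file" 5 && echo "$file looks
complete".  Like the approach it illustrates, this can't catch an upload
that stalls for longer than the wait.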
Part 2 of download_files.sh is to actually download the file.  But how
do you know what file to download, if we're naming them with
CCYYMMDD_HHMMSS and keeping archives?  There are a few ways to deal with
that, but maybe the simplest and most brute force solution is just to
rsync all of them, which also gives you a bit of a backup.  Using the
'rsync --delete' flag will keep the local side cleaned up too.  Note
that --delete only acts on directories being synced, not on individually
named files, so the ZIP files get a directory of their own.

So, to pull it all together:

*** ALL OF THIS IS UNTESTED ***

LAN side cron job (all on one line):

# Need "passwordless" SSH working first!
... ssh -i /path/to/key/file user@dmz.example.com '/remote/path/to/snag_files.sh' && rsync -a --delete -e 'ssh -i /path/to/key/file' user@dmz.example.com:snagged/ /home/user/snagged/

DMZ side script (will probably get mangled by the MTAs and MUAs):

#!/bin/bash -
# snag_files.sh--Snag some files and package up in ZIP file

TREE='/home/ftp/'   # Must be read-write by user, so ZIP can read and delete
LAST_RUN="$HOME/snag_files.last_run"  # Must be writable by user
SLEEP_SECS='5'      # Wait between file checks.
    # If you have a lot of files to process, this will add up fast...
ZIP_DIR="$HOME/snagged"  # ZIPs live in their own dir, so rsync can sync it
ZIP_FILE="$ZIP_DIR/snagged_$(date '+%Y-%m-%d_%H:%M:%S').zip"
MAX_ZIPS_TO_KEEP='5'  # Keep this many previous ZIP files, just in case

#----------------------------------------------------------
# Define functions

#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
function _file_size {
    # "Utility" function to return file size
    # Called like:  now=$(_file_size "$file")
    # Returns:  file size
    # We've already made sure the file exists and is readable, so...

    local file="$1"
    \ls -s "$file" | cut -d' ' -f1
} # end of function _file_size


#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
function _shift_by {
    # Shift or remove a given number of items from the top or front of a
    # list, such that you can then perform an action on whatever is left.
    #
    # For example, list some files or directories, then keep only the
    # top 10.
    #
    # It is CRITICAL that you pass the items in order, since all this
    # function does is remove the number of entries you specify from the
    # front or top of the list.
    #
    # You should experiment with echo or mv before using rm!

    # Called like:  _shift_by <# to keep> <ls command, or whatever>
    # For example:
    #   rm -rf $(_shift_by $MAX_BUILD_DIRS_TO_KEEP $(ls -rd backup.20*))
    # Returns:  shifted list

    # If $1 is zero or greater than $#, the positional parameters are
    # not changed.  In this case that is a BAD THING!
    if (( $1 == 0 || $1 > ($# - 1) )); then
        echo ''
    else
        # Remove the number of dirs to keep from the list, plus 1 for
        # the 'number of dirs to keep' argument itself.
        shift $(( $1 + 1 ))

        # Return whatever is left
        echo "$*"
    fi
} # end of function _shift_by


###########################################################
# Main()

mkdir -p "$ZIP_DIR"

# First run: there is no time stamp file yet, and find -newer fails
# without one, so create an old one
[ -e "$LAST_RUN" ] || touch -t 200001010000 "$LAST_RUN"

# Note when this run started; this becomes LAST_RUN if the zip works
touch "$LAST_RUN.new"

# Find the files, and make sure they aren't still being written
for file in $(find "$TREE" -newer "$LAST_RUN" -type f); do

    # Make sure the file exists and is readable.  Since we just found
    # it, it *should* be, but check anyway...
    [ -r "$file" ] && {
        now=$(_file_size "$file")    # File size?
        sleep $SLEEP_SECS            # Wait a bit
        later=$(_file_size "$file")  # File size again?

        # If the file isn't any bigger, I guess it isn't still being
        # written...
        [ "$now" = "$later" ] && files_to_zip="$files_to_zip $file"
    }
done

# IF we have any files to process:
[ "$files_to_zip" ] && {
    # Zip them up (this will barf on files with spaces)
    # -9 = max compression, -m = move them into ZIP (i.e., delete original)
    # Note this KEEPs paths.  -j to junk 'em, but that risks file collisions
    echo zip -9m "$ZIP_FILE" $files_to_zip && {
        # IF the zip worked, record when this run started...
        echo mv "$LAST_RUN.new" "$LAST_RUN"

        # ...and remove old ZIP files
        zip_files_to_nuke=$( \
          _shift_by $MAX_ZIPS_TO_KEEP $(ls -1r "$ZIP_DIR"/snagged_*.zip) )
        [ "$zip_files_to_nuke" ] && echo rm -f $zip_files_to_nuke
    }
}

*** UNTESTED ***

As noted all of that code is untested.  Also, the script has three
'echo' commands in the places it would actually do something.
Fiddle with it and make sure it works if you try to use it, then remove
the echos.

For some primitive sanity checking try:
	bash -n {script}

For debugging the script try:
	bash -x {script}

Once it works chmod it executable.

Good luck & hope this is useful,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

