JP Vossen on 5 May 2009 12:46:59 -0700

Re: [PLUG] find the user with the most files

> Date: Mon, 4 May 2009 18:49:25 -0400
> From: Michael Lazin <>
> Hi, I'm in a directory containing directories that belong to different users
> and we are trying to get users to reduce their disk use to make our backup
> servers run better.
> I tried
> find . | egrep -i
> "\.(zip|mkv|mp3|avi|rar|exe|iso|wma|wmv|mpg|mpeg|nfo|r[0-9]+)$"|less
> In this directory and noticed a lot of these users have warez, but we don't
> have the time or the manpower to go after every user that has warez.  What I
> want to do is find every directory of a certain name and output the size of
> that directory to see who the biggest culprits are.  Any suggestions?

I agree with Lee's "this is a policy problem, not a technical one" 
answer, but I just can't help answering the technical part anyway. :-)

find and du, and maybe Perl or awk are probably the tools you want.

# Find the total size of each user dir, assuming they live in /home
# du -h is human readable, not suitable for sorting
$ sudo du -hs /home/*

# Similar, but sort, biggest first
$ sudo du -s /home/* | sort -rn

# Top 15 worst user dirs
$ sudo du -s /home/* | sort -rn | head -15

# Find all *dirs* named, case-insensitively, "foo" or "bar" (here '.ssh'
# and 'desktop') and give human readable size (requires GNU find and xargs)
$ sudo find /home -type d \( -iname '.ssh' -o -iname desktop \) -print0 \
    | xargs -0 du -hs

# Create a handy list of "big" files that you can then post-process
$ sudo find /home -type f -a -size +2000k -printf "%s\t'%p'\t%u\t%g\n" > 

The find -print0 | xargs -0 pair uses the NUL character to delimit file 
names, so it will handle names with spaces (or even newlines) in them.
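
If you want to see the difference for yourself, here is a rough sketch 
that counts how many arguments xargs ends up with for a single file whose 
path contains spaces.  The directory and file names are made up, and it 
assumes a find and xargs that support -print0 and -0 (GNU and the BSDs do).

```shell
# Throwaway directory with one file whose path contains spaces
# (the names are invented purely for this demo)
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my music"
touch "$demo_dir/my music/track one.mp3"

# Default xargs splits its input on whitespace, so one path becomes several args
plain=$(find "$demo_dir" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited: the whole path arrives as exactly one argument
nul=$(find "$demo_dir" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "whitespace-split arguments: $plain"
echo "NUL-split arguments: $nul"

rm -rf "$demo_dir"
```

The first count comes out greater than one because the path gets chopped 
at each space; the second should be exactly one.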

find is a great tool, but it's quirky as hell and it quickly gets hard 
to use.  I just gave this some thought and if I wanted to get any more 
complicated than what I have above, e.g., find and sum your list of 
files for each user, I think I'd just use a find command to create a 
flat file list, then use Perl or awk to process that list.
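
As a minimal sketch of that post-processing step, something like the 
following awk wrapper would total the bytes and file count per user.  It 
assumes the list is in the tab-separated "%s\t'%p'\t%u\t%g\n" format from 
the find -printf above; sum_by_user is just a name I made up, and the 
sample lines are invented data.  File names containing tabs or newlines 
would confuse it.

```shell
# Hypothetical helper: read "size<TAB>'path'<TAB>user<TAB>group" lines from
# the named files (or stdin) and print "total_bytes<TAB>file_count<TAB>user",
# biggest total first.
sum_by_user() {
    awk -F'\t' '
        { bytes[$3] += $1; files[$3]++ }      # field 1 = size, field 3 = user
        END {
            for (u in bytes)
                printf "%.0f\t%d\t%s\n", bytes[u], files[u], u
        }
    ' "$@" | sort -rn
}

# Try it on a few invented lines in the same format
printf "3000000\t'/home/alice/a.iso'\talice\tusers\n5000000\t'/home/bob/b.avi'\tbob\tusers\n2500000\t'/home/alice/c.zip'\talice\tusers\n" \
    | sum_by_user
```

In real use you would point it at whatever file you redirected the find 
output into, e.g. sum_by_user yourlist | head -15 for the top 15.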

'man find' and look for -print0, -ls, -printf (e.g. 
"%s\t'%p'\t%u\t%g\n"), -iname, -o (or) -a (and), etc.  Lots there.  Not 
the easiest to read or use.

Good luck,
