Rich Freeman on 7 Aug 2012 08:18:08 -0700


[PLUG] UNIX File Equivalence


Here is a good general UNIX question that somebody might know the answer to.

On UNIX, or at least on Linux, how can I determine if two files are
the same, preferably very quickly (i.e. via a system/library call and
not a page of Perl, though I'd still be interested in scripting/bash
solutions as well)?

By the same, I don't mean that their contents are the same - I mean
that the files are actually the same file.

For example, on my system right now the files /usr/tmp/xyz and
/var/tmp/xyz are actually the same file (due to symlinks).  Other
mechanisms that could cause files to be the same would be hard links
or bind mounts - either at the file level or anywhere in the path to
them.  I would consider reflinked files, copies of files, or snapshots
of files to be different, even if at the moment their contents happen
to be the same.
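
To make that definition concrete, here is a minimal sketch of one
possible check, assuming that POSIX stat() and the (st_dev, st_ino)
pair are an acceptable notion of identity. The function name same_file
is made up for illustration. Since stat() follows symlinks, and hard
links share an inode, those cases should compare equal; copies and
reflinked files get their own inodes and should not. I would expect
bind mounts of the same filesystem to compare equal on Linux as well,
but that is an assumption worth testing on the filesystems in question.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths resolve to the same
 * file, 0 if they resolve to different files, and -1 if either path
 * cannot be stat()ed. stat() follows symlinks, so the comparison is
 * made on the final target of each path.
 */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    /* Same device and same inode number: treated as the same file. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With the example above, same_file("/usr/tmp/xyz", "/var/tmp/xyz") would
be expected to return 1. Note that this does no content reads at all,
only two stat() calls.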

Ideally this should not depend on filesystem-specific details, though
something that requires simply that a filesystem comply with some
accepted standard is fine (it should work on the big Linux ones -
ext(2/3/4)/xfs/btrfs/zfs/etc).

My use case is a C function which takes as input a file and needs to
determine if that file is one of the files in some list, regardless of
how the path takes it there.  So, solutions that can also come up with
a deterministic canonical path for a file (even with bind mounts) or
some kind of unique hash for a file would be even better (without
reading the file - again I don't care about content being the same -
just the file being the same, and reading content is slow anyway).
I'd expect this function to be called EXTREMELY often so it should run
on the order of milliseconds with a search list containing tens of
thousands of path/filenames.  If I had to test the whole list
exhaustively, I could probably add paths that match to the list so that
future searches would go faster (with just a direct path/name hit).
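
As a rough sketch of how the list lookup might be kept fast, assuming
the (st_dev, st_ino) pair from stat() is usable as the "unique hash"
for a file: build a sorted table of identities once, then do one stat()
and a binary search per query. The names file_id, build_index and
in_index are hypothetical. The table is only a snapshot, so it would go
stale if listed files are deleted or replaced, since inode numbers can
be reused.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical identity key: device plus inode number. */
struct file_id {
    dev_t dev;
    ino_t ino;
};

/* Total order on file_id so the table can be sorted and searched. */
static int file_id_cmp(const void *pa, const void *pb)
{
    const struct file_id *a = pa, *b = pb;

    if (a->dev != b->dev)
        return a->dev < b->dev ? -1 : 1;
    if (a->ino != b->ino)
        return a->ino < b->ino ? -1 : 1;
    return 0;
}

/*
 * Fill ids[] (capacity n) from paths[], skipping paths that cannot be
 * stat()ed, then sort it. Returns the number of entries kept.
 */
size_t build_index(const char *const *paths, size_t n, struct file_id *ids)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (stat(paths[i], &st) == 0) {
            ids[kept].dev = st.st_dev;
            ids[kept].ino = st.st_ino;
            kept++;
        }
    }
    qsort(ids, kept, sizeof ids[0], file_id_cmp);
    return kept;
}

/* Returns 1 if path is in the index, 0 if not, -1 if stat() fails. */
int in_index(const char *path, const struct file_id *ids, size_t n)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) != 0)
        return -1;

    struct file_id key = { st.st_dev, st.st_ino };
    return bsearch(&key, ids, n, sizeof ids[0], file_id_cmp) != NULL;
}
```

Each query then costs one stat() plus a binary search over the table,
so the length of the path list matters much less than the cost of the
stat() call itself. A hash table keyed on the same pair would work
too; sorting is just the shorter thing to write with the standard
library.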

Part of me is thinking there is some system call that should just do
this; I just don't know what it is.

Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug