Rich Freeman via plug on 13 Sep 2019 07:12:32 -0700


Re: [PLUG] large_dir support on ext4

On Fri, Sep 13, 2019 at 9:20 AM Walt Mankowski via plug wrote:
> On Fri, Sep 13, 2019 at 09:00:43AM -0400, Rich Freeman via plug wrote:
> > Now, if you don't actually do name lookups but do something like using
> > readdir to iterate through all the inodes in a directory then it won't
> > provide benefit.  Anything that actually uses a filename to open a
> > file directly would benefit, however, as otherwise the kernel
> > basically has to scan the entire directory to find a file which I'm
> > guessing is O(N).  I don't know enough about ext4 to know if the
> > metadata can get fragmented, but I'm guessing it can, and thus that
> > cost would apply to any seeks needed to read the entire directory
> > listing.
> I don't have any idea what rdiff-backup is doing...

I took a quick look at the source code.  It uses os.walk to get
filenames, and then passes them to a function (or whatever you call
them in Python), so the files end up being accessed by name.  That
means dir_index will almost certainly benefit you, though you're
already using it.
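
To make the access pattern concrete, here is a minimal sketch of the
"walk, then touch each file by name" approach.  This is not
rdiff-backup's actual code; iter_file_paths and total_size are
hypothetical names I made up for illustration.

```python
import os


def iter_file_paths(root):
    """Yield the path of every non-directory entry below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.walk only hands back names.  Any later open() or stat()
            # on the path makes the kernel look the name up in its
            # directory again, which is the step dir_index speeds up.
            yield os.path.join(dirpath, name)


def total_size(root):
    """Sum file sizes by path; each os.lstat is a lookup by name."""
    return sum(os.lstat(path).st_size for path in iter_file_paths(root))
```

Calling total_size("/some/dir") does one lookup by name per file, so a
directory with a huge number of entries pays that cost once per entry.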

I found a PEP discussing the os.walk implementation being sub-optimal
because it stats every entry, but I don't know whether that has been
fixed.  I can't say I'm familiar with how Python is
developed/maintained.  I'm not sure if there is some better solution
in Python for iterating without using filenames.  It seems that, to
keep things simple, they use one function to yield filenames, and
those can then be iterated over like anything else in Python.  I have
no idea if there is an equivalent function that yields inodes instead.
I could see why they would avoid that for platform-independence.
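
For what it's worth, the Python docs describe os.scandir, which was
added by PEP 471 in Python 3.5, and say that os.walk has been built on
top of it since that release.  Each entry it yields has an inode()
method.  A minimal sketch, assuming Python 3.6 or later for the "with"
form; iter_inodes is a hypothetical helper name:

```python
import os


def iter_inodes(path):
    """Yield (inode number, name) for each entry directly inside path."""
    with os.scandir(path) as entries:
        for entry in entries:
            # entry.inode() reports the entry's inode number.  The
            # number is informational only here: the file is still
            # opened through its name.
            yield entry.inode(), entry.name
```

As far as I can tell this does not avoid the name lookup, since I'm
not aware of any portable way in Python to open a file from an inode
number alone.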

Philadelphia Linux Users Group