ext3: too many links!

Apparently when using ext2 or ext3 there's a limit on the number of subdirectories you can create within a single directory. This is a hardcoded number and seems to be set to about 2^15 ~= 32k.

This is related to the maximum number of hard links that can be created, as every subdirectory needs a link back to its parent (..). If you run into a "too many links" error with really large folders, you'll know why.. The only way you can change this is by changing the constant in the kernel source and recompiling, which is not worth doing IMHO.

I thought I read somewhere that the maximum number of files in one directory is by default around 64k, so I just figured it would be the same for subdirectories.. Guess I'll need to re-organize a bit =)
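If you're curious where the wall is on your own machine, the quickest check is to just create subdirectories in a loop until mkdir fails with "too many links" (EMLINK). A rough Python sketch (the limit-test directory name is only an example, run it on a scratch ext3 mount):

    import errno
    import os

    base = "limit-test"              # throwaway directory on the ext3 filesystem
    os.makedirs(base, exist_ok=True)

    count = 0
    try:
        while True:
            os.mkdir(os.path.join(base, "d%06d" % count))
            count += 1
    except OSError as e:
        if e.errno == errno.EMLINK:  # "Too many links"
            print("hit the limit after", count, "subdirectories")
        else:
            raise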

Comments

  • topbit

    If you're reading files out of that directory, you'll be wasting time searching for files among the 60-odd thousand long before you hit that wall. Last year, I inherited a system that was spending 10-15% of its time just searching the big directory - making it into a hierarchy of up to 100x100x100 (based on the last 6 characters of the filename, which were fairly random) meant that it went from searching tens of thousands of directory entries to a maximum of 100+100+100+100 - three levels of subdir + the actual file. On average, it searched less than half that number per level as well, which is still a lot faster than searching half of 50,000+.
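
    A rough sketch of that kind of split (assuming the last six characters really are digits, two per level):

        import os

        def bucket_path(root, filename):
            stem = filename.rsplit(".", 1)[0]   # drop the extension
            tail = stem[-6:]                    # last six characters of the name
            return os.path.join(root, tail[0:2], tail[2:4], tail[4:6], filename)

        # e.g. bucket_path("media", "photo_194273.jpg") -> media/19/42/73/photo_194273.jpg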
  • Topbit

    Finding which filename shouldn't be the problem - the disk IO and wait-times come from having to scan the directories for the actual file information.
    You've still got a potential 16,384 files in each directory, but you're still searching the directory list - on average up to 8,192 entries (with 16,000 files, half the files you look for are in the front half of the directory, half in the back half), iterating through those directory lists. Ext2/3 doesn't index them any further for speed (unlike, say, ReiserFS with its B+ tree).
    If you expect a large number of files (say, well over 100K), it might be worth going to another level of directories to reduce the actual number of directory entries searched.
  • Evert

    Hi TopBit,

    I have to deal with huge amounts of files.. I chose a 2 level deep directory structure.. I never have to actually search for files, I can find the folder + filename using a hash function.

    I used to run md5 on a file id and then use the first 4 characters for the top-level directory and the next 4 chars for the second level.. but 4 hex chars means 2^16 directories per level.. So to be safe I went back to 2^14 possibilities per level..

    Two levels deep that gives me 2^28 folders, each of which could theoretically contain 64k files.. but yea.. I'll be running out of inodes or bytes before I hit that wall =)
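
    Roughly like this (just a sketch; the zero-padded hex directory names are one arbitrary choice):

        import hashlib
        import os

        def shard_path(root, file_id, bits=14):
            digest = int(hashlib.md5(str(file_id).encode()).hexdigest(), 16)
            mask = (1 << bits) - 1
            level1 = (digest >> bits) & mask    # 2^14 possible top-level dirs
            level2 = digest & mask              # 2^14 possible second-level dirs
            return os.path.join(root, "%04x" % level1, "%04x" % level2)

        # shard_path("files", 123456) -> "files/<hex bucket>/<hex bucket>"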
  • Evert

    So my actual searching is in the database.. all the files are media files so I don't have to actually go into the file to search for something or scan any directories.
  • Evert

    Ah, so if I get this right you're saying it's better to have more levels in your directory structure and fewer entries per dir?

    K, that's useful information.. Thnx
  • Evert

    something is seriously screwed up with the clock on the server.. all the messages are out of order...

    oh well
  • JPW

    I use the first 4 digits of a hash in base 32.

    1024 top-level directories, each having 1024 subdirectories, so files get distributed across 1M (1,048,576) directories.
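
    Something along these lines (the digit alphabet and hash function here are just placeholders):

        import hashlib

        B32 = "0123456789abcdefghijklmnopqrstuv"   # any 32 symbols will do

        def base32_prefix(name, digits=4):
            n = int(hashlib.md5(name.encode()).hexdigest(), 16)
            out = ""
            for _ in range(digits):
                out = B32[n % 32] + out
                n //= 32
            return out

        prefix = base32_prefix("example-file.jpg")
        path = prefix[:2] + "/" + prefix[2:]        # 32^2 = 1024 top dirs, 1024 subdirs each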
  • Topbit

    Sometimes, to get the best from the very big (large systems), you have to understand the very small first (how an OS organises its filesystem).
  • sapphirecat

    In a quick un-scientific test, ext3 won't create more than 31997 directories in ~/foo for me.

    For files, I killed the test after it created 1,065,934 files, so whatever the limit there is, it's at least more than 2^20.
  • Evert

    Yea I got about the same.. But like topbit mentioned, it's a lot better to have many small directories.
  • ben

    Hey there. I have a similar problem: how many files per directory make sense (ext3)? I'm talking about 60k media files which should be distributed in a folder hierarchy.
  • Evert

    I'd say go for a few hundred like topbit said. It's the most efficient.
  • Evert

    Alex, I know! It sucks :P

    need to find time to make it even better :)
  • Lampdocs

    So what could be the solution? Should I recompile the kernel?
  • Evert

    If you're hitting this limit with ext3, the only real solution is to change your application to NOT hit this limit.

    There are also B-tree based filesystems that might not have this issue. Regardless, it's best to avoid hitting the limit in general.