
Why sort automatically-generated files based on hash?

Developer https://www.devze.com 2023-01-08 12:44 Source: web

I've seen this pattern before on websites that let users upload content such as images.

For example, why http://upload.wikimedia.org/wikipedia/commons/7/70/Example.png instead of just something like http://upload.wikimedia.org/wikipedia/commons/Example.png?

Is there a practical reason for this, or is it just cargo-cult?


Many filesystems don't perform well when hundreds of thousands of files sit in the same directory: looking up a single file in such a directory takes a long time.

To avoid this problem, the files are distributed across a folder hierarchy. To get an even distribution, you hash something that identifies the file (its filename or its contents) and use parts of that hash to decide which folder the file goes in. That's where the 7/70 comes from: it is taken from the prefix of the hash in two steps (7 is the first hex digit, 70 the first two), creating a two-level hierarchy. With 16 first-level folders, each containing 16 second-level folders, files are spread over 256 leaf folders, so each folder holds far fewer files, which in turn gives better filesystem performance.
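The two-level scheme described above can be sketched in a few lines of Python. This is a minimal sketch, not Wikimedia's actual code: it assumes the hash is MD5 of the filename and that level *i* uses the first *i* hex digits of the hash, which matches the shape of the 7/70 example. The helper name `hashed_path` is hypothetical.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def hashed_path(filename: str, levels: int = 2) -> PurePosixPath:
    """Return a sharded path such as 'a/ab/<filename>' (hypothetical helper).

    Assumption: the hash is MD5 of the filename, and level i uses the first
    i hex digits, so two levels give 16 top-level folders and 16 * 16 = 256
    leaf folders.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    prefixes = [digest[:i] for i in range(1, levels + 1)]
    return PurePosixPath(*prefixes, filename)


if __name__ == "__main__":
    print(hashed_path("Example.png"))
    # Spread 100,000 made-up names over the leaf folders and report how many
    # leaves are used and the smallest / largest number of files per leaf.
    counts = Counter(hashed_path(f"upload_{i}.png").parent for i in range(100_000))
    print(len(counts), min(counts.values()), max(counts.values()))
```

Because the hash output is close to uniform, each of the 256 leaf folders ends up with roughly 1/256 of the files, instead of one directory holding all of them.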


There are two obvious reasons:

  • To avoid putting too many files into a single directory
  • To make it easy to avoid filename collisions without renaming the original file
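For the second reason, the hash can be taken over the file contents instead of the name, so two different uploads that both happen to be called Example.png land in different folders while each keeps its original filename. This is a minimal sketch under illustrative assumptions that the answer does not specify: `content_path` is a hypothetical helper, SHA-256 is an arbitrary choice, and the full digest is kept as its own directory so that distinct contents never share a folder (barring a hash collision).

```python
import hashlib
from pathlib import PurePosixPath


def content_path(filename: str, data: bytes) -> PurePosixPath:
    """Place an upload under folders derived from a hash of its contents.

    Hypothetical helper. Layout: '<d[:1]>/<d[:2]>/<full digest>/<filename>'.
    The two short prefixes shard the tree as in the two-level scheme; the
    full digest separates different files that share a name, so neither
    has to be renamed.
    """
    digest = hashlib.sha256(data).hexdigest()
    return PurePosixPath(digest[:1], digest[:2], digest, filename)


if __name__ == "__main__":
    # Two different files with the same name get different paths.
    print(content_path("Example.png", b"first upload"))
    print(content_path("Example.png", b"second upload"))
```

Because the path depends only on the contents, uploading the same bytes twice maps to the same folder, which also makes deduplication straightforward.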
