I'm about to create a user based website and will have to store photo, docs and other data for each user.
If I take a silly number like 1 000 000 000 users, I believe than one folder with 1 000 000 000 won't be the fastest thing in the world! So I was thinking of creating something like
1st level : [a-z] 2nd level : [a-z] 3rd level : [a-z]
Therefor bobby will be in /b/o/b/by
But this also mean that it won't be spread equaly, because there will be very few user starting with a z and many more with a m,s,l ...
so I was thinking of using a user id such as "000000000001", "000000000001" etc...
1st level : [000-999] 2nd level : [000-999] 3rd level : [000-999]
therefore data of the user 000000000001 will be store in /data/000/000/000/001 then I will be sure to have a maximum of 1000 folder in each level.
What do yo开发者_高级运维u guys think about it, what I should do or not do ?
The server will be running Centos 5.4 with EXT3 on raid 1, if the I/O get's too bad i will probably go for a raid 10.
A hash function provides a way to distribute large amounts of data across an easily searchable structure.
See this related question: Why use hashing to create pathnames for large collections of files?
And also try looking through Google results for Directory Hashing.
精彩评论