I'm making a website that shows you images. The site's special feature is "don't show me again what I already saw": once you view an image, it goes into your personal "archive" category. There will be a lot of images and categories, and I need a database schema that performs well.
When you click an image, it opens in a lightbox, and the lightbox code sends an AJAX request to archive that image just for you.
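On the server side that request just inserts a row into the archived table from the schema below, roughly like this (a simplified sketch, assuming MySQL, with placeholder parameters in brackets):

INSERT IGNORE INTO archived (user_id, picture_id)
VALUES ([userIdParameter], [pictureIdParameter]);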
Is the database schema below performant enough for about 5,000 images and 20,000 users?
users
    user_id
    user_email

pictures
    picture_id
    picture_url
    tags

archived
    user_id
    picture_id
Images will be shown to you from all the images in this schema, excluding the ones already in your archive...
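In SQL terms the tables would look roughly like this (a sketch only; the column types and MySQL syntax are assumptions, not my exact DDL):

CREATE TABLE users (
    user_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_email VARCHAR(255) NOT NULL
);

CREATE TABLE pictures (
    picture_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    picture_url VARCHAR(255) NOT NULL,
    tags        VARCHAR(255)
);

CREATE TABLE archived (
    user_id    INT UNSIGNED NOT NULL,
    picture_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, picture_id)
);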
This is a difficult question to answer without knowing all the details. You mention how many users and images there will be, but how many images will each user (on average) have in their archived list? If that number is small, the archived table won't approach 100M rows.
100M rows should not be a problem by itself, as the database can handle this. The concern may (or will) be with the way you are going to want to query the data. Something like:
SELECT
    *
FROM
    picture
WHERE
    picture_id NOT IN
    (
        SELECT picture_id FROM archived WHERE user_id = [userIdParameter]
    )
That will likely not perform very well with 100M rows.
Another option would be to cross join users and pictures so that the archived table always contains a Cartesian product. So the table would be:
archived
    user_id
    picture_id
    visited
Then you could query like so:
SELECT
    p.*
FROM
    picture p
    INNER JOIN archived a ON p.picture_id = a.picture_id
WHERE
    a.user_id = [userIdParameter]
    AND a.visited = [false]
This should perform acceptably with proper indexing, but would present the problem of having to make sure rows are created in the archived table any time a user or picture is added to the system. It also means you would always have a number of rows equal to pictures * users (100M in your example). That may not be desirable in your case.
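To make that concrete, the indexing and the row-seeding could look something like this (a sketch only; the exact index layout and the assumption that visited is a 0/1 flag are mine, and the INSERT ... SELECT statements show just one way to keep the table populated):

-- covering index so the query can filter by user and visited flag cheaply
CREATE INDEX ix_archived_user_visited ON archived (user_id, visited, picture_id);

-- when a new picture is added, seed an "unvisited" row for every user
INSERT INTO archived (user_id, picture_id, visited)
SELECT u.user_id, [newPictureIdParameter], 0 FROM users u;

-- when a new user is added, seed a row for every picture
INSERT INTO archived (user_id, picture_id, visited)
SELECT [newUserIdParameter], p.picture_id, 0 FROM picture p;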
Bottom line, you are going to have to create some test data that approximates your expected volume and do some performance testing that approximates your load. If you think this is the critical potential performance bottleneck for your system, it will be worth the time investment.
I used "NOT IN" solution for a while and there is performance problems started. Because I don't have a strong server to execute that query with lot of datas.
So I found the best-performing answer for me: a "collection shuffle".
I shuffle the collection with the user's ID as the seed and save only the index of the user's last image. When the user comes back, I look up where their index left off and show the next IDs from their shuffled collection.
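The idea is roughly this kind of deterministic per-user ordering (a sketch only; the MD5-based ordering, the page size, and the bracketed placeholders are just one way to express the seeded shuffle, not my exact code):

SELECT picture_id, picture_url
FROM pictures
ORDER BY MD5(CONCAT([userIdParameter], '-', picture_id))
LIMIT 20 OFFSET [lastIndexParameter];

The only thing I have to store per user is the index they last reached.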
This is really lightweight and exactly the solution I needed. Thanks, everyone :)