I have three cases, all involving boolean attributes, and I don't know which design is better for each one.
I have a table of links, and each link has attributes indicating whether it is visited, broken, or filtered. Each value is updated only once (except for the rare case of resetting them all).
The same links also have a visiting attribute that is updated constantly, but in a table with more than a million rows, at most 10,000 or 20,000 will be true at any time.
I have a table of pages with one attribute indicating whether each page has been processed. In the end (after processing), all rows will be true.
I want to know the best solution for each of these cases.
My guess is: a column in the first case, a separate table in the second, and I don't know for the third.
Other solutions (an index, maybe) are welcome.
IMPORTANT: both tables (pages and links) can have more than a million rows.
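To make the two designs being compared concrete, here is a minimal sketch of both: a boolean column on the main table versus a separate table whose row presence means "flag is true". All table and column names are hypothetical, and SQLite is used only so the example is self-contained; the equivalent DDL in other databases is very similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design A: boolean attributes stored as columns on the links table.
conn.execute("""
    CREATE TABLE links_a (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL,
        visited INTEGER NOT NULL DEFAULT 0,
        broken INTEGER NOT NULL DEFAULT 0,
        filtered INTEGER NOT NULL DEFAULT 0
    )
""")

# Design B: the rarely-true "visiting" flag kept in a separate table;
# a row's presence in links_visiting means the flag is true.
conn.execute("CREATE TABLE links_b (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE links_visiting (
        link_id INTEGER PRIMARY KEY REFERENCES links_b(id)
    )
""")

conn.execute("INSERT INTO links_b (id, url) VALUES (1, 'http://example.com')")
conn.execute("INSERT INTO links_visiting (link_id) VALUES (1)")  # set flag
conn.execute("DELETE FROM links_visiting WHERE link_id = 1")     # clear flag

# "Which links are being visited?" becomes an EXISTS (or join) query:
rows = conn.execute("""
    SELECT b.id FROM links_b b
    WHERE EXISTS (SELECT 1 FROM links_visiting v WHERE v.link_id = b.id)
""").fetchall()
print(rows)  # [] — the flag was cleared above
```

With design B, setting or clearing the flag only ever writes to the small `links_visiting` table, never to the million-row `links_b` table.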
I would say columns for the first case, tables for the second, and columns for the third.
Your main concern, depending on the scale of your database, might be to separate the often-updated data from the bulk of the rest. That's why I'd suggest a table for the second case. You could, however, make judicious use of the "HOT" feature of PostgreSQL, which means that updates do not cause table bloat if the columns being updated are not indexed. But it's probably still a good idea to keep the traffic away from large tables, because of potentially large seek times, keeping autovacuum happy, etc. If you're concerned, I would test this out.
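The "index, maybe" idea from the question can be made concrete with a partial index: index only the minority of rows where the flag is true, so the index stays tiny even over a million rows. Note the trade-off with the HOT point above: a partial index on the flag column means updates to that column are indexed updates, so HOT no longer applies to them. This is a hedged sketch with hypothetical names, using SQLite for a runnable demo; the same `CREATE INDEX ... WHERE` syntax works in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE links (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL,
        visiting INTEGER NOT NULL DEFAULT 0
    )
""")

# Partial index: only rows WHERE visiting = 1 are indexed. With ~10,000 true
# rows out of a million, the index remains small, and scanning for the
# "currently visiting" set never touches the other 99% of the table.
conn.execute("CREATE INDEX idx_links_visiting ON links(id) WHERE visiting = 1")

# Populate 1,000 demo rows; every 100th link is marked as visiting.
conn.executemany(
    "INSERT INTO links (id, url, visiting) VALUES (?, ?, ?)",
    [(i, "http://example.com/%d" % i, 1 if i % 100 == 0 else 0)
     for i in range(1, 1001)],
)
count = conn.execute(
    "SELECT COUNT(*) FROM links WHERE visiting = 1"
).fetchone()[0]
print(count)  # 10
```

Whether the partial index or the separate table wins depends on the update traffic, which is why testing on your own data, as suggested above, matters.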
There is no "best" way. The only way to know if your approach is adequately performant is to do it and see. One approach where there are constant updates will not perform the same where there are large numbers of reads and few updates.
I'd suggest just putting everything in the table and giving that a whirl, unless you have a reason not to.
But most importantly: what DBMS?