开发者

Database scalability: Which is more important, size of table or number of queries?

开发者 https://www.devze.com 2023-03-09 14:35 出处:网络
I\'ll take a simplified StackOverflow system as an example. Although limiting some features, it would be possibly to hold Questions and Answers in the same tab开发者_C百科le:

I'll take a simplified StackOverflow system as an example.

Although limiting some features, it would be possibly to hold Questions and Answers in the same tab开发者_C百科le:

(Django-esque pseudo-code)

QA table:
    parent = ForeignKey(self)
    category = ForeignKey(Category)
    title = CharField()
    description = TextField()

Then, to get the Questions and Answers for Question with ID 1, an SQL SELECT would be done for id==1 or parent==1. The downfall would be that the tags and title fields aren't used by Answers

The alternative of course would be two tables:

Questions:
    category = ForeignKey(Category)
    title = CharField()
    description = TextField()

Answers:
    parent = ForeignKey(Questions)
    description = TextField()

Which would require two queries to get the Questions and Answers.

Instinct says the former is a horrible idea but I'm not sure why.

Which is faster and more scalable?


To answer your questions directly, your instinct is correct. Mixing entities (Questions and Answers) together into one table is almost always a bad idea. Logically they are 2 separate entities and physically they should be kept separate.

Your second solution is the correct one. Using indexes and foreign keys to link the 2 tables via the question id would allow you to select all the answers for any of the questions. This would be faster and would scale better in addition to being more understandable to anyone who had to work with the structure in the future.


I don't think there is one good answer here. The best answer, in my humble opinion, is that it depends. For example, if you put questions and answers in two separate tables, you limit yourself to that model. You cannot, for example, have a sub-answer or sub-question in some sort of hierarchy. This may be fine but it may not necessarily suit your environment.

Personally, I try to look at the situation and the data. If I have to store different data about a question compared to an answer (or if I have to use the same column for two different purposes), I instead create two tables. If the data is the same and is going to always be the same, I store it in one table.

Beyond just this limited view of the database schema, however, is a much bigger picture that needs to be considered. For example, what is best for your storage engine? What is best for your hardware? For backups? For archiving? Performance and scalability will depend on a number of factors. This is a good place to start the discussion, but it is just the tip of the iceberg.

0

精彩评论

暂无评论...
验证码 换一张
取 消