开发者

Which is more efficient : 2 single table queries or 1 join query

开发者 https://www.devze.com 2023-01-18 18:21 出处:网络
Say tableA has 1 row to be returned but will have 100 columns returned while tableB has 100 rows to be returned but only on开发者_高级运维e column from each. TableB has a foreign key for table A.

Say tableA has 1 row to be returned but will have 100 columns returned while tableB has 100 rows to be returned but only on开发者_高级运维e column from each. TableB has a foreign key for table A.

Will a left join of tableA to tableB return 100*100 cells of data while 2 separate queries return 100 + 100 cells of data or 50 times less data or is that a misunderstanding of how it works?

Is it ever more efficient to use many simple queries rather than fewer more complex ones?


First and foremost, I would question a table with 100 columns, and suggest that there is a possibly a better design for your schema. In the real world, this number of columns is less common, so typically the difference in the amount of data returned with one query vs. two becomes less significant. 100 columns in a table is not necessarily bad, just a flag that it shold be considered.

However, assuming your numbers are what they are to make clear the question, there are a few important variables to consider:

1 - What is the speed of the link between the db server and the application server? If it is very slow, then you are probably better off minimizing the amount data returned vs. the number of queries you run. If it is not slow, then you will likely expend more time in the execution of two queries than you would returning the increased payload. Which is better can only be determined by testing in your own environment.

2 - How efficient is the transport protocol itself? Perhaps there is some kind of compression of the data, or an even more clever algorithm that knows column 2 through 101 are duplicate for every row, so it only passes them once. Strategies like this in the transport protocol would mitigate any of your concerns. Again, this is why you need to test in your own envionment to know for sure.

As others have pointed out, you also need to consider what will be done with the data once you get it (e.g., JOINs, GROUPing, etc), but I am limiting my response to the specifics of your question around query count vs. payload size.


What is best at joining? A database engine or client code? Saying that, I use both techniques: it depends on the client and how data will be used.

  • Where the data requires some processing to, say, render on a web page I'd probably split header and details recordsets. We do use this because we have some business logic between DB and HTML

  • Where it's consumed simply and linearly, I'd join in the database to avoid unnecessary processing. For example, simple reports or exports


It depends, if you only take into account the SQL efficiency obviusly several simpler and smaller result queries will be more efficient. But you need to take into account the whole process if the join will be made otherwise on the client or you need to filter results after the join, then probably the DBM will be more efficient that doing it on your code.

Coding is always a tradeoff between diferent systems, DB vs Client, RAM vs CPU... you need to be conscious about this and try to find the perfect solution.

In this case probably 2 queries outperform 1 but that is not a general solution.


I think that your question basically is about database normalization. In general, it is advisable to normalize a database into multiple tables (using primary and foreign keys) and to join them as needed upon queries. This is better for insert/update performance and for keeping the data consistent, and usually results in smaller database sizes as well.

As for the row numbers returned, only a cross join would actually return 100*100 rows; any inner or outer join will not create all combinations, but rather tie together rows on the given conditions, and for outer joins preserve rows which could not be matched. Wikipedia has some samples in its JOIN article.

For very query-intense applications, the performance may be better when using less normlized tables. However, as always with optimizations, I'd only consider going into that direction after seeing real measurable problems (e.g. with a profiling tool).

In general, try to keep the number of roundtrips to the database low; a large number of single simple queries will suffer from the overhead of talking to the DB engine (network etc.). If you need to execute complex series of statements, consider using stored procedures.


Generally fewer queries makes for better performance, as long as the queries return data that is actually related. There is no point in trying to put unrelated data into the same query just to reduce the number or queries.

There are of course exceptions, and your example may be one of them. However, it depends on more than the number of fields returnes, like what the fields actually return, i.e. the actual amount of data.

As an example of how the number of queries affects performance, I can mention a solution that I have (sadly enough) seen many times. In that solution the programmer would first get a number of records from one table, then loop through the records and run another query for each record to get the related records from another table. This clearly results in a lot of queries, and a solution having either one or two queries would be much more efficient.


“Is it ever more efficient to use many simple queries rather than fewer more complex ones?”

The query that requires the least amount of data to traverse, and gives you no more than what you need is the more efficient one. Beyond this, there can be RDBMS specific conditions that can be more efficient on one RDBMS system than another. At the very low level, when you deal with less data, then your results can be retrieved much quicker, so efficient queries are queries that only work with the least amount of data needed to get you the result you are looking for.

0

精彩评论

暂无评论...
验证码 换一张
取 消