I'm developing on an advanced search for bicycles. I've got a lot of tables I need to join to find all, let's say, red and brown bikes. One bike may come in more then one color! I've made this query for now:
SELECT DISTINCT p.products_id, #simple product id
products_name, #product name
products_attributes_id, #color id
pov.products_options_values_name #color name
FROM products p
LEFT JOIN products_description pd
ON p.products_id = pd.products_id
INNER JOIN products_attributes pa
ON pa.products_id = p.products_id
LEFT JOIN products_options_values pov
ON pov.products_options_values_id = pa.options_values_id
LEFT JOIN products_options_search pos
ON pov.products_options_values_id = pos.products_options_values_id
WHERE pos.products_options_search_id = 4 #code for red
OR pos.products_options_search_id = 5 #code for brown
My first concern is the many joins. The Products
table mainly holds product id and it's image and the Products Description
table holds more descriptive info such as name (and product ID of course).
I then have the Products Options Values
table which holds all the colors and their IDs. Products Options Search
is开发者_StackOverflow社区 containing the color IDs along with a color group ID (products_options_search_id). Red has the color group code 4 (brown is 5).
The products and colors have a many-to-many relationship managed inside Products Attributes
.
So my question is first of all: Is it okay to make so many joins? Is i hurting the performance?
Second: If a bike comes in both red and brown, it'll show up twice even though I use SELECT DISTINCT
. Think this is because of the INNER JOIN
. Is this possible to avoid and do I have to remove the doubles in my PHP code?
Third: Bikes can be double colored (i.e. black and blue). This means that there are two rows for that bike. One where it says the color is black and one where is says its blue. (See second question). But if I replace the OR
in the WHERE
clause it removes both rows, because none of them fulfill the conditions - only the product. What is the workaround for that?
Ok, firstly SQL is made to do joins and your query is not very big. If indexed propertly this should actually increase your performance as it is easier to fetch the correct data.
You can remove the duplicates you are getting by using a sub-query as follows :
SELECT DISTINCT p.products_id, #simple product id
products_name, #product name
products_attributes_id, #color id
pov.products_options_values_name #color name
FROM products p
LEFT JOIN products_description pd
ON p.products_id = pd.products_id
WHERE p.products_id in (
Select products_id from products_attributes pa #This will give you the ID forall bikes that have either red or brown in them
INNER JOIN products_options_values pov
ON pov.products_options_values_id = pa.options_values_id
INNER JOIN products_options_search pos
ON pov.products_options_values_id = pos.products_options_values_id
WHERE pos.products_options_search_id = 4 #code for red
OR pos.products_options_search_id = 5 #code for brown)
Of course you will then not be able to return the color as part of the results as the color does not have a single value. You could combine the colors into a single field by writing a function if you need to do that.
Your option on the 3rd one is to either roll up the data in your PHP code (return items more than once in your result set, but loop through and only show every item once), and then show the list of colors returned in a different way (as an additional table or a comma separated list or whatever suits you.
If you make the OR an AND it of course means all the bikes that are both red AND brown. This would be correct if this is what you are looking for, but it sounds like you want either and not both.
This is not a lot of joins. Assuming decent indexes it will not affect anything negatively.
distinct
does select the distinct combination of all fields in the select
clause. So, yes, if you have more than one color more than one bike will show up since you have included the color field. If you only want one color you should tell it which color you want (like, the one with the maximum code, or something). Or don't select the color code (since you don't appear to care what it actually is anyways). Good rule of thumb: only select the fields that you actually need to.
Your last question is unclear. If you replace the or
with what? If you try to make it and
no it won't work, because no single row has both color codes (since that's impossible).
Definitely not too many joins. Unless the ON
clause is not equality, joins normally restrict the result set nicely, especially when the right indexes are available.
What you want is similar to finding Questions on SO with at least one of a list of tags, so I wrote that query as a comparison: https://data.stackexchange.com/stackoverflow/query/2695/so3005416-comparison-select-questions-with-any-selected-tags
Ignore the code to get the tags into a temporary table, that's really just confirming code from another question's answer. Just compare the two final Select
statements.
As Donnie and Cobusve have pointed out, this isn't a lot of joins. However, in a relational (ie. normalised) schema, product attributes (such as name and colour) would normally be stored on the product table, rather than in separate tables.
Assuming you can't do anything about the table structures, an alternative query might be:
SELECT p.products_id, #simple product id
products_name, #product name
min(products_attributes_id), #lowest color ID
max(products_attributes_id), #highest color ID
min(pov.products_options_values_name), #lowest color name
max(pov.products_options_values_name) #highest color name
FROM products p
LEFT JOIN products_description pd
ON p.products_id = pd.products_id
INNER JOIN products_attributes pa
ON pa.products_id = p.products_id
INNER JOIN products_options_values pov
ON pov.products_options_values_id = pa.options_values_id
INNER JOIN products_options_search pos
ON (pov.products_options_values_id = pos.products_options_values_id AND
pos.products_options_search_id IN (4, 5) ) #codes for red, brown
group by p.products_id, products_name
Depending on which dialect of SQL (SQLServer, Oracle, MySQL etc.) you're using it's possible that the syntax for the final condition might need to be slightly different.
Obviously, where only one colour is returned, the lowest and highest values will be the same.
精彩评论