I have a user table that has many columns, it looks roughly like this:
dname: { type: string(255), notnull: true }
email: { type: string(255), notnull: true, unique: true }
email_code: { type: string(255) }
email_confirme开发者_JAVA百科d: { type: boolean, default: false }
profile_filled: { type: boolean, default: false }
password: { type: string(255), notnull: true }
image_id: { type: integer }
gender: { type: enum, values: [male, female] }
description: { type: string }
dob: { type: date }
height: { type: integer(3) }
looks: { type: enum, values: [thin, average, athletic, heavy] }
looking_for: { type: enum, values: [marriage, dating, friends] }
looking_for_age1: { type: integer }
looking_for_age2: { type: integer }
color_hair: { type: enum, values: [black, brown, blond, red] }
color_eyes: { type: enum, values: [black, brown, blue, green, grey] }
marital_status: { type: enum, values: [single, married, divorced, widowed] }
smokes: { type: enum, values: [no, yes, sometimes] }
drinks: { type: enum, values: [no, yes, sometimes] }
has_children: { type: enum, values: [no, yes] }
wants_children: { type: enum, values: [no, yes] }
education: { type: enum, values: [school, college, university, masters, phd] }
occupation: { type: enum, values: [no, yes] }
country_id: { type: integer }
city_id: { type: integer }
lastlogin_at: { type: timestamp }
deleted_at: { type: timestamp }
I have created a form that contains most of the fields (enums, country , city) which alows the user to generate a where statement based on the fields they selected. So if someone selected smokes: no and country_id: 7 then sql where statement could look like this:
SELECT id
FROM user u
WHERE u.deleted_t IS NULL AND u.profile_filled IS NOT NULL AND smokes = 'no' AND country_id = 7;
Because user could select any combination of fields to filter by, I'm not sure how I should go about indexing this table, should I just create a single column index on all fields that can be filtered? What would you advise?
I have a table at work with the same sort of thing, lots of columns and 1000 different ways to select. Its a nightmare. I did find however, there are certain combinations of filters that are used often. It is those I would create indexes for and leave the others which are rarely used to run slowly. In MSSQL, I can run a query to show me the most expensive queries that have been run against the database, mySQL should have a similar thing. Once I have them, I create an index that covers the columns to speed them up. Eventually, you'll have it 90 percent covered. I personally would never design a table like that again unless I had an AK47 pointed at me. (my indexes are 3 times larger than the data in the table which is very uncool if you need to add a bunch or records). Im not sure how I would redesign the table though, My first thought would be to split the table into two, but that would add to headaches elsewhere.
User Table (UserID, Name)
1, Lisa
2, Jane
3, John
User Attribute Table(UserID, AttributeName,AttributeValue)
1, EYES, Brown
1, GENDER, Female
2, EYES, Blue
2, GENDER, Female
3 EYES, Blue
3, GENDER, Male
This would make identifying attributes faster, but make your queries not as straight forward to write.
SELECT UserID, COUNT(*) as MatchingAttributes
FROM UserAttributes
WHERE (UserAttributes.AttributeName = 'EYES' AND UserAttributes.AttributeValue = 'Blue') OR
(UserAttributes.AttributeName = 'GENDER' AND UserAttributes.AttributeValue = 'Female')
This should return the following
UserID, MatchingAttributes
1, 1
2, 2
3, 1
All you need to do then is add a HAVING COUNT(*) = 2 to the query to select only the IDs that match. Its a bit more involved to select from, but it also gives a neat feature, Say you filter on 10 Attributes, and return all those that have 10 matching. Cool, but say none matched 100%. You could say hey, I found none that matched, but these had 9 out 10 or a 90% match. (just make sure, if I search for a blue eyed blonde female, I don't get a message saying that none where found but here are the next closest matching ones containing blue eyed blonde blokes with a matching score of 60%. That would be very uncool)
There are more things that would need consideration if you chose to split the table, like how do you store attributes as numbers,dates and text in a single column? Or are these separate tables, or columns. No easy answer either way wide table or split tables.
Each and every searchable field needs it own single column index. If your table is big and you do not have an index on the search condition, then every row will have to be scanned.
Adding a new user will be slower, but the way you describe your situation I would imagine your queries are going to mostly be selects with a few inserts.
精彩评论