I'll need to represent location of some events and I am designing database schema for this application. I have two approaches for presenting the location:
approach 1: 4 tables:
- Countries
- States
- Cities
- Locations (in location I have foreign key to country_id, state_id and city_id)
approach 2: 1 table:
- Locations and there are simply fields country, state, city which are stored as text (no foreign id's)
Which approach开发者_如何学运维 would you recommend? the first one will help to eliminate possible different names of e.g. the same country (usa, us, united states etc.) and could be helpful in providing suggestions when writing in textboxes which probably will be mandatory.
however, the second approach seems like it will allow keep everything much more simple and should reduce number of queries to the database.
Which one do you think is better? Do you know what are best practices in this case? E.g. how did it do some big portals where they also need something like location (e.g. foursquare etc.). Afaik facebook uses the second approach, but... I want to hear your opinions and possibly reasons why would you choose one approach over another.
Thanks!
Approach #1:
This is a good solution if you want a good normalized database. You can manage all your tables easily but you will have to have 3 left/inner join when you query the location. I assume everything is indexed properly so you won't have really trouble with performance since these tables will be relatively small (country and states) and medium size for Cities (if you only want all cities for a specific country only). If you want ALL the cities in the world that table will be huge and you might have performance issue at some point if you don't index or join the table correctly.
Since everything is in the database, you don't have to change code if you need to add, update or delete a record.
If you need to add, update or delete any records, this solution will be very easy to maintain. If you need to update a name (for example city name) and all records will be updated at once.
Queries will be faster to run if you look by city or state will be fast, then a simple left join to get the name will do the trick.
Approach #2:
I personally would not recommend this because for maintainability it is not the best solution. If someday you need to retrieve data based on a city, your query might be slow to execute if you don't index properly. If you index the country, state, city then it will be faster for lookup (but slower than the first approach since varchar is slower than int for indexing). Also, you increase the risk of errors for the names eg: New York VS newyork VS New Yrok.
Also, if you need to update a name of a city, you will have to retreive all the records that has that name then update all these records. Which can take a long time.
eg: UPDATE locations SET city = 'New York' where city = 'newyork'; *note: also if you have misspells, you will have to validate ALL records to make sure you update all records
Here's a skeleton based on your requirement (using MYSQL) for approach #1:
CREATE TABLE `countries` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `states` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL DEFAULT '',
`fk_country_id` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `fk_country_id` (`fk_country_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cities` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL DEFAULT '',
`fk_state_id` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `fk_state_id` (`fk_state_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `locations` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL DEFAULT '',
`fk_country_id` int(10) NOT NULL DEFAULT '0',
`fk_state_id` int(10) NOT NULL DEFAULT '0',
`fk_cities_id` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `fk_country_id` (`fk_country_id`),
KEY `fk_state_id` (`fk_state_id`),
KEY `fk_cities_id` (`fk_state_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/* This table should not have fk_country_id and fk_state_id since they are already in their respective tables. but for this requirement I will not remove them from the table */
SELECT locations.name AS location, cities.name AS city, states.name AS state, countries.name AS country from locations INNER JOIN cities ON (cities.id = fk_cities_id) INNER JOIN states ON (states.id = locations.fk_state_id) INNER JOIN countries ON (countries.id = locations.fk_country_id);
+-------------------+---------------+----------+---------------+
| location | cty | state | country |
+-------------------+---------------+----------+---------------+
| Statue of Liberty | New York City | New York | United States |
+-------------------+---------------+----------+---------------+
1 row in set (0.00 sec)
EXPLAIN:
+----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | locations | system | fk_country_id,fk_state_id,fk_cities_id | NULL | NULL | NULL | 7174 | |
| 1 | SIMPLE | cities | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | states | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | countries | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+-----------+--------+----------------------------------------+---------+---------+-------+------+-------+
Now update:
UPDATE states SET name = 'New York' WHERE ID = 1; //using the primary for update - we only have 1 New York City record in the DB
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Now if I look all my locations for that city, all will say: New York
For approach #2:
CREATE TABLE `locations` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL DEFAULT '',
`fk_country_id` varchar(200) NOT NULL default '',
`fk_state_id` varchar(200) NOT NULL default '',
`fk_cities_id` varchar(200) NOT NULL default '',
PRIMARY KEY (`id`),
KEY `fk_country_id` (`fk_country_id`),
KEY `fk_state_id` (`fk_state_id`),
KEY `fk_cities_id` (`fk_state_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SELECT location, city, state, country FROM locations;
+-------------------+---------------+----------+---------------+
| location | city | state | country |
+-------------------+---------------+----------+---------------+
| Statue of Liberty | New York City | New York | United States |
+-------------------+---------------+----------+---------------+
Now update:
UPDATE locations SET name = 'New York' WHERE name = 'New York City'; // can't use the primary key for update since they are varchars
Query OK, 0 rows affected (1.29 sec)
Rows matched: 151 Changed: 151 Warnings: 0
Now if I look all my locations for that city, NOT all will say: New York
As you can see, it took 1.29 seconds (yes it's fast) but all records that has "New York" was updated but maybe there's some misspells or bad names etc...
Conclusion: For this reason only, I rather go with the first approach.
Note: Country and States rarely change. Maybe you can have these in your code and don't reference them from the database. This will save 2 INNER JOIN from the query and them in your code you simply retreive the ID of the country or state (same thing if you need to create an HTML drop down box).
Also, you can think about caching these countries and states using like memcached, APC, reddis or any other you prefer.
Go with #1, #2 is not normalized which will can cause problems.
精彩评论