I have to create a database that will store a very large number of people along with their addresses, and it will have to be searchable by place (e.g. find people in a given city).
I am unsure whether I should keep the city field in the address as a simple varchar or create a city table and refer to it, to avoid duplicate city names etc.
Note: I am using SQL Server and will access the data via Entity Framework (EF).
Background
Authority table - a reference table that contains the possible values for the "thing" for which it is an authority. For example, a Country authority table would contain all possible values for country.
An Answer
Depending on your definition of "huge amount ...", you will definitely want to have some authority tables. Depending on the geographic range of your addresses, some or all of these seem like a good start for your authority tables:
- Country - if you allow addresses in more than 1 country, it seems like a good thing to have as an authority table
- State (or province) - nations are broken into provinces and/or states. These are a good candidate for an authority table.
- City - cities are (mostly) fixed entities that have a defined name. Another candidate for an authority table. It should cover towns, villages, cities, and an "other" value, since rural residents may not live in a city, village, town, or anything else.
- Street Name - streets have names (or numbers) that are (mostly) fixed. This is a candidate for an authority table. A street name needs the combination of city, state, and country to be unique. For example, city "Blah" may have a street named "Mainne" while city "NotBlah" may not have a street named "Mainne".
- Postal Code (zip code in the USA) is a likely candidate for an authority table. For example, there is no zip code 00001 in the USA.
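The hierarchy described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every table and column name (country, state, city, country_code, state_id, city_id) is hypothetical rather than taken from the question. Street and postal code tables would follow the same pattern.

```python
import sqlite3

# SQLite stands in for SQL Server; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE country (
    country_code TEXT PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE state (
    state_id     INTEGER PRIMARY KEY,
    country_code TEXT NOT NULL REFERENCES country(country_code),
    name         TEXT NOT NULL,
    UNIQUE (country_code, name)   -- a state name is unique only within its country
);
CREATE TABLE city (
    city_id  INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL REFERENCES state(state_id),
    name     TEXT NOT NULL,
    UNIQUE (state_id, name)       -- a city name is unique only within its state
);
""")

conn.execute("INSERT INTO country (country_code, name) VALUES ('US', 'United States')")
conn.executemany(
    "INSERT INTO state (state_id, country_code, name) VALUES (?, 'US', ?)",
    [(1, "Illinois"), (2, "Missouri")],
)
# The same city name is allowed in two different states.
conn.executemany(
    "INSERT INTO city (state_id, name) VALUES (?, ?)",
    [(1, "Springfield"), (2, "Springfield")],
)
conn.commit()

springfield_count = conn.execute(
    "SELECT COUNT(*) FROM city WHERE name = ?", ("Springfield",)
).fetchone()[0]
```

The composite UNIQUE constraints are the point of the sketch: a name only has to be unique within its parent, which is why a city needs its state (and country) to be identified.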
Does your current design have an ADDRESS varchar column that looks like this?
"101 MAIN STREET, NEW YORK, NY, 10010"
If so, you complicate your life whenever you have to search by street, city, state, zip, or any combination of them.
I'd recommend an ADDRESS table with separate columns for STREET, CITY, STATE, and ZIP. That way you can query each one individually. Be sure to index the columns you filter on in your WHERE clauses.
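A minimal sketch of that layout, assuming Python's sqlite3 module as a stand-in for SQL Server. The person_name column, the index names, the people_in_city helper, and the sample rows are all made up for illustration; only the first address comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    person_name TEXT NOT NULL,   -- hypothetical; a real schema would likely reference a person table
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,
    zip         TEXT NOT NULL    -- stored as text so leading zeros survive
);
-- One index per search pattern that appears in a WHERE clause.
CREATE INDEX ix_address_state_city ON address (state, city);
CREATE INDEX ix_address_zip ON address (zip);
""")

conn.executemany(
    "INSERT INTO address (person_name, street, city, state, zip) VALUES (?, ?, ?, ?, ?)",
    [
        ("Alice", "101 MAIN STREET", "NEW YORK", "NY", "10010"),
        ("Carol", "7 OAK AVENUE", "NEW YORK", "NY", "10010"),
        ("Bob", "12 ELM STREET", "BUFFALO", "NY", "14201"),
    ],
)
conn.commit()


def people_in_city(conn, city, state):
    """Return the names of people whose address is in the given city and state."""
    rows = conn.execute(
        "SELECT person_name FROM address WHERE city = ? AND state = ? ORDER BY person_name",
        (city, state),
    )
    return [row[0] for row in rows]


# The query plan can be inspected to check whether the index is actually used.
query_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT person_name FROM address WHERE city = ? AND state = ?",
    ("NEW YORK", "NY"),
).fetchall()
```

With the single ADDRESS string, the same search would need a LIKE '%NEW YORK%' style scan, which an ordinary index cannot help with.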
The next question is whether it's useful to normalize further (e.g. separate tables for CITY, STATE, and ZIP) and JOIN them to get an address. I'm not sure that it's necessary, but you can try it.
I'm assuming that your model looks something like this:
Address
--------------
Address
City
State
Zip
etc.
If so, then there's no way to eliminate the repeating of some value in the City column. If you were to create a City table, the city name would be an obvious choice for a natural key, which would mean that your City column's actual data would remain unchanged. If you were to use a surrogate key, then you'd simply be repeating that key value instead of the city name. I would not suggest a surrogate key here, though, since a city's name is unlikely to change and you'd be adding an additional level of indirection with no benefit.
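To make the difference concrete, here is a rough side-by-side sketch, assuming Python's sqlite3 module as a stand-in for SQL Server. The table names (city_natural, address_natural, city_surrogate, address_surrogate) and the second sample street are hypothetical. Following the reasoning above, the city name alone serves as the natural key; a real schema would probably need the state as part of the key too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Variant A: natural key. The address row still carries the readable city name.
CREATE TABLE city_natural (name TEXT PRIMARY KEY);
CREATE TABLE address_natural (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL REFERENCES city_natural(name)
);

-- Variant B: surrogate key. The address row repeats an opaque id instead.
CREATE TABLE city_surrogate (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE address_surrogate (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city_id    INTEGER NOT NULL REFERENCES city_surrogate(city_id)
);

INSERT INTO city_natural (name) VALUES ('NEW YORK');
INSERT INTO address_natural (street, city)
    VALUES ('101 MAIN STREET', 'NEW YORK'), ('5 ELM STREET', 'NEW YORK');

INSERT INTO city_surrogate (city_id, name) VALUES (1, 'NEW YORK');
INSERT INTO address_surrogate (street, city_id)
    VALUES ('101 MAIN STREET', 1), ('5 ELM STREET', 1);
""")

# Natural key: the search needs only the address table.
natural_rows = conn.execute(
    "SELECT street FROM address_natural WHERE city = ? ORDER BY street",
    ("NEW YORK",),
).fetchall()

# Surrogate key: the same search needs a join to translate the name into an id.
surrogate_rows = conn.execute(
    """
    SELECT a.street
    FROM address_surrogate AS a
    JOIN city_surrogate AS c ON c.city_id = a.city_id
    WHERE c.name = ?
    ORDER BY a.street
    """,
    ("NEW YORK",),
).fetchall()
```

Both variants repeat one value per address row; the surrogate version only swaps the name for an id and adds the join.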
In the end, there are a couple of potential scenarios that would warrant a City table:
- You want to associate address-agnostic metadata with a city record
- You want to enforce referential integrity on this field, i.e. you want to be certain that all values for City come from a list of known values, which you'd store in your city list table. This would also allow you to present the user with a list of cities rather than just allowing them to enter the data as free-form text.
If either of those apply to you, then, by all means, create a city table. If not, then there's no need.
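Both scenarios can be sketched briefly, again assuming Python's sqlite3 module as a stand-in for SQL Server. The time_zone column is a hypothetical example of address-agnostic metadata, and add_address and known_cities are illustrative helpers, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE city (
    name      TEXT PRIMARY KEY,
    time_zone TEXT               -- hypothetical address-agnostic metadata
);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL REFERENCES city(name)
);
INSERT INTO city (name) VALUES ('BOSTON'), ('NEW YORK');
""")


def add_address(conn, street, city):
    """Insert an address; return False if the city is not in the known list."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (street, city) VALUES (?, ?)", (street, city)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def known_cities(conn):
    """Return the list of valid cities, e.g. to fill a drop-down."""
    return [row[0] for row in conn.execute("SELECT name FROM city ORDER BY name")]


accepted = add_address(conn, "101 MAIN STREET", "NEW YORK")
rejected = add_address(conn, "101 MAIN STREET", "NEW YROK")  # typo, not a known city
```

The foreign key is what turns the city table from a convenience into a guarantee: the misspelled city never reaches the address table.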