开发者

What is more important, normalization or ease of coding?

开发者 https://www.devze.com 2023-03-04 11:49 出处:网络
I have an excel spreadsheet i am going to be turning into a DB to mine data and build an interactive app.There are about 20 columns and 80,000 records.Practically all records have about half of their

I have an excel spreadsheet i am going to be turning into a DB to mine data and build an interactive app. There are about 20 columns and 80,000 records. Practically all records have about half of their column data as null, but which column has data is random for each record.

The options would be to:

  1. Create a more normalized DB with a table for each column and use 20 joins to view all data. I would think the benefits would be a DB with really no NULL values so the size would be smaller. One of the major cons would be more code to update each table from the application side.

  2. Create a flat file with one table that has all columns. I figure this will be easier for the applica开发者_JS百科tion side to do updates, but will result in a table that has a butt load of empty dataspace.


I don't get why you think updating a normalized db is harder than a flat table. It's very much the other way around.

Think about inserting a relation between a customer and a product (basically an order). You'd have to:

  1. select the row that describes the rest of the data, but has nulls or something in the product columns
  2. you have to update the product columns
  3. you have to insert a HUGE row to the db

What about the first time? What do you do with the initial nulls? Do you modify your selects to ignore them? What if you want the nulls?

What if you delete the last product? Do you change it into an update and set nulls for just a few columns?


Joins aside, working with a normalized table is trivial by design. You pay for its triviality with performance, that's the actual trade-off.


If you are going to be using a relational database, you should normalize your tables, if nothing else in order to ease data maintenance and ensure you don't have duplicate data.

You can investigate the use of a document database for storage instead of a relational database, though it is not the only option.


Generally normalized databases will end up being easier to write code against as SQl code is deisgned with normalized tables in mind.


Normalizing doesn't have to be done on all columns, so there's a middle ground between the two options you present. A good rule of thumb is that if you have columns that have values being repeated heavily across records, those can be good candidates for normalizing into one or more separate tables. Putting each column in its own table and joining across them is almost certainly overdoing it.


Don't normalize too much. It's hard to maintain a canonical model as your application grows. Storage is cheap. Don't get fooled into coding head aches because of concerns that were valid 20 years ago. No need to go nosql unless you need it.

0

精彩评论

暂无评论...
验证码 换一张
取 消