I'm creating some database tables for storing log entries. Under normal circumstances I would always normalize and never stuff values together, but I'm not 100% convinced that's a good idea here.
I could 开发者_开发知识库normalize and have:
- LogEntry
- LogEntryCategory
- LogCategory
- LogEntryProperty
LogEntry has a many to many relationship with LogCategory and a LogEntry has a 1 to many with LogEntryProperty (which are name/value pairs).
Alternative is a denormalized version which has just LogEntry with categories stored as a comma delimited list of string categories and properties stored as a comma limited list of Name: Value formatted properties. As ugly as this sounds, from a reporting, performance, and searchability perspective, I'm not sure if this isn't better.
Which is a better idea?
Thanks.
Because there are only a few distinct properties, I would stay away from name-value pairs and give each property a separate table with a proper name and data-type. I have used generic Property_
, just for the demo.
The thing here is to make sure not to insert a value into a property table if it is missing, in other words all property values are NOT NULL.
To make life easier, define a view
create view dbo.vLogs AS
select
LogCategoryName
, LogTime
, p1_Value
, p2_Value
, p3_Value
, p4_Value
, p5_Value
from LogEntry as e
left join Property_1 as p1 on p1.LogEntryId = e.LogEntryId
left join Property_2 as p2 on p2.LogEntryId = e.LogEntryId
left join Property_3 as p3 on p3.LogEntryId = e.LogEntryId
left join Property_4 as p4 on p4.LogEntryId = e.LogEntryId
left join Property_5 as p5 on p5.LogEntryId = e.LogEntryId
left join LogEntryCategory as x on x.LogEntryId = e.LogEntryId
left join LogCategory as c on c.LogCategoryID = x.LogCategoryID
This view (query) looks complicated and long; however, if you try a query like the one below and look at the execution plan, you may notice that property tables which are not mentioned in the select list are not included in the plan (not touched).
select
LogCategoryName
, LogTime
, p1_Value
, p2_Value
from dbo.vLogs
where LogCategoryName = 'some_category'
and LogTime between from_time and to_time
and if you need something simple like this
select max(p1_Value)
from dbo.vLogs
where LogTime between '2011-07-18' and '2011-07-19'
Here is the execution plan, as you can see only two tables are involved.
This is called table (join) elimination and you do need SQL Server, Oracle, PostgreSql 9.x, ... for this to work -- will not work on MySql (yet).
Each time a property is added, you would have to add a new table and modify the view.
精彩评论