Are there any elegant DB model design patterns for identifying the most recent child in a one to many relationship?
Given a vanilla parent/child model, you can accomplish via TSQL. However, I wanted to find an efficient method that does not require looking through many rows or index ordering.
This example design stores the most recent child ID in the parent. It seems to be a poor design since you could have contention for updating "Top Child ID".
Parent Table
开发者_如何转开发-Parent ID
-Top Child ID
Child Table
-Child ID
-Parent ID
Does anyone have a clever or elegant solution?
There are basically two dimensions to this problem. One, the source of the "most recent child" fact:
A: fact is determined from some database attribute ("latest_modified timestamp = most recent)
B: fact is determined from event involving child ("last displayed on page = most recent)
Two, the way in which this fact is retrieved over time:
C: Fact is stored as an attribute and retrieved by query
D: Fact is calculated dynamically
Breaking it down like that exposes a couple of (possibly not-obvious) things:
1) B is not persistent across sessions. If you try to persist B, then you have turned it into A. On the other hand, B type solutions lend themselves to memory caching techniques. If you don't mind the "most recent child" fact being transient, this may be the way to go. Store the most recent in memory.
2) If you need the fact to persist, you're stuck with A. Some degree of contention is unavoidable, because you are updating a database structure with a rapidly changing value.
3) Once you go with A, you have to choose between C & D. This choice should be determined by the expected ratio of reads to writes. If your app is read heavy (children change hourly, recency is checked 10x/sec), then calculate once and store in the parent table. If your app is write heavy (children change 10x/sec, recency is checked hourly), then calculate dynamically, i.e. only when you need to know the "most recent".
Note that since you can determine "most recent child" dynamically, storing it in the parent table is redundant. Since denormalization is usually only justified for strong performance gains, unless you ARE very read-heavy, I'd calculate dynamically. You can always refactor later.
I don't know if this would give you an elegant solution, but it will give you the "right" one for your situation, which is arguably not a bad substitute.
The elegant solution is based on only one table. Write the TSQL to identify the "most recent child" according to your business rules. Two tables violates normalization rules and causes addition IO (IMHO).
精彩评论