Database PK-FK design for future-effective-date entries?_问答_开发者

Ultimately I'm going to convert this into a Hibernate/JPA design. But I wanted to start out from purely a database perspective. We have various tables containing data that is future-effective-dated. Take an employee table with the following pseudo-definition:

employee

id INT AUTO_INCREMENT
... data fields ...
effectiveFrom DATE
effectiveTo DATE

employee_reviews

id INT AUTO_INCREMENT
employee_id INT FK employee.id

Very simplistic. But let's say Employee A has id = 1, effectiveFrom = 1/1/2011, effectiveTo = 1/1/2099. That employee is going to be changing jobs in the fu开发者_StackOverflow中文版ture, which would in theory create a new row, id = 2 with effectiveFrom = 7/1/2011, effectiveTo = 1/1/2099, and id = 1's effectiveTo updated to 6/30/2011. But now, my program would have to go through any table that has a FK relationship to employee every night, and update those FK to reference the newly-effective employee entry.

I have seen various postings in both pure SQL and Hibernate forums that I should have a separate employee_versions table, which is where I would have all effective-dated data stored, resulting in the updated pseudo-definition below:

employee

id INT AUTO_INCREMENT

employee_versions

id INT AUTO_INCREMENT
employee_id INT FK employee.id
... data fields ...
effectiveFrom DATE
effectiveTo DATE

employee_reviews

id INT AUTO_INCREMENT
employee_id INT FK employee.id

Then to get any actual data, one would have to actually select from employee_versions with the proper employee_id and date range. This feels rather unnatural to have this secondary "versions" table for each versioned entity.

Anyone have any opinions, suggestions from your own prior work, etc? Like I said, I'm taking this purely from a general SQL design standpoint first before layering in Hibernate on top. Thanks!

That employee is going to be changing jobs in the future, which would in theory create a new (employee) row

Why? What is the point of this? Your employee entity no longer represents an employee, it represents now some abstact concept of "a person in a position".

I believe it would make more sense to separate out the entity that is changing when the employee "changes jobs" - the position - into a separate table, so you do not end up with some messy concept where one physical person is actually multiple employee rows.

I don't understand why you think this seems "unnatural" to have to select from the extra table - you would be separating out something that has multiplicity (a person's position) from something that is singular (an employee).

You need to decide whether you are designing a database to support operations or a data warehouse to support reporting. If it is the second, your design in the beginning is very similar to Kimbal's Type 2 slowly changing dimension. Traditionally, you would want your operational database to represent the most current version of your employee and to provide some business key for it (employee #, SSN, etc.). The data then can be loaded into the data warehouse, where each individual record in the EMPLOYEE dimension would have a surrogate key and effective/end dates. The facts, for example reviews, will be related to the the records in the EMPLOYEE dimension, based on the business key and date/time. For example, you will be able to differentiate reviews of employee A when he was in the Junior Technician position from his reviews when he transitioned to the Senior Engineer position.

For this sort of thing, we usually have a single boolean field called Active, which allows easier querying on the latest record that applies.

To expand on matt b's answer, your discussion of the problem domain makes it pretty clear that what your design is calling out for is a "position" table. An employee's reviews continue to be relevant to that employee even after they move into a new position. Also, in every corporation I've experienced, the concept of an employee's tenure is related to their entire history at the company, not just the current position.

It's generally good practice to look at any complex updates needed as a sign that the design needs to change.

There is the entity that corresponds to the unique human being:

     EMPLOYEE
     eeid PK
     firstname
     surname
     dateofbirth
     dateofhire
     dateoftermination
     etc

and there is the entity that corresponds to the position(s) held by the employee:

     EMPLOYEEPOSITION
     id pk
     eeid FK references EMPLOYEE(eeid)
     title
     reportsto FK references EMPLOYEE(eeid)
     startdate not null usually
     enddate allows null

The question of how to enforce whether an EMPLOYEE's positions can overlap is not typically addressed by creating multiple EMPLOYEE records. Inserts/updates to EMPLOYEEPOSITION typically look at the startdate/enddate columns for each of an EE's positions and, depending on what rule is in effect (e.g. overlaps allowed/disallowed) either commit or rollback the operation.

All of an EE's positions can be found using eeid.

You do not usually put a termination date in the EE record unless and until it's necessary. If the EE is a contract worker, I'd instantiate the contract term as an EMPLOYEEPOSITION.

You can analogize from here for any entity that exists in a many-to-one relationship back to EMPLOYEE.

Lookup temporal databases. Your data is temporal, in spite of the fact that the dates may currently be in the future. Presumably the future oriented data you insert now will still have the same form and meaning when the future change takes effect.