I'm currently working on a system that in some cases will need to run against a local database during the day and then be replicated to a central server overnight. It cannot all run from one central database, as the local sites are periodically out of contact with it. The data at the central server is for viewing and reporting only at head office, so nothing needs to be replicated back to the sites.
Each "site" is given a text-based unique key (human-generated). However, the thought of making every table in the database design reference the site key is not appealing.
Here's an example of a very cut-down version of the schema, ignoring remote replication (which works fine for the majority of clients):
(I'll only show the history table for the Area table, to keep things short):
[Site]
SiteKey [PK] (guaranteed 100% unique across all sites, text-based key)
[User]
SiteKey [FK -> Site]
UserID [PK]
[Area]
SiteKey [FK -> Site]
AreaID [PK]
Description
UpdatedDtm
UpdatedUserID [FK -> User]
[AreaHistory]
SiteKey [FK -> Site]
AreaID [FK -> Area]
Description
UpdatedDtm
UpdatedUserID [FK -> User]
AuditedDtm
[Location]
AreaID [FK -> Area]
LocationID [PK]
Description
UpdatedDtm
UpdatedUserID [FK -> User]
[Sensor]
LocationID [PK / FK -> Location]
SensorNo [PK]
UpdatedDtm
UpdatedUserID [FK -> User]
[Reading]
LocationID [PK / FK -> Sensor]
SensorNo [PK / FK -> Sensor]
ReadingDtm [PK]
Which is fine, until I come to "merge" the database with the database at the central server. I'm obviously going to get clashes in the Location table because I'm mixing data with IDs generated at other sites.
The first way I thought around this problem was to do this:
(Again, I'll only show the changed tables, to keep things short):
[Location]
SiteKey [FK -> Area, FK -> User] ** ADDED THIS
AreaID [FK -> Area]
LocationID [PK]
Description
UpdatedDtm
UpdatedUserID [FK -> User]
[Sensor]
SiteKey [FK -> Location, FK -> User] ** ADDED THIS
LocationID [PK / FK -> Location]
SensorNo [PK]
UpdatedDtm
UpdatedUserID [FK -> User]
[Reading]
SiteKey [FK -> Sensor] ** ADDED THIS
LocationID [PK / FK -> Sensor]
SensorNo [PK / FK -> Sensor]
ReadingDtm [PK]
Basically, every table gets a SiteKey, making each row unique to the site.
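For concreteness, here's roughly how the Location table might be declared in PostgreSQL under this composite-key approach (a sketch only; the column types are my assumptions, and it presumes Area and User have likewise been re-keyed on (SiteKey, AreaID) and (SiteKey, UserID)):

-- Sketch: SiteKey is part of the primary key and of every composite foreign
-- key, so rows from different sites can never collide when merged centrally.
CREATE TABLE location (
    sitekey       text    NOT NULL REFERENCES site (sitekey),
    areaid        integer NOT NULL,
    locationid    integer NOT NULL,
    description   text,
    updateddtm    timestamptz,
    updateduserid integer,
    PRIMARY KEY (sitekey, locationid),
    FOREIGN KEY (sitekey, areaid)        REFERENCES area (sitekey, areaid),
    FOREIGN KEY (sitekey, updateduserid) REFERENCES "user" (sitekey, userid)
);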
An alternative is this (using UUIDs in some places):
[User]
SiteKey [FK -> Site]
UserUUID [PK]
[Area]
SiteKey [FK -> Site]
AreaUUID [PK]
Description
UpdatedDtm
UpdatedUserUUID [FK -> User]
[AreaHistory]
SiteKey [FK -> Site]
AreaUUID [FK -> Area]
Description
UpdatedDtm
UpdatedUserUUID [FK -> User]
AuditedDtm
[Location]
AreaUUID [FK -> Area]
LocationUUID [PK]
Description
UpdatedDtm
UpdatedUserUUID [FK -> User]
[Sensor]
LocationUUID [PK / FK -> Location]
SensorNo [PK]
UpdatedDtm
UpdatedUserUUID [FK -> User]
[Reading]
LocationUUID [PK / FK -> Sensor]
SensorNo [PK / FK -> Sensor]
ReadingDtm [PK]
Remember this is cut down, but it illustrates the problem.
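As a sketch of how the UUID variant might be declared in PostgreSQL (gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto or uuid-ossp extension; column types are my assumptions):

-- Sketch: every row gets a globally unique key, so merged data never clashes
-- and child tables no longer need SiteKey in their keys.
-- Assumes the Site table and a UUID-keyed User table already exist.
CREATE TABLE area (
    sitekey         text NOT NULL REFERENCES site (sitekey),
    areauuid        uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    description     text,
    updateddtm      timestamptz,
    updateduseruuid uuid REFERENCES "user" (useruuid)
);

CREATE TABLE location (
    areauuid        uuid NOT NULL REFERENCES area (areauuid),
    locationuuid    uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    description     text,
    updateddtm      timestamptz,
    updateduseruuid uuid REFERENCES "user" (useruuid)
);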
Are there any alternatives I may be missing? I thought about ID remapping, but that seemed to introduce new, even worse nightmares.
The annoying part of this is that it's a very small number of cases that need this scenario; the rest are happy with one database serving many sites. However, the client wanting this scenario is the biggest ;-)
I thought maybe I could use UUIDs just for them, but that would mean creating exceptions everywhere, so if I'm going to use them I may as well use them throughout (where necessary).
If it makes any difference, I'm using PostgreSQL.
PS, this discussion may be relevant to this case. I just keep wondering if there's another way, considering I have a nice guaranteed-unique SiteKey.
I've more or less come to the conclusion that, IMHO, for this scenario, using UUIDs is a bit of a "quick fix", maybe even a bit of a hack. I've decided that for me, in this case, using a composite key is cleaner. With a UUID, I might as well have just prepended the SiteKey to each ID.
Most replication scenarios require the use of a GUID/UUID. I would add one to every table. Read up about the performance implications of using one and how to best avoid them in your particular database backend.
I'll also add another set of out-of-the-box answers you may want to at least think about:
If you want to keep the site as part of a composite key in the underlying data: with code generation, views, and updatable triggers (at least on SQL Server), it is possible to make updatable views which don't expose the site key, so you can hide it from the application layer (this assumes the app is not going to be site-aware).
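In PostgreSQL the same idea might look something like this (a hypothetical sketch: it assumes a session setting app.site_key set by the application's connection layer, and relies on simple views being automatically updatable, which PostgreSQL supports since 9.3):

-- Sketch: the base table carries the composite key, but the application only
-- ever talks to a view that hides SiteKey.
CREATE TABLE location (
    sitekey     text    NOT NULL DEFAULT current_setting('app.site_key'),
    areaid      integer NOT NULL,
    locationid  integer NOT NULL,
    description text,
    PRIMARY KEY (sitekey, locationid)
);

CREATE VIEW v_location AS
    SELECT areaid, locationid, description
    FROM location
    WHERE sitekey = current_setting('app.site_key');

-- The app sets its site once per session and then reads/writes the view
-- without ever seeing SiteKey:
-- SET app.site_key = 'SITE_A';
-- INSERT INTO v_location (areaid, locationid, description) VALUES (1, 10, 'Cold store');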
Don't really replicate, but consolidate the data into a reporting database and have the site ONLY at the reporting layer. Sure your schema is different, but it's for reporting, so does it even need the same RI?
Not only does your reporting database not have to have the same schema, it can have a drastically different denormalized schema like a data warehouse which is optimized for specialized reporting needs.
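For instance, the reporting side could be as simple as one wide, denormalized table loaded from each site's extract (a hypothetical sketch; the actual reporting needs would dictate the columns):

-- Sketch: SiteKey only exists at the reporting layer, and descriptions are
-- copied in at load time so reports need no joins back to site-level tables.
CREATE TABLE reading_report (
    sitekey              text        NOT NULL,
    area_description     text,
    location_description text,
    locationid           integer     NOT NULL,
    sensorno             integer     NOT NULL,
    readingdtm           timestamptz NOT NULL,
    PRIMARY KEY (sitekey, locationid, sensorno, readingdtm)
);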
Given what you have said, I would use a GUID (and I'd be more biased towards that as the number of tables increases). Note that in SQL Server, because it's good to have the clustered index on a narrow, increasing key (and often this is also the primary key), a generic GUID is not as effective, since its non-increasing nature causes problems. There is NEWSEQUENTIALID(), which generates sequential GUIDs:
INT vs Unique-Identifier for ID field in database
I've used a COMB GUID technique before, which also lets a datetime be extracted from the GUID, giving you a creation timestamp effectively for free, given that you're already spending extra bytes on a GUID vs. an int (though not completely free, since it's not really indexable by itself).
I assume some or most of that logic applies to PostgreSQL, but I don't have direct experience with the specifics of its indexing or clustered/heap implementation.
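For what it's worth, PostgreSQL tables are heap-organised (there is no clustered index), so the SQL Server page-split issue doesn't apply in quite the same way, though random UUIDs still scatter inserts across the primary-key B-tree. There is no direct NEWSEQUENTIALID() equivalent; uuid_generate_v1mc() from the uuid-ossp extension embeds a timestamp (though its byte order means values don't sort chronologically), or a COMB-style value can be hand-rolled. A sketch of the latter, assuming gen_random_uuid() is available (the function name comb_uuid is mine, and the result deliberately ignores the UUID version bits):

-- Sketch of a COMB-style UUID: the first 6 bytes are milliseconds since the
-- epoch (so values sort roughly by creation time and the timestamp could be
-- recovered later); the remaining 10 bytes are random.
CREATE OR REPLACE FUNCTION comb_uuid() RETURNS uuid AS $$
    SELECT encode(
        substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3 FOR 6)
        || substring(decode(replace(gen_random_uuid()::text, '-', ''), 'hex') FROM 7 FOR 10),
        'hex')::uuid;
$$ LANGUAGE sql VOLATILE;

-- Usage: CREATE TABLE area (areauuid uuid PRIMARY KEY DEFAULT comb_uuid(), ...);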