it's a question for discussion only. Right now, I need to re-design a mysql database table. Basically, this table contains all the contract records I synchronized from another database. The contract record can be modified, deleted or users can add new contract records via GUI interface. At this stage, the table structure is exactly the same as the Contract info (column: serial number, expiry date etc.). In that case, I can only synchronize the whole table (delete all old records, replace with new ones). If I want to delta(only synchronize with modified, new, deleted records) synchronize the table, how should I change th开发者_如何转开发e database schema?
here is the method I come up with, but I need your suggestions because I think it's a common scenario in database applications. 1)introduce a sequence number concept/column: for each sequence, mark the new added records, modified records, deleted records with this sequence number. By recording the last synchronized sequence number, only pass those records with higher sequence number;
2) because deleted contracts can be added back, and the original table has primary key constraints, should I create another table for those deleted records? or add a flag column to indicate if this contract has been deleted?
I hope I explain my question clearly. Anyway, if you know any articles or your own suggestions about this, please let me know. Thanks!
I think you are confused with the concept of a delta.
Either you receive full loads (the entire data set) or only the changes (the "delta").
If you deal with full loads you could do a truncate+insert. That way you won't have to deal with new vs. old rows or deletes. This may not be feasible because of referential integrity constraints etc.
If you receive a delta, each row is typically put into 1 of 2 categories:
- Matching key =
UPDATE
. You can chose to ignore rows with identical data or overwrite. - No matching key =
INSERT
Deletes are special. Rows that doesn't exist cannot be sent to you. Therefore, you need to agree on how you should handle it. In case of full load, you can delete all local rows that does not exist in the received data set.
In case of delta, you could agree to send the row with a delete marker (flag, date). You can then decide whether to keep the row with a delete marker (automatically handled by (1) above), or if you should DELETE
your row. I suggest keeping it, because sooner or later someone will accuse you of missing rows/bad data quality and then you throw the DELETE_DATE in their face.
For MySQL you can use INSERT ... ON DUPLICATE KEY UPDATE to implement "upsert" functionality.
You would have to give more details if you want more specific help.
Update:
Ok, here is an example. Say that you have the following table structure:
create table contracts(
contract_id int not null
,details1 varchar(20)
,details2 varchar(20)
,delete_date date
,primary key(contract_id)
);
Whenever you receive the updated rows, you insert them into a temporary table with identical structure:
create table contracts_delta(
contract_id int not null
,details1 varchar(20)
,details2 varchar(20)
,delete_date date
,primary key(contract_id)
);
Some example data:
mysql> select * from contracts;
+-------------+----------+----------+-------------+
| contract_id | details1 | details2 | delete_date |
+-------------+----------+----------+-------------+
| 1 | a1 | a2 | NULL |
| 2 | b1 | b2 | NULL |
| 3 | c1 | c2 | 2011-01-03 |
+-------------+----------+----------+-------------+
mysql> select * from contracts_delta;
+-------------+----------+----------+-------------+
| contract_id | details1 | details2 | delete_date |
+-------------+----------+----------+-------------+
| 2 | b1 | b2 | 2011-01-03 | <-- Row was deleted
| 3 | c1 | c2 | NULL | <-- No longer deleted
| 4 | d1 | d2 | NULL | <-- This is new row
+-------------+----------+----------+-------------+
Using the syntax I linked to earlier, you can insert all new rows. Whenever the row is already there (on duplicate) we chose to update the columns instead. Note that this handles deleted rows automatically since the delete_date is a regular column like everything else.
insert
into contracts(
contract_id
,details1
,details2
,delete_date
)
select contract_id
,details1
,details2
,delete_date
from contracts_delta s
on duplicate key
update contracts.details1 = s.details1
,contracts.details2 = s.details2
,contracts.delete_date = s.delete_date;
After the "upsert", the data in contracts will look like this:
mysql> select * from contracts;
+-------------+----------+----------+-------------+
| contract_id | details1 | details2 | delete_date |
+-------------+----------+----------+-------------+
| 1 | a1 | a2 | NULL |
| 2 | b1 | b2 | 2011-01-03 |
| 3 | c1 | c2 | NULL |
| 4 | d1 | d2 | NULL |
+-------------+----------+----------+-------------+
-- At this point you can choose to drop the delta table (remember to re-create it next time)
drop table contracts_delta;
-- Or you can just truncate it to save some space. (You need to make sure it's empty on the next load anyway)
truncate table contracts_delta;
-- Or you can save the actual delta (rename the table) incase you need the individual deltas sometime
alter table contracts_delta rename to contracts_delta_20110115;
精彩评论