I am working on time series data, for which the key column is a timestamp : Time. There are also many "value" columns for each row.
I am about to shift a whole range of my data by several hours (due to a daylight saving time issue). For that, I will update the key of several rows, and it might result in some duplicate keys. I would like the duplicate keys on the edge of the date range to be ignored. I want the shifted range to override the old one.
I plan to do something like :
UPDATE IGNORE time_series_table
SET time=time-<some_shift>
WHERE <time in a date-range>
Here is the output of describe <table> for the time key:
Field     Type      Null  Key  Default  Extra
TimeMeas  datetime  NO    PRI  NULL
My question is: will it shift all the keys at once, or will it try to shift each row one by one, resulting in massive duplicate keys within the shifted range itself?
Do you have a better way of doing this in mind ? Thanks in advance
Will it shift all the keys at once, or will it try to shift each row one by one
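It shifts the rows one by one: MySQL checks the primary key as each row is updated, not once at the end of the statement. Adding an ORDER BY clause to the UPDATE controls the order in which the rows are processed (ascending time here, since the shift is negative), which avoids collisions between rows inside the range.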
resulting in massive duplicate keys within the shifted range itself ?
Without IGNORE, the statement fails as soon as any primary key is duplicated.
With UPDATE IGNORE, the rows that would cause a duplicate key are silently skipped and left unchanged.
This is my approach to fix this:
/* create a temporary table to store the matching records, already shifted */
create table tmp_table select time-<some_shift> as time, etc_cols....
from time_series_table
where <time in a date-range>;
then
/* delete the matches in the original table */
delete from time_series_table where <time in a date-range>;
delete from time_series_table where <time in a date-range - some_shift>;
finally
/* at this point, there won't be any duplicate data */
/* so, insert back into original table */
insert into time_series_table select * from tmp_table;
optimize table time_series_table;
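If you want to try the three steps on toy data first, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the date arithmetic (`datetime(time, ?)`) is SQLite syntax, not MySQL syntax. The `value` column, the sample rows, and the `shift_range` helper are made up for illustration; only the table names come from the steps above.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def shift_range(conn, start, end, shift):
    """Move rows with start <= time <= end back by `shift` (a positive
    timedelta), overwriting any rows already sitting at the target times."""
    new_start = (datetime.strptime(start, FMT) - shift).strftime(FMT)
    new_end = (datetime.strptime(end, FMT) - shift).strftime(FMT)
    modifier = f"-{int(shift.total_seconds())} seconds"  # SQLite date modifier
    with conn:  # commit on success, roll back if any step raises
        # 1. copy the rows of the range, already shifted, into a temporary table
        conn.execute("CREATE TEMP TABLE tmp_table (time TEXT PRIMARY KEY, value REAL)")
        conn.execute(
            "INSERT INTO tmp_table "
            "SELECT datetime(time, ?), value FROM time_series_table "
            "WHERE time BETWEEN ? AND ?",
            (modifier, start, end),
        )
        # 2. delete both the old range and the range the rows will land in
        conn.execute("DELETE FROM time_series_table WHERE time BETWEEN ? AND ?",
                     (start, end))
        conn.execute("DELETE FROM time_series_table WHERE time BETWEEN ? AND ?",
                     (new_start, new_end))
        # 3. no duplicates are possible now, so insert the shifted rows back
        conn.execute("INSERT INTO time_series_table SELECT time, value FROM tmp_table")
        conn.execute("DROP TABLE tmp_table")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_series_table (time TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO time_series_table VALUES (?, ?)",
    [
        ("2024-01-15 01:00:00", 1.0),  # outside the range, at a target time
        ("2024-01-15 02:00:00", 2.0),  # inside the range
        ("2024-01-15 03:00:00", 3.0),  # inside the range
        ("2024-01-15 04:00:00", 4.0),  # outside the range, untouched
    ],
)

# shift the 02:00..03:00 rows back by one hour
shift_range(conn, "2024-01-15 02:00:00", "2024-01-15 03:00:00", timedelta(hours=1))
rows = conn.execute("SELECT time, value FROM time_series_table ORDER BY time").fetchall()
print(rows)
```

In this toy run the old 01:00 row is replaced by the shifted 02:00 row, which is the "shifted range overrides the old one" behaviour asked for. Running all the steps in one transaction means a failure part-way through does not leave the table with the range deleted but not re-inserted.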