I need to add a few million data records to an ESE database. Among other values, each record has a unique string value. This value can be thought of as a key.
Interesting to the records is that there may be multiple identical instances of the same record within the input set. 开发者_运维百科Once entered I only want one record with each of the unique strings.
My question is how to do this - how can I quickly filter out duplicates?
Right now I'm adding each record only after doing a search for the key, if the entry already exists I skip it. If it's not in the database I add the record and progress. The big cost here is doing the search on each entry.
any ideas on making this very fast? is there anyway to key the value such that adding a duplicate would fail?
Your can just create a unique index on the string column by passing JET_bitIndexUnique into JetCreateIndex:
JetCreateIndex(sesid, tableid, "myindex", JET_bitIndexUnique, "+string_col\0", 13, 100));
An insertion of a duplicate value with fail with JET_errKeyDuplicate.
This approach is best if your strings are short. If your strings are long you should use a hash of the string to test for uniqueness.
**
DTS.
BULK INSERT.
SSIS.
Choose as you wish
**
精彩评论