I'm doing a data migration at the moment, and one of the tasks is matching primary keys from the old DB to the new one for each table. There are about 40 tables in the whole migration, and as I'm only moving some of the data across, I want to avoid creating duplicate records in the new DB.
So I want to store multiple collections of pairs of integers (oldPK and newPK). Each collection represents a table and each pair represents a row I've already migrated across. I am frequently going to be searching on the oldPK in order to see if I've already migrated a particular row in a table.
I'm unsure about how many pairs of integers I may have, although I am sure that it will not exceed the number of rows in the old DB's table, which would typically be from 100 to about 5000. (I could entertain the idea of different collections having different data structures.)
Also, I will not be populating the list all at once; it is likely to happen one integer pair at a time, typically when I've written that record to the new DB.
I've tried using a List<T>, with T being a class that has integers A and B, but lookups seem to slow down as the collection gets very big.
Is there a better data structure I could use for this scenario?
CONCLUSION
OK, so I just ran a test with a range of collection types (e.g. HashSet, List, Dictionary, SortedDictionary, SortedSet, SortedList, Hashtable).
Hashtable came out hands down the fastest. Calculations that took the other data structures 5-10 seconds took it less than 0.1 seconds!
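A rough way to reproduce this kind of comparison, sketched in Python rather than .NET (a Python dict is a hash table, and scanning a list of pairs stands in for the List<T> approach). The sizes, key values, and function names here are made up for illustration, and timings will vary by machine, so none are shown.

```python
import timeit

N = 5000  # upper end of the row counts mentioned in the question
# Made-up (oldPK, newPK) pairs, purely for illustration
pairs = [(old_pk, old_pk + 100_000) for old_pk in range(N)]
lookup = dict(pairs)  # hash table keyed on oldPK


def find_in_list(old_pk):
    # Linear scan: checks pairs one by one until oldPK matches
    for old, new in pairs:
        if old == old_pk:
            return new
    return None


def find_in_dict(old_pk):
    # Hash lookup: average cost does not depend on the number of pairs
    return lookup.get(old_pk)


probes = range(0, N, 50)  # 100 keys spread across the collection
list_time = timeit.timeit(lambda: [find_in_list(k) for k in probes], number=10)
dict_time = timeit.timeit(lambda: [find_in_dict(k) for k in probes], number=10)
print(f"list scan: {list_time:.4f}s, dict lookup: {dict_time:.4f}s")
```

The list scan does work proportional to the number of pairs on every search, which is why it degrades as the collection grows, while the hash lookup stays roughly flat.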
Use a hashtable. It gives very quick lookup to see whether a particular key (for example, your old ID) is in it. Lookup takes constant time on average, so it will not slow down appreciably however many rows you put in it.
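A minimal sketch of that idea, written in Python, whose built-in dict is a hash table; in .NET the counterpart would be Dictionary<int, int> or Hashtable. The class and method names (MigrationMap, record, new_pk_for, is_migrated) are hypothetical, as are the table name and key values. It keeps one hash table per table, keyed on the old primary key, and is filled one pair at a time as rows are written.

```python
from collections import defaultdict


class MigrationMap:
    """Tracks oldPK -> newPK pairs, one hash table per migrated table."""

    def __init__(self):
        # table name -> {old_pk: new_pk}; inner dicts are created on first write
        self._tables = defaultdict(dict)

    def record(self, table, old_pk, new_pk):
        """Remember that a row has been written to the new DB."""
        self._tables[table][old_pk] = new_pk

    def is_migrated(self, table, old_pk):
        """True if this old row already has a counterpart in the new DB."""
        return old_pk in self._tables.get(table, {})

    def new_pk_for(self, table, old_pk):
        """Return the new PK for an already-migrated row, else None."""
        return self._tables.get(table, {}).get(old_pk)


# Usage with made-up values: record a pair right after inserting the new row
pk_map = MigrationMap()
pk_map.record("customers", 17, 1001)

if not pk_map.is_migrated("customers", 18):
    pass  # row 18 has not been migrated yet, so it is safe to insert it
```

Because each table gets its own dict, a table that needs a different structure later can be swapped out without touching the others.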