I have a file stored in a hash, with each value being an array. I would like to compare the arrays with each other and, if they match exactly, store them in an array.
For example:
@geno1 = NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,T,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,G,NN,NN,NN,NN,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN
@geno2 = NN,NN,NN开发者_开发技巧,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,T,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,G,NN,NN,NN,NN,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN
In this case the two match exactly, so I would like to store them in an array. Now suppose they differ by even one element, say:
@geno2 = NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,NN,A,NN,NN,NN,NN,NN,NN,T,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,G,NN,NN,NN,NN,G,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,NN,A
Here the last element is A, so this one shouldn't be stored in the array. A way to do this without looping through the array would be great, since I have to run this on 10k samples or more, and frequently.
As far as I can tell, checking that two arrays of length n are equal has O(n) complexity, so you have to go through all elements. Of course you can break out of the loop as soon as you find a difference, but if the difference is at the end, you still have to look at all n elements!
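A minimal Perl sketch of that element-by-element check with an early exit might look like the following. The helper name arrays_equal and the short sample arrays are made up for illustration; the real arrays would come from the hash in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if both array refs hold the same strings
# in the same order, 0 otherwise.
sub arrays_equal {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;            # different lengths can never match
    for my $i (0 .. $#$x) {
        return 0 if $x->[$i] ne $y->[$i];  # stop at the first difference
    }
    return 1;
}

# Shortened, made-up sample data (not the full arrays from the question).
my @geno1 = qw(NN NN A NN T G);
my @geno2 = qw(NN NN A NN T G);
my @geno3 = qw(NN NN A NN T A);

print arrays_equal(\@geno1, \@geno2) ? "match\n" : "differ\n";
print arrays_equal(\@geno1, \@geno3) ? "match\n" : "differ\n";
```

The length check up front is cheap and rules out some mismatches without touching any element; the worst case is still O(n).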
I agree with MarcoS that in the general case you need to check everything. However, there are specific cases where either your definition of "equal" or the type of matching you are doing can be optimized.
Specifically, you have many repeated elements in the array. Does order matter? Could you condense each array into a hash table of counts, so that if two arrays both had 23 NNs, 3 As, 2 Gs, and 1 T they would be considered equivalent?
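If order really does not matter, the counting idea could be sketched in Perl roughly like this. The helper name count_signature and the sample data are hypothetical, and the sketch assumes no element contains a comma or an equals sign.

```perl
use strict;
use warnings;

# Hypothetical helper: condenses an array ref into a string of per-value
# counts, e.g. "A=1,NN=3,T=1". Assumes values contain no ',' or '='.
sub count_signature {
    my ($calls) = @_;
    my %count;
    $count{$_}++ for @$calls;
    # Sort the keys so equal counts always produce the same string.
    return join ',', map { "$_=$count{$_}" } sort keys %count;
}

# Made-up sample data: same values, different order.
my @geno1 = qw(NN NN A NN T);
my @geno2 = qw(NN A NN T NN);

print count_signature(\@geno1), "\n";
print count_signature(\@geno1) eq count_signature(\@geno2)
    ? "equivalent\n" : "different\n";
```

Note that this deliberately treats reordered arrays as equal, which is a weaker test than the exact match the question asks for.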
Are you going to be matching against the same arrays over and over? If so, you could perhaps hash the arrays (à la MD5 or SHA) and assume that if the two hashes match then the two arrays match (this would of course require benchmarking to ensure that it actually sped things up).
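One way the digest idea might be sketched in Perl, using the core Digest::MD5 module, is to compute one digest per array and group sample names by it. The %geno hash, the sample names, and the helper group_by_digest are assumptions for illustration, and the sketch assumes no element contains a comma.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical data: sample name => array ref of genotype calls.
my %geno = (
    sample1 => [qw(NN NN A NN T G)],
    sample2 => [qw(NN NN A NN T G)],
    sample3 => [qw(NN NN A NN T A)],
);

# Hypothetical helper: returns a list of array refs, each holding the
# names of samples whose arrays produced the same digest.
sub group_by_digest {
    my ($geno) = @_;
    my %groups;
    for my $name (sort keys %$geno) {
        # Joining with ',' assumes no element itself contains a comma.
        my $digest = md5_hex(join ',', @{ $geno->{$name} });
        push @{ $groups{$digest} }, $name;
    }
    # Keep only digests shared by more than one sample.
    return grep { @$_ > 1 } values %groups;
}

my @matches = group_by_digest(\%geno);
print join(' ', @$_), "\n" for @matches;   # prints "sample1 sample2"
```

Each array is still read once to build its digest, so the saving, if any, comes from not comparing every pair of samples element by element. A digest collision is theoretically possible, so an exact comparison within each group would confirm the match.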