Can you please let me know on the quickest and efficient way to compare a large set of values. Its like there are a list of parent codes(string) and each code has a series of child values(string). The child lists have to be compared with each other and find out duplicates and count how many times they repeat.
code1(code1_value1, code1_value2, code3_value3, ..., code1_valueN);
code2(code2_value1, code1_value2, code2_value3, ..., code2_valueN);
code3(code2_value1, code3_value2, code3_value3, ..., code3_valueN);
.
.
.
codeN(codeN_value1, codeN_value2, codeN_value3, ..., codeN_valueN);
The lists are huge say like there are 100 parent codes and each has about 250 values in them. There will not be duplicates within a code list. Doing it in java and the solution i could figure out is.
- Store the values of first set of code in as
codeMap.put(codeValue, duplicateCount)
. The count initialized to 0. - Then compare the rest of the values with this. If its in the map then increment the count otherwise ap开发者_JS百科pend it to the map.
The downfall of this is to get the duplicates. Another iteration needs to be performed on a very large list.
An alternative is to maintain another hashmap for duplicates like duplicateCodeMap.put(codeValue, duplicateCount)
and change the initial hashmap to codeMap.put(codeValue, codeValue)
.
Speed is what is requirement. Hope one of you can help me with it.
You want to use a Map<String,Set<String>>
, e.g. for each child code, what is the set of parent codes that has it.
That is, you want a Multimap, essentially, which is available from Guava.
Here's a sample to illustrate the idea:
import java.util.*;
public class MultiMap {
public static void main(String[] args) {
String[] codes = {
"A=1,2,3,4",
"B=1,3,5,9",
"C=2,5,7,8",
};
Map<String,Set<String>> map = new HashMap<String,Set<String>>();
Set<String> dupes = new HashSet<String>();
for (String code : codes) {
String parent = code.split("=")[0];
for (String child : code.split("=")[1].split(",")) {
Set<String> set = map.get(child);
if (set == null) {
map.put(child, set = new HashSet<String>());
} else {
dupes.add(child);
}
set.add(parent);
}
}
System.out.println(map);
// {3=[A, B], 2=[A, C], 1=[A, B], 7=[C], 5=[B, C], 4=[A], 9=[B], 8=[C]}
for (String child : dupes) {
System.out.println(child + "=" + map.get(child));
}
// 3=[A, B]
// 2=[A, C]
// 1=[A, B]
// 5=[B, C]
}
}
精彩评论