Only iterate over a part of a Map

Developer https://www.devze.com 2023-03-19 10:30 Source: the web
I have data stored in a HashMap, which I want to access via multiple threads simultaneously, to split the work done on the items.

Normally (with a List for example) I would just give each thread an index to start with and could easily split the work like this:

for(int i = startIndex; i < startIndex+batchSize && i < list.size(); i++)
{
    Item a = list.get(i);
    // do stuff with the Item
}

Of course this doesn't work with a HashMap, because I can't access it via an index.

Is there an easy way to iterate only over a part of the map? Should I rather use another data structure for this case?

I read about SortedMap, but it has too much overhead that I don't need (sorting the items). I have a lot of data and performance is crucial.

Any tips would be highly appreciated.


Firstly, you shouldn't be using a HashMap, because its iteration order is unspecified. Either use a LinkedHashMap, whose iteration order is the insertion order (so at least it's defined), or a TreeMap, whose iteration order is the natural ordering of the keys. I would recommend the LinkedHashMap: inserting an entry into a TreeMap can place it anywhere in the order, which makes slicing the map by index unpredictable, whereas a LinkedHashMap appends new entries at the end.

To carve up a map, use this code:

    LinkedHashMap<Integer, String> map = new LinkedHashMap<Integer, String>();

    for (Map.Entry<Integer, String> entry : new ArrayList<Map.Entry<Integer,String>>(map.entrySet()).subList(start, end)) {
        Integer key = entry.getKey();
        String value = entry.getValue();
        // Do something with the entry
    }

I have in-lined the code, but expanded out it is equivalent to:

List<Map.Entry<Integer, String>> entryList = new ArrayList<Map.Entry<Integer,String>>();
entryList.addAll(map.entrySet());
entryList = entryList.subList(start, end); // You provide the start and end index
for (Map.Entry<Integer, String> entry : entryList) ...
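
Note that the slicing code above copies the whole entry set each time a slice is taken. A minimal sketch of one way to avoid that, assuming the map is not modified while the batches are in use: copy the entries once and cut every batch from that single snapshot. The class name `MapSlices` and the method `batches` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapSlices {

    // Copies the entries once, then returns consecutive sublists of at most
    // batchSize entries each. The sublists are views of the single snapshot.
    static <K, V> List<List<Map.Entry<K, V>>> batches(Map<K, V> map, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        List<List<Map.Entry<K, V>>> result = new ArrayList<>();
        for (int start = 0; start < entries.size(); start += batchSize) {
            int end = Math.min(start + batchSize, entries.size());
            result.add(entries.subList(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new LinkedHashMap<>();
        for (int i = 1; i <= 5; i++) {
            map.put(i * 10, "value" + i);
        }
        // Each inner list could be handed to a different thread.
        for (List<Map.Entry<Integer, String>> batch : batches(map, 2)) {
            System.out.println(batch);
        }
    }
}
```

With a LinkedHashMap the batches follow insertion order, so the same map contents always produce the same batches.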


If you only do the traversal a few times, or if the map doesn't change, you could get a Set of the keys and copy it into an array. From there it's pretty much your normal index-based method. But if the HashMap changes, you have to repeat both operations, which could get very costly.


With HashMap#keySet -> Set#toArray you would get an array of the keys.

With this array you could proceed as before: keep the array of keys and give each thread its own index range. Each thread then looks up only the keys it has been assigned, so it touches only its own partition of the HashMap.
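
The key-array approach could look roughly like the sketch below. It assumes the map is only read while the workers run, and it uses a made-up task (summing the lengths of the values) purely for illustration. The names `KeyPartitions` and `totalLength` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class KeyPartitions {

    // Sums the lengths of all values, giving each worker its own range of keys.
    static long totalLength(Map<Integer, String> map, int threads) {
        // Snapshot of the keys; it has to be retaken if the map changes.
        Integer[] keys = map.keySet().toArray(new Integer[0]);
        int chunk = (keys.length + threads - 1) / threads; // ceiling division
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            int start = t * chunk;
            int end = Math.min(keys.length, start + chunk);
            pool.submit(() -> {
                long local = 0;
                for (int i = start; i < end; i++) {
                    local += map.get(keys[i]).length(); // read-only lookup
                }
                total.addAndGet(local);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "bb");
        map.put(3, "ccc");
        System.out.println(totalLength(map, 2));
    }
}
```

Each worker pays for a hash lookup per key; copying the entries instead of the keys avoids that extra lookup.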


Unless your map is enormous, the cost of iterating over a map is small compared with the cost of starting a task on another thread and trivial compared with the work you intend to do.

For this reason, the simplest way to divide up your work is likely to be to turn the Map into an array and break that up.

final Map<K, V> map = ...;        // your populated map
final ExecutorService es = ...;   // your thread pool
final int portions = Runtime.getRuntime().availableProcessors();
final Map.Entry<K,V>[] entries = (Map.Entry<K,V>[]) map.entrySet().toArray(new Map.Entry[map.size()]);
final int portionSize = (map.size() + portions-1)/ portions;

for(int i = 0; i < portions; i++) {
    final int start = i * portionSize;
    final int end = Math.min(map.size(), (i + 1) * portionSize);
    es.submit(new Runnable() {
        public void run() {
            for(int j=start; j<end;j++) {
               Map.Entry<K,V> entry = entries[j];
               // process entry.
            }
        }
    });
}