
Efficient MapReduce when dealing with streams of queries to the same dataset

Developer https://www.devze.com 2022-12-18 23:08 Source: Internet


Related topics: distributed-computing mapreduce parallel-processing

I have a massive, static dataset and a function to apply to it.

f is in the form reduce(map(f, dataset)), so I would use the MapReduce skeleton. However, I don't want to scatter the data at each request (and ideally I want to take advantage of indexing in order to speed up f). Is there a MapReduce implementation that addresses this general case?

I've taken a look at IterativeMapReduce and maybe it does the job, but it seems to address a slightly different case, and the code isn't available yet.

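Hadoop's MapReduce (and all the other MapReduce skeletons inspired by Google's) doesn't scatter the data on every request. In Hadoop, the input is written once to HDFS, and map tasks are scheduled on or near the nodes that already hold each block.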
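To make the idea concrete, here is a minimal single-process sketch in Python of the pattern the question asks about: partition the dataset once, build an index per partition, then answer a stream of queries as reduce(map(f, partitions)) without redistributing anything. It is not Hadoop's API. The names Partition, scatter_once and query, the sorted-list index, and the range-count query are all assumptions chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from functools import reduce


class Partition:
    """One resident shard of the dataset, indexed once at load time (hypothetical)."""

    def __init__(self, values):
        # Sorting acts as a simple index: range lookups become binary searches.
        self.sorted_values = sorted(values)

    def count_in_range(self, low, high):
        # Map step: answer the query locally, using the index instead of a full scan.
        left = bisect_left(self.sorted_values, low)
        right = bisect_right(self.sorted_values, high)
        return right - left


def scatter_once(dataset, num_partitions):
    # Done a single time; every later query reuses the same partitions.
    return [Partition(dataset[i::num_partitions]) for i in range(num_partitions)]


def query(partitions, low, high):
    # reduce(map(f, dataset)) evaluated against the resident partitions.
    partial_counts = map(lambda p: p.count_in_range(low, high), partitions)
    return reduce(lambda a, b: a + b, partial_counts, 0)


# The dataset is distributed once, then queried repeatedly.
partitions = scatter_once(list(range(1000)), num_partitions=4)
first = query(partitions, 10, 19)
second = query(partitions, 990, 2000)
```

In a real cluster each Partition would live on a separate worker and only the query parameters and the partial results would cross the network; the sketch keeps everything in one process so the control flow is easy to follow.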