I'm having a funny issue with map_async that I can't figure out.
I'm using Python's multiprocessing library with process pools. I'm trying to pass a list of strings to compare against and a list of strings to be compared to a function using map_async().
Right now I have:
from multiprocessing import Pool, cpu_count
import functools

def main():
    fdict = '/a/file/on/my/disk'
    passin = '/another/file/on/my/disk'
    num_proc = cpu_count()
    pool = Pool(num_proc)
    dictionary = readFiletoList(fdict)
    dictionary = sortByLength(dictionary)
    words = readFiletoList(passin, 'WINDOWS-1252')
    words = sortByLength(words)
    result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)

def readFiletoList(fname, fencode='utf-8'):
    linelist = list()
    with open(fname, encoding=fencode) as f:
        for line in f:
            linelist.append(line.strip())
    return linelist

def sortByLength(words):
    '''Takes an ordered iterable and sorts it based on word length'''
    return sorted(words, key=len)

def mpmine(word, dictionary):
    '''Takes a list of words and the dictionary to search them for.
    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    for pw in word:
        pwlen = len(pw)
        pwres = list()
        for word in dictionary:
            if len(word) > pwlen:
                break
            if word in pw:
                pwres.append(word)
        if len(pwres) > 0:
            results[pw] = pwres
    return results

if __name__ == '__main__':
    main()
Both dictionary and words are lists of strings. This results in only one process being used instead of the number I have set. If I take the square brackets off the variable 'words', it seems to iterate through each string's characters in turn and makes a mess.
What I would like is for it to take something like 1000 strings out of words, pass them to a worker process, and then collect the results, because this is a ridiculously parallelisable task.
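One way to get the "hand each worker a batch of strings" behaviour described above is to split the list into batches yourself and map over the batches, so the worker keeps receiving a list. This is only a rough sketch: chunked and mine_batch are made-up names, the sample data is invented, and the batch size is 2 here just to keep the example small (1000 would go in the same place).

```python
from functools import partial
from multiprocessing import Pool, cpu_count


def chunked(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def mine_batch(batch, dictionary):
    """Hypothetical worker: takes a LIST of strings, one batch at a time."""
    results = {}
    for pw in batch:
        hits = []
        for entry in dictionary:
            # dictionary is sorted by length, so longer entries cannot match
            if len(entry) > len(pw):
                break
            if entry in pw:
                hits.append(entry)
        if hits:
            results[pw] = hits
    return results


if __name__ == "__main__":
    # Invented sample data standing in for the two files.
    dictionary = sorted(["pass", "word", "cat"], key=len)
    words = sorted(["password", "concatenate", "xyz"], key=len)
    with Pool(cpu_count()) as pool:
        async_result = pool.map_async(
            partial(mine_batch, dictionary=dictionary),
            chunked(words, 2),  # each item handed to a worker is a list
        )
        per_batch = async_result.get()  # one result dict per batch
    print(per_batch)
```

Wrapping the whole list as [words] gives map_async a single item to hand out, which would explain why only one process does any work.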
EDIT: Added more code to make what's going on clearer.
OK, I actually figured this one out myself. I'm posting the answer here for anyone else who comes along with the same issue. With the square brackets removed from words, map_async takes one item at a time from the list (in this case a single string) and feeds it to the function, which was expecting a list of strings. The function then treated each string as a list of characters. The corrected code for mpmine is:
def mpmine(word, dictionary):
    '''Takes a single word and the dictionary to search it for.
    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    pw = word
    pwlen = len(pw)
    pwres = list()
    for word in dictionary:
        if len(word) > pwlen:
            break
        if word in pw:
            pwres.append(word)
    if len(pwres) > 0:
        results[pw] = pwres
    return results
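For completeness, here is a rough sketch of how the call side might look with a per-word worker like this. The mpmine below is a condensed version of the function above, merge_results is a made-up helper, and the sample data is invented. The key points are that words is passed without the square brackets, and that the third argument of map_async (chunksize) only controls how many items are shipped to a worker at once; the function is still called with one string at a time.

```python
from functools import partial
from multiprocessing import Pool, cpu_count


def mpmine(word, dictionary):
    """Return {word: [dictionary entries found in word]}, or {} if none match."""
    matches = []
    for entry in dictionary:
        # dictionary is sorted by length, so longer entries cannot match
        if len(entry) > len(word):
            break
        if entry in word:
            matches.append(entry)
    return {word: matches} if matches else {}


def merge_results(dicts):
    """Hypothetical helper: combine the per-word dicts into a single dict."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


if __name__ == "__main__":
    # Invented sample data standing in for the two files.
    dictionary = sorted(["pass", "word", "cat"], key=len)
    words = sorted(["password", "concatenate", "xyz"], key=len)
    with Pool(cpu_count()) as pool:
        # `words`, not `[words]`: every string becomes its own task.
        async_result = pool.map_async(
            partial(mpmine, dictionary=dictionary), words, chunksize=1000
        )
        print(merge_results(async_result.get()))
```

Since map_async returns one result per input item, the get() call yields a list of small dicts, which is why some merging step like the one sketched here is needed at the end.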
I hope this helps anyone else facing a similar issue.