Order of files downloaded by a multithreaded program is not constant

Developer (https://www.devze.com), 2023-02-12 09:42. Source: web

I'm using the program from: here to download many URLs at once. It works fine, but the order of the results in the queue I get back is not the same as the order of the URLs in the urls list, and it is also not constant (it changes from run to run).

What can I do either to make the order constant, or to know which URL each item in the returned queue belongs to?

Thanks.

Change fetch to read like this:

import urllib2

def fetch(url):
    return (url, urllib2.urlopen(url).read())

Then, instead of a queue full of strings, each one containing a result, you get a queue full of tuples, each containing the URL followed by its result.

You aren't going to be able to get back a queue whose items are always in the same order, because thread scheduling and network response times are not deterministic: whichever download finishes first lands in the queue first. So the best thing to do is tag each result so you can identify it later.
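
As a rough sketch of how the tagged tuples might be consumed: this is written for Python 3, where urllib2.urlopen became urllib.request.urlopen. The fetch_all helper and its fetcher parameter are hypothetical names made up for this illustration, not part of the original program. Each thread puts a (url, body) pair on the queue, and the consumer rebuilds a dict keyed by URL, so arrival order stops mattering:

```python
import queue
import threading
import urllib.request


def fetch(url):
    # Tag the body with its URL so the consumer can tell results apart.
    with urllib.request.urlopen(url) as response:
        return (url, response.read())


def fetch_all(urls, fetcher=fetch):
    """Run fetcher(url) in one thread per URL and return {url: body}.

    fetch_all and the fetcher parameter are hypothetical names for this sketch.
    """
    results = queue.Queue()

    def worker(url):
        results.put(fetcher(url))

    threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Arrival order is arbitrary, but the URL tag makes that irrelevant.
    by_url = {}
    while not results.empty():
        url, body = results.get()
        by_url[url] = body
    return by_url
```

Note that this sketch does no error handling: if a download raises, that thread simply ends and its URL will be missing from the returned dict.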

You can just add the index number to the URL...

urls = [
    (0, 'http://www.google.com/'),
    (1, 'http://www.lycos.com/'),
    (2, 'http://www.bing.com/'),
    (3, 'http://www.altavista.com/'),
    (4, 'http://achewood.com/'),
]

def fetch(index, url):
    data = urllib2.urlopen(url).read()
    # ... do whatever you need using index ...
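
One possible way to use the index, sketched here for Python 3: the fetch_in_order helper, the read_url default, and the fetcher parameter are all hypothetical names for this illustration, and the sketch assumes the indexes run from 0 to len(urls) - 1 as in the list above. Each thread writes its result into a pre-sized list at its own index, so the output order matches the input order no matter which download finishes first:

```python
import threading
import urllib.request


def read_url(url):
    # Default fetcher: download the body of a single URL.
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_in_order(indexed_urls, fetcher=read_url):
    """Fetch (index, url) pairs concurrently; return bodies ordered by index.

    Assumes the indexes are exactly 0 .. len(indexed_urls) - 1.
    """
    results = [None] * len(indexed_urls)

    def fetch(index, url):
        # Each thread writes only to its own slot in the list.
        results[index] = fetcher(url)

    threads = [threading.Thread(target=fetch, args=pair) for pair in indexed_urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the urls list above, fetch_in_order(urls)[2] would hold the response for http://www.bing.com/, provided that download succeeded; a slot whose download raised stays None.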