I have written a multithreaded crawler. The process simply creates threads and has them work through a list of URLs to crawl. Each thread fetches its URLs and parses the HTML content. All of this seems to work fine. The issues start when I need to write to tables in a database. I have two declared ArrayLists that hold the content each thread parses. The first ArrayList holds the RSS feed links and the other holds the different posts. I then use a For Each loop to iterate over one while sequentially incrementing an index into the other, and write each pair to the database. My problem is that each time a new thread accesses one of the lists, the content changes, and this affects the iteration. I tried using nested loops, but that did not work either. The same code works fine with a single thread. I hope this makes sense. Here is my code:
SyncLock dlock
For Each l As String In links
finallinks.Add(l)
Next
End SyncLock
SyncLock dlock
For Each p As String In posts
finalposts.Add(p)
Next
End SyncLock
...
Dim i As Integer = 0
SyncLock dlock
For Each rsslink As String In finallinks
postlink = finalposts.Item(i)
i = i + 1
finallinks and finalposts are the two ArrayLists. I did not include the rest of the code, which shows the threads working, but this is the essential part. My error occurs here:
postlink = finalposts.Item(i)
i = i + 1
ERROR: index was out of range. Must be non-negative and less than the size of the collection
Is there an alternative?
It looks like the collection finallinks is larger than finalposts, that's it.
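To illustrate the point above, here is a minimal sketch of a loop that cannot index past the shorter list. The sample values and the module name are made up, and it assumes every link is meant to pair with the post at the same index. Note also that the question's code adds links and posts in two separate SyncLock blocks, so a reader that gets the lock in between could see one list longer than the other; that is a guess about the cause, since the full threading code is not shown.

```vbnet
Imports System
Imports System.Collections

Module IndexGuardSketch
    Sub Main()
        ' Made-up sample data: one more link than posts, as in the failing case.
        Dim finallinks As New ArrayList() From {"feedA", "feedB", "feedC"}
        Dim finalposts As New ArrayList() From {"post1", "post2"}
        Dim dlock As New Object()

        SyncLock dlock
            ' Only walk as far as the shorter list, so Item(i) is always valid.
            Dim pairCount As Integer = Math.Min(finallinks.Count, finalposts.Count)
            For i As Integer = 0 To pairCount - 1
                Dim rsslink As String = CStr(finallinks(i))
                Dim postlink As String = CStr(finalposts(i))
                ' The database write would go here; printing stands in for it.
                Console.WriteLine("{0} / {1}", rsslink, postlink)
            Next

            ' Report the mismatch instead of throwing an out-of-range error.
            If finallinks.Count <> finalposts.Count Then
                Console.WriteLine("Count mismatch: {0} links, {1} posts",
                                  finallinks.Count, finalposts.Count)
            End If
        End SyncLock
    End Sub
End Module
```

This only hides the symptom. If the counts differ, the pairing itself is probably wrong, so it is worth finding out why before trusting the rows that get written.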
See if a ProducerConsumer class will work for you. Your parsing threads will be the producers and your database threads will be the consumers.
If you read the linked page, and try out the code, you should be able to adapt it to your needs.
I believe there is a .NET container called BlockingCollection(Of T) (in System.Collections.Concurrent, .NET 4.0 and later) that is suitable for producer-consumer patterns. I assume you are working in VB.NET.
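A rough sketch of the producer-consumer idea with BlockingCollection(Of T), assuming .NET 4.0 or later. The tuple shape, the sample strings and the use of Parallel.For in place of the real crawler threads are all placeholders, not the asker's code.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading.Tasks

Module ProducerConsumerSketch
    Sub Main()
        ' Each queued item carries a link and its post together (assumed shape),
        ' so the two values can never get out of step with each other.
        Dim queue As New BlockingCollection(Of Tuple(Of String, String))()

        ' Consumer: stands in for the single thread that writes to the database.
        Dim writer As Task = Task.Factory.StartNew(
            Sub()
                ' Blocks while the queue is empty; ends after CompleteAdding.
                For Each item In queue.GetConsumingEnumerable()
                    Console.WriteLine("write {0}, {1}", item.Item1, item.Item2)
                Next
            End Sub)

        ' Producers: stand in for the crawler threads. Add is thread-safe.
        Parallel.For(0, 4,
            Sub(n As Integer)
                queue.Add(Tuple.Create("feed" & n.ToString(), "post" & n.ToString()))
            End Sub)

        queue.CompleteAdding()   ' Signal that no more items will arrive.
        writer.Wait()            ' Let the consumer drain the queue and finish.
    End Sub
End Module
```

With this layout no SyncLock is needed around the queue, and only one thread ever touches the database.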
About the question you asked Andrey:
You cannot (or better, you shouldn't) read from and write to finallinks and finalposts at the same time, so you need to lock, because the ArrayList instance methods are not thread-safe.
Put simply, you cannot add items to the lists while you are reading them to write to a DataTable. What you can do is lock the lists, clone them, clear the originals, and unlock. You then write the clones to the DataTable. This way you have one list to write to the DB and another that the threads keep filling.
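Something along these lines, as a sketch only. FlushToDatabase and the sample values are hypothetical names, and it assumes the crawler threads add a link and its post inside the same SyncLock block on the same dlock object, so that a snapshot never catches half a pair.

```vbnet
Imports System
Imports System.Collections

Module SnapshotSketch
    Private ReadOnly dlock As New Object()
    Private ReadOnly finallinks As New ArrayList()
    Private ReadOnly finalposts As New ArrayList()

    ' Hypothetical helper: snapshot both lists under the lock, then write
    ' the snapshot without holding the lock.
    Sub FlushToDatabase()
        Dim linksCopy As ArrayList
        Dim postsCopy As ArrayList

        SyncLock dlock
            linksCopy = DirectCast(finallinks.Clone(), ArrayList)
            postsCopy = DirectCast(finalposts.Clone(), ArrayList)
            finallinks.Clear()
            finalposts.Clear()
        End SyncLock

        ' The copies are private to this method, so no lock is needed here
        ' and the crawler threads can keep filling the originals.
        For i As Integer = 0 To Math.Min(linksCopy.Count, postsCopy.Count) - 1
            ' The database write would go here; printing stands in for it.
            Console.WriteLine("{0} / {1}", linksCopy(i), postsCopy(i))
        Next
    End Sub

    Sub Main()
        ' Stand-in for a crawler thread: both Adds happen under one lock.
        SyncLock dlock
            finallinks.Add("feedA")
            finalposts.Add("post1")
        End SyncLock

        FlushToDatabase()
    End Sub
End Module
```

The lock is held only for the clone and clear, which are quick, so the slow database write does not stall the crawler threads.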
I hope this helps.