I have the following multithreading function, which starts one thread per URL in a list to fetch and parse each page's content. The code was suggested by another user, and I want to know whether this is an efficient way of implementing what I need to do. I am running the code now and getting errors in all the functions that worked fine single-threaded. For example, for the list I use to check visited URLs, I am now getting 'ArgumentOutOfRangeException - capacity was less than the current size'. Does everything now need to be synchronized?
    Dim startwatch As New Stopwatch
    Dim elapsedTime As Long = 0
    Dim urlCompleteList As String = String.Empty
    Dim numThread As Integer = 0
    Dim ThreadList As New List(Of Thread)
    startwatch.Start()
    ' Start one thread per URL; each thread runs processUrl(link)
    For Each link In completeList
        Dim worker As New Thread(AddressOf processUrl)
        worker.Start(link)
        ThreadList.Add(worker)
    Next
    ' Wait for every thread to finish before stopping the timer
    For Each t As Thread In ThreadList
        t.Join()
    Next
    startwatch.Stop()
    elapsedTime = startwatch.ElapsedMilliseconds
End Sub
Public Sub processUrl(ByVal url As String)
    ' Make sure we have never visited this URL before
    If Not VisitedPages.Contains(url) Then
        VisitedPages.Add(url) ' <-- the exception is raised here
        Dim startwatch As New Stopwatch
        Dim elapsedTime As Long = 0
If the VisitedPages collection within processUrl is shared among the threads, then yes, you need to ensure that only one thread can access that collection at a time, unless the collection itself is thread-safe and takes care of that for you.
The same goes for any other data that's shared among the threads you create.
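As a minimal sketch of the thread-safe-collection route, assuming .NET Framework 4.0 or later (where System.Collections.Concurrent is available), VisitedPages could become a ConcurrentDictionary used as a set. Its TryAdd method checks for the key and inserts it in one atomic step, so it replaces the separate Contains/Add pair. The module name ConcurrentVisitedDemo, the RunAll helper and the processedCount counter are hypothetical, added only to make the example runnable; ProcessUrl takes an Object because Thread.Start(Object) passes its argument that way.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Public Module ConcurrentVisitedDemo
    ' Thread-safe stand-in for VisitedPages. The BCL has no concurrent
    ' HashSet, so a ConcurrentDictionary with a dummy Boolean value is used.
    Private ReadOnly VisitedPages As New ConcurrentDictionary(Of String, Boolean)()

    ' Hypothetical counter: number of URLs actually processed (demo only).
    Private processedCount As Integer = 0

    ' Takes Object because Thread.Start(Object) passes the argument as Object.
    Public Sub ProcessUrl(ByVal state As Object)
        Dim url As String = CStr(state)
        ' TryAdd checks and inserts in one atomic step: for a given key,
        ' only one thread ever gets True, so no separate Contains is needed.
        If Not VisitedPages.TryAdd(url, True) Then Return
        Interlocked.Increment(processedCount)
        ' ... download and parse the page here ...
    End Sub

    ' Hypothetical helper: one thread per URL (as in the question), then wait for all.
    Public Function RunAll(ByVal urls As IEnumerable(Of String)) As Integer
        Dim workers As New List(Of Thread)()
        For Each link As String In urls
            Dim worker As New Thread(AddressOf ProcessUrl)
            worker.Start(link)
            workers.Add(worker)
        Next
        For Each worker As Thread In workers
            worker.Join()
        Next
        Return processedCount
    End Function
End Module
```

On a fresh run, RunAll(New String() {"a", "b", "a", "c", "a"}) processes each distinct URL exactly once and returns 3, regardless of how the threads interleave.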
I don't see where VisitedPages is declared, but it does not appear to be local to the processUrl method, which would make it shared among all of the threads. Multiple threads accessing the list at the same time will cause problems and can generate errors like the one you describe. You will need to protect the VisitedPages collection with a mutex or some other lock to guard against this.
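A minimal sketch of the locking route, assuming VisitedPages stays a plain List(Of String) as in the question. It uses VB's SyncLock statement (built on System.Threading.Monitor) rather than a System.Threading.Mutex; both give mutual exclusion, but Monitor is the usual choice when the lock only needs to work within one process. The important detail is that Contains and Add run under the same lock: if only Add were locked, two threads could both see a URL as unvisited and both process it. The names LockedVisitedDemo, visitedLock, processedCount and RunAll are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading

Public Module LockedVisitedDemo
    ' Plain List(Of String), like the one in the question: not thread-safe,
    ' so every read or write must happen while holding visitedLock.
    Private ReadOnly VisitedPages As New List(Of String)()
    ' Hypothetical private object used only for locking.
    Private ReadOnly visitedLock As New Object()
    ' Hypothetical counter of URLs actually processed (demo only).
    Private processedCount As Integer = 0

    Public Sub ProcessUrl(ByVal state As Object)
        Dim url As String = CStr(state)
        SyncLock visitedLock
            ' Check and add as one step; leaving via Return still releases the lock.
            If VisitedPages.Contains(url) Then Return
            VisitedPages.Add(url)
        End SyncLock
        ' Slow work (download, parse) goes outside the lock so the
        ' threads are not serialized on it.
        Interlocked.Increment(processedCount)
    End Sub

    ' Hypothetical helper: one thread per URL, then wait for all of them.
    Public Function RunAll(ByVal urls As IEnumerable(Of String)) As Integer
        Dim workers As New List(Of Thread)()
        For Each link As String In urls
            Dim worker As New Thread(AddressOf ProcessUrl)
            worker.Start(link)
            workers.Add(worker)
        Next
        For Each worker As Thread In workers
            worker.Join()
        Next
        Return processedCount
    End Function
End Module
```

Keeping the locked region to just the check-and-add means the lock protects the shared list without turning the whole crawl back into single-threaded work.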