I'm developing a scraper that scrapes one web page for links, then creates threads that scrape the subpages.
This is what a thread does:
Dim client As New WebClient()
Dim stream As Stream = client.OpenRead(_Address)
Dim streamReader As New StreamReader(stream, True)
_Content = streamReader.ReadToEnd()
streamReader.Close()
streamReader.Dispose()
stream.Close()
stream.Dispose()
client.Dispose()
I've noticed that sometimes (usually when more threads are running simultaneously) a thread throws an exception. It happens randomly. The exception is thrown at client.OpenRead and says "Value cannot be null. Parameter name: address". I also have a try..catch here, so I put a breakpoint in the catch block, and _Address appears to be valid at that point, yet the code throws an exception.
_Address is a protected class field and cannot be accessed by other threads.
The exact message is:
"Value cannot be null. Parameter name: address".
The exception is System.ArgumentNullException.
Stack trace is:
   at System.Net.WebClient.OpenRead(String address)
   at MyAppName.Scraper.Scrape() in MyAppFolder\Scraper.vb:line 96
Do you have any suggestion on fixing the issue? Thank you in advance.
The internal implementation of WebClient.OpenRead(string address) is just:
public Stream OpenRead(string address)
{
    if (address == null)
    {
        throw new ArgumentNullException("address");
    }
    return this.OpenRead(this.GetUri(address));
}
so _Address must be null when it gets passed in.
Maybe try something like this:
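private string _address;
private string _Address
{
    get
    {
        // Reading before any assignment is a state error, not a bad argument.
        if (_address == null)
            throw new InvalidOperationException("_Address was never set and is still null!");
        return _address;
    }
    set
    {
        // First argument is the parameter name, second is the message.
        if (value == null)
            throw new ArgumentNullException("value", "_Address can not be null!");
        _address = value;
    }
}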
So basically if something tries to set _Address to null, you will get an error right when it happens and can see in the call stack where it is being set to null.
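On the VB.NET side, a complementary step is to copy the field into a local variable before calling OpenRead, so the value that is checked is exactly the value that gets passed. The following is only a minimal sketch under assumptions: PageScraper is a hypothetical class standing in for the asker's Scraper, and the idea that the field changes between the check and the call is a guess, not something the question confirms. It also swaps the manual Close/Dispose calls for Using blocks, which dispose the objects even when an exception is thrown.

```vbnet
Imports System
Imports System.IO
Imports System.Net

' Hypothetical class for illustration; not the asker's actual Scraper.
Public Class PageScraper
    Protected _Address As String
    Protected _Content As String

    Public Sub New(address As String)
        ' Reject a null address at construction time.
        If address Is Nothing Then Throw New ArgumentNullException("address")
        _Address = address
    End Sub

    Public Sub Scrape()
        ' Snapshot the field so the checked value is the value passed to OpenRead.
        Dim address As String = _Address
        If address Is Nothing Then
            Throw New InvalidOperationException("_Address was null when Scrape() started.")
        End If

        ' Using disposes each object even if an exception is thrown.
        Using client As New WebClient()
            Using stream As Stream = client.OpenRead(address)
                Using reader As New StreamReader(stream, True)
                    _Content = reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Sub
End Class
```

If the InvalidOperationException fires, the field really was null when Scrape() started, which points at how the instance was created or shared rather than at WebClient.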