I want to scrape a page for all the URLs and put them in a dictionary. I created a class with a dictionary, but I can't seem to add elements to it.
    type crawler =
        new() = {}
        member this.urls = new Dictionary<string,string>()
        member this.start (url : string) =
            let hw = new HtmlWeb()
            let doc = hw.Load(url)
            let docNode = doc.DocumentNode
            let links = docNode.SelectNodes(".//a")
            for aLink in links do
                let href = aLink.GetAttributeValue("href"," ")
                if href.StartsWith("http://") && href.EndsWith(".html") then
                    this.urls.Add(href, href)
Why is the dictionary urls empty?
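Because urls here is a property whose body runs on every access, each call to this.urls creates and returns a new, empty dictionary. Add puts the entry into that temporary dictionary, which is then discarded. Create the dictionary once in a let binding and return it from the property: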
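    open System.Collections.Generic
    open HtmlAgilityPack

    type Crawler() =
        // Created once per instance, so every access sees the same dictionary
        let urls = new Dictionary<string,string>()
        member this.Urls = urls
        member this.Start (url : string) =
            let hw = new HtmlWeb()
            let doc = hw.Load(url)
            let docNode = doc.DocumentNode
            let links = docNode.SelectNodes(".//a")
            // SelectNodes returns null when nothing matches
            if not (isNull links) then
                for aLink in links do
                    let href = aLink.GetAttributeValue("href"," ")
                    if href.StartsWith("http://") && href.EndsWith(".html") then
                        // The indexer overwrites, so a repeated link does not throw as Add would
                        urls.[href] <- href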
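To see the difference in isolation, here is a minimal sketch that needs no HTML library. The type names Regenerated and Stored are made up for illustration; they only mirror the two ways of declaring the property.

```fsharp
open System.Collections.Generic

// Hypothetical type: the property body runs on every access,
// so each access builds a new dictionary.
type Regenerated() =
    member this.Items = new Dictionary<string,string>()

// Hypothetical type: the dictionary is built once per instance
// and the property just returns it.
type Stored() =
    let items = new Dictionary<string,string>()
    member this.Items = items

let a = Regenerated()
a.Items.Add("x", "x")           // goes into a dictionary that is then discarded
printfn "%d" a.Items.Count      // prints 0

let b = Stored()
b.Items.Add("x", "x")
printfn "%d" b.Items.Count      // prints 1
```

If you prefer not to write the let binding yourself, an auto-property such as member val Items = Dictionary<string,string>() should behave like Stored, since its initializer runs once when the object is constructed.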
This wasn't your question, but if you're interested in taking a more functional approach, here's one way to do it:
    type Crawler =
        { Urls : Set<string> }

    [<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
    module Crawler =

        [<CompiledName("Start")>]
        let start crawler (url:string) =
            let { Urls = oldUrls } = crawler
            let newUrls =
                HtmlWeb().Load(url).DocumentNode.SelectNodes(".//a")
                |> Seq.cast<HtmlNode>
                |> Seq.choose (fun link ->
                    match link.GetAttributeValue("href"," ") with
                    | href when href.StartsWith("http://") && href.EndsWith(".html") -> Some href
                    | _ -> None)
                |> Set.ofSeq
                |> Set.union oldUrls
            { crawler with Urls = newUrls }
Your data and behaviors are now separate. Crawler is an immutable record type. start accepts a Crawler and returns a new one with the updated list of URLs. I replaced Dictionary with Set, since the keys and values are the same; eliminated unused let bindings, and snuck in some pattern matching. This should have a relatively friendly interface in C# also.
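As a rough illustration of how the immutable version behaves, the sketch below applies the same filter and union to plain strings instead of a downloaded page. addLinks and the example.com addresses are hypothetical stand-ins, not part of the code above.

```fsharp
type Crawler = { Urls : Set<string> }

// Hypothetical helper: same filter and union as start,
// but fed hrefs directly so it runs without a network or HTML parser.
let addLinks crawler (hrefs : seq<string>) =
    let found =
        hrefs
        |> Seq.filter (fun href -> href.StartsWith("http://") && href.EndsWith(".html"))
        |> Set.ofSeq
    { crawler with Urls = Set.union crawler.Urls found }

let empty = { Urls = Set.empty }
let first = addLinks empty [ "http://example.com/a.html"; "/relative.html" ]
let second = addLinks first [ "http://example.com/a.html"; "http://example.com/b.html" ]

// Earlier values are untouched; duplicates collapse in the set.
printfn "%d %d %d" empty.Urls.Count first.Urls.Count second.Urls.Count  // prints 0 1 2
```

Each call returns a new record, so crawling several pages amounts to threading the result of one call into the next, for example with List.fold.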