Why is the member dictionary in this F# code always empty?

Developer https://www.devze.com 2023-03-23 16:43 Source: web
I want to scrape a page for all of its URLs and put them in a dictionary. I created a class with a dictionary member, but I can't seem to add elements to it.

type crawler =

     new()= {}
     member this.urls  = new Dictionary<string,string>()
     member this.start (url : string)=
        let hw = new HtmlWeb()
        let doc = hw.Load(url)
        let docNode = doc.DocumentNode
        let links = docNode.SelectNodes(".//a")

        for aLink in links do
            let href = aLink.GetAttributeValue("href"," ")
            if href.StartsWith("http://")  && href.EndsWith(".html") then
              this.urls.Add(href, href)

Why is the dictionary urls empty?


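Because urls here is a property whose body runs on every access, each call to this.urls creates and returns a new, empty dictionary. The Add call puts the entry into a dictionary that is discarded right away. Create the dictionary once in a let binding and expose that instead: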

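open System.Collections.Generic
open HtmlAgilityPack

type Crawler() =
    // Created once per instance, so every member sees the same dictionary.
    let urls = new Dictionary<string,string>()
    member this.Urls = urls
    member this.Start (url : string) =
        let hw = new HtmlWeb()
        let doc = hw.Load(url)
        let docNode = doc.DocumentNode
        // SelectNodes returns null when no node matches.
        let links = docNode.SelectNodes(".//a")
        if not (isNull links) then
            for aLink in links do
                let href = aLink.GetAttributeValue("href"," ")
                if href.StartsWith("http://") && href.EndsWith(".html") then
                    // Indexer assignment, because Add throws on a repeated href.
                    urls.[href] <- href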
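
To see the difference without any scraping, here is a minimal sketch. The type names Fresh, Shared and Auto are made up for illustration and are not part of the question's code. The third type assumes F# 3.0 or later, where member val declares an auto-property that is initialized once.

```fsharp
open System.Collections.Generic

// Hypothetical type: the property body runs on every access,
// so each read builds a brand-new dictionary.
type Fresh() =
    member this.Items = new Dictionary<string,string>()

// Hypothetical type: the dictionary is built once, in the constructor.
type Shared() =
    let items = new Dictionary<string,string>()
    member this.Items = items

// Hypothetical type: auto-property, also initialized once per instance.
type Auto() =
    member val Items = new Dictionary<string,string>()

let f = Fresh()
f.Items.Add("a", "a")            // added to a throwaway dictionary
printfn "%d" f.Items.Count       // 0

let s = Shared()
s.Items.Add("a", "a")
printfn "%d" s.Items.Count       // 1

let a = Auto()
a.Items.Add("a", "a")
printfn "%d" a.Items.Count       // 1
```

The first Count is 0 because the Add and the Count each received their own dictionary.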


This wasn't your question, but if you're interested in taking a more functional approach, here's one way to do it:

type Crawler = 
  { Urls : Set<string> }

[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module Crawler =

  [<CompiledName("Start")>]
  let start crawler (url:string) = 
    let { Urls = oldUrls } = crawler
    let newUrls =
      HtmlWeb().Load(url).DocumentNode.SelectNodes(".//a")
      |> Seq.cast<HtmlNode>
      |> Seq.choose (fun link ->
        match link.GetAttributeValue("href"," ") with
        | href when href.StartsWith("http://") && href.EndsWith(".html") -> Some href
        | _ -> None)
      |> Set.ofSeq
      |> Set.union oldUrls
    { crawler with Urls = newUrls }

Your data and behaviors are now separate. Crawler is an immutable record type. start accepts a Crawler and returns a new one with the updated set of URLs. I replaced Dictionary with Set, since the keys and values are the same, eliminated unused let bindings, and snuck in some pattern matching. This should also have a relatively friendly interface from C#.
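
The copy-and-update behavior can be tried without a network call. The following is only a sketch: addLinks is a hypothetical helper, not part of the answer above, that applies the same href filter to a plain sequence of strings, and the example.com URLs are placeholders.

```fsharp
type Crawler =
  { Urls : Set<string> }

// Hypothetical helper: same filter as start, but over hrefs passed in
// directly instead of being scraped from a page.
let addLinks crawler (hrefs : seq<string>) =
  let matching =
    hrefs
    |> Seq.filter (fun h -> h.StartsWith("http://") && h.EndsWith(".html"))
    |> Set.ofSeq
  // Builds a new record; the original crawler is left untouched.
  { crawler with Urls = Set.union crawler.Urls matching }

let c0 = { Urls = Set.empty }
let c1 =
  addLinks c0
    [ "http://example.com/a.html"   // kept
      "/relative.html"              // dropped: no http:// prefix
      "http://example.com/a.html" ] // duplicate, collapsed by Set

printfn "%d %d" c0.Urls.Count c1.Urls.Count   // 0 1
```

c0 still holds an empty set after the call, which is the point of returning a new record rather than mutating a member.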
