Remove linebreak node in htmlagilitypack?

Developer https://www.devze.com 2023-01-15 22:36 Source: web
I'm trying to retrieve this text from a webpage without the line break:

<span class="listingTitle">888-I-AM-JUNK. Canada's most trusted BIG LOAD junk removal<br />specialist!</span></a>

How can I do it?

Here is my code so far. I'm using VB.

Dim content As String = ""
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.Load(WebBrowser1.DocumentStream)
Dim hnc As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span[@class='listingTitle']")
For Each link As HtmlAgilityPack.HtmlNode In hnc
    Dim replaceUnwanted As String = ""
    replaceUnwanted = link.InnerText.Replace("&amp;", "&")
    replaceUnwanted = replaceUnwanted.Replace("&#39;", "'")
    replaceUnwanted = replaceUnwanted.Replace("See full business details", "")

    content &= replaceUnwanted & vbNewLine
Next
RichTextBox1.Text = content
Me.RichTextBox1.Lines = Me.RichTextBox1.Text.Split(New Char() {ControlChars.Lf}, _
                                                   StringSplitOptions.RemoveEmptyEntries)

I need to remove the <br />


How about going through the same regular string manipulation?

replaceUnwanted = replaceUnwanted.Replace(vbCrLf, "")

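If you are working with the raw markup of the <span>...</span> (for example InnerHtml rather than InnerText), strip the tag itself. A case-insensitive match avoids calling ToLower(), which would lowercase the whole title:

' Matches <br>, <br/>, <br /> and <BR> without changing the case of the remaining text.
replaceUnwanted = System.Text.RegularExpressions.Regex.Replace(replaceUnwanted, "<br\s*/?>", "", System.Text.RegularExpressions.RegexOptions.IgnoreCase)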
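
Another option, instead of string replacement, is to handle the <br> elements in the parsed document before reading InnerText. The following is only a minimal sketch and assumes the HtmlAgilityPack package is referenced. The module name BrRemovalSketch and the hard-coded sample string are made up for illustration, and replacing each <br> with a single space is my assumption about the desired output (removing the node outright would join "removal" and "specialist!" together).

```vbnet
Imports System
Imports HtmlAgilityPack

Module BrRemovalSketch
    Sub Main()
        ' Sample markup based on the question (hard-coded here for illustration only).
        Dim html As String = "<span class=""listingTitle"">888-I-AM-JUNK. Canada&#39;s most trusted BIG LOAD junk removal<br />specialist!</span>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when nothing matches, so guard against that.
        Dim titles As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//span[@class='listingTitle']")
        If titles Is Nothing Then Return

        For Each titleNode As HtmlNode In titles
            Dim breaks As HtmlNodeCollection = titleNode.SelectNodes(".//br")
            If breaks IsNot Nothing Then
                For Each br As HtmlNode In breaks
                    ' Swap each <br> for a single space so the neighbouring words stay separated.
                    br.ParentNode.ReplaceChild(doc.CreateTextNode(" "), br)
                Next
            End If

            ' DeEntitize decodes entities such as &amp; and &#39; in one call.
            Dim cleaned As String = HtmlEntity.DeEntitize(titleNode.InnerText)
            ' Drop any literal line breaks that were present in the page source.
            cleaned = cleaned.Replace(vbCr, "").Replace(vbLf, "")
            Console.WriteLine(cleaned)
        Next
    End Sub
End Module
```

In the question's loop the same idea would go right before link.InnerText is read, and HtmlEntity.DeEntitize could stand in for the manual "&amp;" and "&#39;" replacements.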