RegEx to ignore / skip everything in html tags

Developer https://www.devze.com 2022-12-27 17:22 Source: Internet
I'm looking for a way to combine two regular expressions: one to catch the URLs, and the other to ensure it skips text within HTML tags. See the sample text below the functions.

I need to pass in a block of news text and format it by wrapping URLs and email addresses in HTML tags so users don't have to. The code below works great until the text already contains HTML tags. In that case it wraps the existing links a second time, doubling the HTML tags.

There are plenty of examples that strip HTML, but I just want to ignore it, since the URL is already linkified. Also, if there is an easier way to accomplish this, with or without regex, please let me know. None of my attempts to combine the regexes have worked.

I'm coding in ASP.NET VB, but will take any workable example or direction.

Thanks!

===== Functions =============

Public Shared Function InsertHyperlinks(ByVal inText As String) As String
    Dim strBuf As String
    Dim objMatches As Object
    Dim iStart, iEnd As Integer
    strBuf = ""
    iStart = 1
    iEnd = 1

    Dim strRegUrlEmail As String = "\b(www|http|\S+@)\S+\b"             
    'RegEx to find urls and email addresses
    Dim objRegExp As New Regex(strRegUrlEmail, RegexOptions.IgnoreCase) 
    'Match URLs and emails        
    Dim MatchList As MatchCollection = objRegExp.Matches(inText)
    If MatchList.Count <> 0 Then

        objMatches = objRegExp.Matches(inText)
        For Each Match In MatchList
            iEnd = Match.Index
            strBuf = strBuf & Mid(inText, iStart, iEnd - iStart + 1)
            If InStr(1, Match.Value, "@") Then
                strBuf = strBuf & HrefGet(Match.Value, "EMAIL", "_BLANK")
            Else
                strBuf = strBuf & HrefGet(Match.Value, "WEB", "_BLANK")
            End If
            iStart = iEnd + Match.Length + 1
        Next
        strBuf = strBuf & Mid(inText, iStart)
        InsertHyperlinks = strBuf
    Else
        'No hyperlinks to replace
        InsertHyperlinks = inText
    End If

End Function

Shared Function HrefGet(ByVal url As String, ByVal urlType As String, ByVal Target As String) As String
    Dim strBuf As String
    strBuf = "<a href="""
    If UCase(urlType) = "WEB" Then
        If LCase(Left(url, 3)) = "www" Then
            strBuf = "<a href=""http://" & url & """ Target=""" & _
                     Target & """>" & url & "</a>"
        Else
            strBuf = "<a href=""" & url & """ Target=""" & _
                    Target & """>" & url & "</a>"
        End If
    ElseIf UCase(urlType) = "EMAIL" Then
        strBuf = "<a href=""mailto:" & url & """ Target=""" & _
                 Target & """>" & url & "</a>"
    End If
    HrefGet = strBuf
End Function

===== Sample Text =============

This would be the inText parameter.

Midway through the ride, we see a <a href="http://www.skipthis.com" target="new">Skip this too</a>. But sometimes we go here [insert normal www dot link dot com]. If you'd like to join us contact Bill Smith at Tester@gmail.com. Thanks!

Sorry, Stack Overflow won't allow multiple hyperlinks to be added.

===== End Sample Text =============


First, check out this link.

Then check out the HTML Agility Pack. You will save yourself years of headaches by not parsing HTML with regular expressions.
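
To make that concrete, here is a rough sketch of how the HTML Agility Pack route might look. It is not a drop-in replacement for the code in the question. It assumes the HtmlAgilityPack package is referenced, reuses the question's URL/email pattern unchanged, and only touches text nodes that are not already inside an `<a>` element. The names `LinkifySketch`, `InsertHyperlinksSkippingAnchors`, and `WrapMatch` are made up for illustration, and `WrapMatch` is a simplified stand-in for the question's `HrefGet`.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Public Module LinkifySketch

    ' Same URL/email pattern as in the question.
    Private ReadOnly UrlOrEmail As New Regex("\b(www|http|\S+@)\S+\b", RegexOptions.IgnoreCase)

    ' Hypothetical replacement for InsertHyperlinks: parse first, then run the
    ' regex only on plain text that is not already part of a link.
    Public Function InsertHyperlinksSkippingAnchors(ByVal inText As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(inText)

        ' Text nodes with no <a> ancestor. URLs inside href attributes are not
        ' text nodes, so they are never seen by the regex.
        Dim textNodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]")

        ' SelectNodes returns Nothing when there are no matches.
        If textNodes Is Nothing Then Return inText

        For Each node As HtmlNode In textNodes
            Dim textNode As HtmlTextNode = TryCast(node, HtmlTextNode)
            If textNode Is Nothing Then Continue For
            ' The text is written back out as-is, so the inserted markup
            ' ends up in the final HTML.
            textNode.Text = UrlOrEmail.Replace(textNode.Text, AddressOf WrapMatch)
        Next

        Return doc.DocumentNode.OuterHtml
    End Function

    ' Simplified stand-in for HrefGet from the question.
    Private Function WrapMatch(ByVal m As Match) As String
        Dim value As String = m.Value
        Dim href As String
        If value.Contains("@") Then
            href = "mailto:" & value
        ElseIf value.StartsWith("www", StringComparison.OrdinalIgnoreCase) Then
            href = "http://" & value
        Else
            href = value
        End If
        Return "<a href=""" & href & """ target=""_blank"">" & value & "</a>"
    End Function

End Module
```

With the sample text from the question, the existing `<a href="http://www.skipthis.com" ...>` element should pass through untouched, while the bare email address gets wrapped. The parser may normalize malformed markup on output, so check the result against real news text before relying on it.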
