Regex: absolute url to relative url (C#)

Developer https://www.devze.com 2022-12-24 11:36 Source: Internet
I need a regex to run against strings like the one below that will convert absolute paths to relative paths under certain conditions.

<p>This website is <strong>really great</strong> and people love it <img alt="" src="http://localhost:1379/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif" /></p>

Rules:

  • If the url contains "/Content/" I would like to get the relative path

  • If the url does not contain "/Content/", it is an external file, and the absolute path should remain

Regex unfortunately is not my forte, and this is too advanced for me at this point. If anyone can offer some tips, I'd appreciate it.

Thanks in advance.

UPDATE: To answer questions in the comments:

  • At the time the regex is applied, all URLs will begin with "http://"
  • This should be applied to the src attribute of img tags and the href attribute of a tags, not to text outside of tags.


You should consider using the Uri.MakeRelativeUri method. Your current algorithm depends on external files never containing "/Content/" in their path, which seems risky to me. MakeRelativeUri will determine whether a relative path can be made from the current Uri to the src or href, regardless of changes you or the external file store make down the road.
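
To make that suggestion concrete, here is a minimal sketch of how it might look in C#. The class and method names (RelativeUrlSketch, ToSiteRelative) and the idea of passing the site root in as a Uri are assumptions for illustration, not something from the question. It uses Uri.IsBaseOf to decide whether the URL belongs to the same site and Uri.MakeRelativeUri to build the path.

```csharp
using System;

public static class RelativeUrlSketch
{
    // Returns a root-relative path when target sits under siteRoot
    // (same scheme, host and port); otherwise returns target unchanged.
    public static string ToSiteRelative(Uri siteRoot, string target)
    {
        // Anything that is not a well-formed absolute URL is left alone.
        if (!Uri.TryCreate(target, UriKind.Absolute, out Uri targetUri))
            return target;

        // External files keep their absolute URL.
        if (!siteRoot.IsBaseOf(targetUri))
            return target;

        // MakeRelativeUri yields a path relative to siteRoot
        // (e.g. "Content/js/..."); the leading "/" makes it root-relative.
        return "/" + siteRoot.MakeRelativeUri(targetUri);
    }
}
```

Called with new Uri("http://localhost:1379/") as the site root, the image URL from the question should come back as a path starting with /Content/, while a URL on another host is returned as-is. Note that this decides by host rather than by the "/Content/" rule, so it would also shorten same-site URLs outside /Content/.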


Unless I'm missing the point here, if you replace

^(.*)([Cc]ontent.*)

With

/$2

You will end up with

/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif

This will only happen if "content" is found, so in case you have a URL such as:

http://localhost:1379/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif

Nothing will be replaced.

Hope it helps, and that I didn't miss anything.
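
In C# the same replacement could be sketched roughly as below, applied to a single URL string that has already been extracted from the markup. The names ContentPathSketch and Strip are made up for the example, and the character class is written as [Cc], since a "|" inside square brackets would match a literal pipe. Be aware that the pattern matches "content" anywhere in the URL, which is looser than the "/Content/" rule in the question.

```csharp
using System.Text.RegularExpressions;

public static class ContentPathSketch
{
    // Group 1 greedily swallows everything before the last "Content"/"content";
    // group 2 is the rest of the URL from that point on.
    private static readonly Regex Pattern = new Regex(@"^(.*)([Cc]ontent.*)");

    // Replaces the swallowed prefix with "/" and keeps group 2.
    // A URL without "Content"/"content" does not match and is returned unchanged.
    public static string Strip(string url)
    {
        return Pattern.Replace(url, "/$2");
    }
}
```

For example, ContentPathSketch.Strip("http://localhost:1379/Content/js/a.gif") gives "/Content/js/a.gif", and a URL with no "content" in it passes through untouched.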

UPDATE

Obviously, this assumes you are using an HTML parser to find the URL inside the a href (which you should be, in case you're not :-))

Cheers


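This is for Perl; I do not know C#:

s@(<(img|a)\s(?:[^>]*?\s)?(src|href)=)(["'])http://[^'"]*?(/Content/[^'"]*?)\4@$1$4$5$4@g

The trailing $4 puts the closing quote back, and the optional (?:[^>]*?\s)? group lets src or href be the first attribute of the tag.

If C# has Perl-like regex, it will be easy to port.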
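
.NET's Regex class does accept this kind of syntax, so a port might look like the sketch below. The class and method names (HtmlContentUrlSketch, Rewrite) are hypothetical. Compared with a literal transcription, this version uses non-capturing groups, restores the closing quote in the replacement, and allows src or href to be the first attribute of the tag. It is still a regex over HTML and will miss unusual markup such as unquoted attribute values.

```csharp
using System.Text.RegularExpressions;

public static class HtmlContentUrlSketch
{
    // $1 = tag text up to and including src= or href=
    // $2 = the opening quote character (" or ')
    // $3 = the path starting at /Content/
    private static readonly Regex Pattern = new Regex(
        @"(<(?:img|a)\s(?:[^>]*?\s)?(?:src|href)=)([""'])http://[^'""]*?(/Content/[^'""]*?)\2");

    // Rewrites http:// URLs containing /Content/ inside img/a tags to
    // root-relative paths; everything else is left as it was.
    public static string Rewrite(string html)
    {
        return Pattern.Replace(html, "$1$2$3$2");
    }
}
```

Run against the sample paragraph from the question, the src should become "/Content/js/fckeditor/editor/images/smiley/msn/teeth_smile.gif", and URLs without "/Content/" keep their absolute form.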


This function converts all the hyperlinks and image sources inside your HTML to absolute URLs, and you can easily modify it to handle CSS and JavaScript files as well:

' Requires: Imports System.Text.RegularExpressions (at the top of the file)

' Rewrites every href="..." and src="..." value in html as an absolute URL,
' resolving relative values against PageURL.
Private Function ConvertALLrelativeLinksToAbsoluteUri(ByVal html As String, ByVal PageURL As String) As String
    Dim MainURL As New Uri(PageURL)
    ' Group 1 is the attribute name (href or src), group 2 is the quoted value.
    Dim linkAttribute As New Regex("(href|src)=""(.*?)""", RegexOptions.IgnoreCase)
    Return linkAttribute.Replace(html,
        Function(m As Match) As String
            Dim NEWURL As Uri = Nothing
            ' Leave the attribute untouched if the value cannot be resolved.
            If Not Uri.TryCreate(MainURL, m.Groups(2).Value, NEWURL) Then Return m.Value
            Return m.Groups(1).Value & "=""" & NEWURL.AbsoluteUri & """"
        End Function)
End Function