This is related to a previous question I asked here; see the link below for a brief description of why I am trying to do this.
Regular expression from font to span (size and colour) and back (VB.NET)
Basically I need a regex replace function (or if this can be done in pure VB then that's fine) to convert all ul tags in a string to textformat tags, with a different indent attribute value for the outermost textformat tag.
For example:
<ul>
<li>This is some text</li>
<li>This is some more text</li>
<li>
<ul>
<li>This is some indented text</li>
<li>This is some more text</li>
</ul>
</li>
<li>More text!</li>
<li>
<ul>
<li>This is some indented text</li>
<li>This is some more text</li>
</ul>
</li>
<li>More text!</li>
</ul>
<ul>
<li>Another list item</li>
<li>
<ul>
<li>Another nested list item</li>
</ul>
</li>
</ul>
Will become:
<textformat indent="0">
<li>This is some text</li>
<li>This is some more text</li>
<li>
<textformat indent="20">
<li>This is some indented text</li>
<li>This is some more text</li>
</textformat>
</li>
<li>More text!</li>
<li>
<textformat indent="20">
<li>This is some indented text</li>
<li>This is some more text</li>
</textformat>
</li>
<li>More text!</li>
</textformat>
<textformat indent="0">
<li>Another list item</li>
<li>
<textformat indent="20">
<li>Another nested list item</li>
</textformat>
</li>
</textformat>
Basically I want the first ul tag to have no indenting, but all nested ul tags to have an indent of 20.
I appreciate this is a strange request but hopefully that makes sense, please let me know if you have any questions.
Thanks in advance.
It's possible with regex, but LINQ to XML is simpler. I've included both a LINQ to XML and a regex solution, although I would favor the former.
Here's the LINQ to XML approach. Since ul is the top element, its Name can be changed directly. Descendants will grab all the nested ul items. The only caveat is that this approach works only if the input is well-formed; if it isn't, LINQ to XML will fail to parse it. Also, if the input is well-formed but the ul isn't the top element and is instead part of a larger block of HTML, you'll need to loop over Elements("ul") and do the same thing for each of them.
If the HTML is malformed you may want to look at the HTML Agility Pack.
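' Requires Imports System.Xml.Linq
Dim xml = XElement.Parse(input)
' The root ul becomes the unindented textformat
xml.Name = "textformat"
xml.SetAttributeValue("indent", "0")
' Every nested ul becomes an indented textformat
For Each item In xml.Descendants("ul")
    item.Name = "textformat"
    item.SetAttributeValue("indent", "20")
Next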
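The sample input in the question contains two top-level ul elements, and XElement.Parse rejects a fragment with more than one root. The loop over Elements("ul") mentioned above could be sketched roughly as follows. This is only a sketch: the ListConverter module, the ConvertLists function and the temporary root wrapper element are names made up for illustration, and the input is still assumed to be well-formed.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ListConverter
    ' Hypothetical helper: wraps the fragment in a temporary <root> element
    ' so that several top-level lists can be parsed together.
    Function ConvertLists(input As String) As String
        Dim root = XElement.Parse("<root>" & input & "</root>")

        ' Top-level lists get no indent.
        For Each top In root.Elements("ul").ToList()
            top.Name = "textformat"
            top.SetAttributeValue("indent", "0")
        Next

        ' Any ul still left at this point is nested, so it gets an indent of 20.
        For Each nested In root.Descendants("ul").ToList()
            nested.Name = "textformat"
            nested.SetAttributeValue("indent", "20")
        Next

        ' Return the children only, without the temporary wrapper.
        Return String.Join(Environment.NewLine,
                           root.Elements().Select(Function(e) e.ToString()))
    End Function
End Module
```

Note that ToString() re-indents the markup, so the whitespace of the result will generally differ from the input.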
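And here's the regex approach. It's not easy to detect the first ul item to distinguish between the two, so this approach changes all of them to an indent of 20, then takes an extra step to find the first textformat and change its indent to zero. Note that the ^ anchor in the second pattern only matches a textformat tag at the very start of the string, so any leading whitespace, or a second top-level list, is not handled.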
Dim pattern As String = "<ul>|</ul>"
Dim result As String = Regex.Replace(input, pattern, Function(m) If(m.Value.StartsWith("</"), "</textformat>", "<textformat indent=""20"">"))
Dim firstTextFormatPattern As String = "^(?<Start><textformat\s+indent="")\d+?(?<End>"">)"
result = Regex.Replace(result, firstTextFormatPattern, "${Start}0${End}")
Thanks for your help with this. I have managed to work out a solution myself using your reply.
Basically I am using a counter to keep track of the nesting level of each ul tag the regex finds, and then replacing the tag with the relevant attribute:
' Class-level field: current nesting depth of ul tags
Dim ulCounter As Integer = 0

' In the calling method:
Dim rxUL As New Regex("<ul>|</ul>")
xmlValue = rxUL.Replace(xmlValue, AddressOf Convert_UL)

' Called once per match; returns the replacement tag
Protected Function Convert_UL(ByVal m As Match) As String
    Dim HTML As String = ""
    If m.Value = "</ul>" Then
        ulCounter -= 1
        HTML = "</textformat>"
    Else
        ulCounter += 1
        If ulCounter > 1 Then
            HTML = "<textformat indent=""20"">"
        Else
            HTML = "<textformat indent=""0"">"
        End If
    End If
    Return HTML
End Function
This was a pretty random request, so I'm not sure how much help this would be to anyone else, but just in case, that is how I got round it!
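For anyone who would rather not keep the counter in a class-level field, the same counter idea might be packaged into a single function with a lambda, along these lines. This is a sketch under the same assumptions as above (plain <ul> and </ul> tags with no attributes, balanced nesting); the module name UlDepthConverter and the function name ConvertUl are made up for illustration.

```vb
Imports System.Text.RegularExpressions

Module UlDepthConverter
    ' Hypothetical helper: same counter technique, but the depth is a local
    ' variable captured by the lambda, so every call starts from zero.
    Function ConvertUl(input As String) As String
        Dim depth As Integer = 0
        Return Regex.Replace(input, "<ul>|</ul>",
            Function(m As Match) As String
                If m.Value = "</ul>" Then
                    depth -= 1
                    Return "</textformat>"
                End If
                depth += 1
                ' Depth 1 is a top-level list; anything deeper is nested.
                Return If(depth > 1,
                          "<textformat indent=""20"">",
                          "<textformat indent=""0"">")
            End Function)
    End Function
End Module
```

Because the depth returns to zero after each top-level list closes, a second top-level ul in the same string also gets an indent of 0.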