RegEx - HTML between two values

Developer https://www.devze.com 2023-04-07 09:56 Source: Internet

I am looking to get the HTML that sits between the following tags:

<ul type="square">  
</ul>

What's the most efficient way?


I always use XPath to do things like that.
Use an XPath that will extract the node and then you can fetch the InnerHTML from that node. Very clean, and the right tool for the job.

Additional details: HAP Explorer is a handy tool for finding the XPath you need. Paste the HTML into HAP Explorer, navigate to the node of interest, and copy the XPath for that node. Store that XPath in a string resource, load it at runtime, apply it to the HTML document to extract the node, and then read the desired information from the node.
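
For anyone who wants to see the shape of this approach, here is a minimal sketch in Python rather than HAP. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and only parses well-formed markup, so treat it as an illustration under those assumptions. The sample page, the UL_XPATH constant and the inner_html helper are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; assumed to be well-formed (XHTML-like) markup.
PAGE = """<html><body>
<ul type="disc"><li>skip me</li></ul>
<ul type="square"><li>one</li><li>two <ul><li>nested</li></ul></li></ul>
</body></html>"""

# The kind of XPath you would copy out of a tool and keep in a resource.
UL_XPATH = ".//ul[@type='square']"


def inner_html(node: ET.Element) -> str:
    """Return the markup between a node's opening and closing tags."""
    parts = [node.text or ""]
    for child in node:
        # tostring() serializes the child together with its trailing text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(PAGE)
node = root.find(UL_XPATH)
result = inner_html(node) if node is not None else None
print(result)
```

Because the parser tracks nesting, the nested list comes back intact. Real-world HTML is rarely well-formed, so in practice you would swap in a tolerant HTML parser and keep the same XPath idea.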


If you really want one:
@<ul type="square">(.*?)</ul>@is
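
A rough Python equivalent, in case it helps to see how a non-greedy pattern like this behaves. The sample strings are invented for the example, and re.DOTALL is used so that . can cross line breaks, since the list in the question spans several lines.

```python
import re

# Non-greedy capture; DOTALL lets "." match newlines, IGNORECASE covers <UL>.
PATTERN = re.compile(r'<ul type="square">(.*?)</ul>', re.IGNORECASE | re.DOTALL)

# Hypothetical inputs: a flat list and a list containing a nested list.
simple = '<ul type="square">\n<li>a</li>\n</ul>'
nested = '<ul type="square"><li>a<ul><li>b</li></ul></li><li>c</li></ul>'

simple_inner = PATTERN.search(simple).group(1)
nested_inner = PATTERN.search(nested).group(1)

print(repr(simple_inner))  # the flat list is captured in full
print(repr(nested_inner))  # capture stops at the first </ul>, cutting the list short
```

So it does the job for a flat list, but as soon as another ul is nested inside, the match ends at the inner closing tag.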


I agree that an HTML parser is the correct way to solve this problem. But, to humor you and answer your original question purely for academic interest, I propose this:

/<[Uu][Ll] +type=("square"|square) *>((.*?(<ul[^>]*>.*<\/ul>)?)*)<\/[Uu][Ll]>/s

I'm sure there are cases where this will fail, but I can't think of any, so please suggest some.

Let me restate that I don't recommend you use this in your project. I am merely doing this out of academic interest, and as a demonstration of WHY a regex that parses html is bad and complicated.


Regular expressions should not be used to parse HTML!

This will definitely not work:

<ul type="square">(.*)</ul>
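
To make the failure concrete, here is a small Python sketch with made-up input. It shows two separate problems: the greedy .* runs on to the last closing tag on the page, and without a dot-matches-newline flag the pattern does not match a multi-line list at all.

```python
import re

GREEDY = r'<ul type="square">(.*)</ul>'

# Hypothetical page with a second, unrelated list after the one we want.
html = (
    '<ul type="square"><li>first</li></ul>'
    '<p>unrelated</p>'
    '<ul><li>second</li></ul>'
)

# Greedy ".*" swallows everything up to the LAST </ul> in the input.
greedy = re.search(GREEDY, html).group(1)
print(greedy)

# Without DOTALL, "." cannot cross the newline right after the opening tag.
multiline = '<ul type="square">\n<li>first</li>\n</ul>'
no_match = re.search(GREEDY, multiline)
print(no_match)
```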