How would I write a RegEx to:
Find a match where the first instance of a > character comes before the first instance of a < character.
(I am looking for bad HTML where the first closing > on a line has no opening <.)
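To pin down exactly what is being asked, here is a minimal Ruby sketch that checks the condition without a regex, by comparing the positions of the first > and the first <. Ruby is assumed only because the answers below use Ruby's =~ syntax. The helper name stray_close? and the sample lines are hypothetical.

```ruby
# Hypothetical helper (not from the question): true when a line contains a ">"
# that appears before any "<", i.e. the first ">" has no opening "<" ahead of it.
def stray_close?(line)
  gt = line.index(">")   # position of the first ">", or nil if there is none
  return false if gt.nil?
  lt = line.index("<")   # position of the first "<", or nil if there is none
  lt.nil? || gt < lt
end

# Made-up sample lines
puts stray_close?("x > y <b>")    # prints true
puts stray_close?("<b>bold</b>")  # prints false
puts stray_close?("no brackets")  # prints false
```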
It's a pretty bad idea to parse HTML with a regex, or even to try to detect broken HTML with one.
What happens, for example, when a line break puts the > character at the start of a line? That is valid HTML.
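A small Ruby sketch of that failure mode: the anchor tag below is valid, but its closing > was wrapped onto the next line, and a line-by-line check flags it anyway. The sample markup is made up. \A is used rather than ^ because in Ruby ^ matches at the start of every line, not only at the start of the string.

```ruby
# Made-up valid HTML: the tag's closing ">" was wrapped onto its own line.
html = "<a href=\"/home\"\n>Home</a>\n"

# Per-line check for "a '>' before any '<'", anchored at the start of each line.
flags = html.each_line.map { |line| line.match?(/\A[^<]*>/) }

p flags  # prints [false, true]: the second line is reported although the HTML is fine
```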
You might also get some mileage from the answers to this question: RegEx match open tags except XHTML self-contained tags
Would this work?
string =~ /^[^<]*>/
This should start at the beginning of the line, skip any characters that aren't an opening '<', and match if it then finds a closing '>' character.
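A quick way to try that pattern in Ruby. =~ returns the offset where the match starts, or nil if there is none, and because Ruby's ^ matches at the start of any line, running it on a multi-line string reports the first offending line rather than only checking line 1. The test strings are made up.

```ruby
# The pattern from the answer: from a line start, non-"<" characters, then ">".
STRAY_CLOSE = /^[^<]*>/

p("x > y" =~ STRAY_CLOSE)                   # prints 0
p("<b>bold</b>" =~ STRAY_CLOSE)             # prints nil
p("<p>ok</p>\nthen > here" =~ STRAY_CLOSE)  # prints 10 (start of the second line)
```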
^[^<>]*>
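Excluding > as well as < from the character class makes the match stop at the first >, instead of running greedily to the last > before a <. Whether a line matches is the same as with /^[^<]*>/; only the matched text differs. A small Ruby comparison on a made-up line:

```ruby
greedy = /^[^<]*>/   # pattern from the previous answer
first  = /^[^<>]*>/  # this answer: stops at the first ">"

line = "a > b > c <d>"  # made-up sample
p line[greedy]  # prints "a > b >"
p line[first]   # prints "a >"
```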
If you need the corresponding < as well:
^[^<>]*>[^<]*<
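In Ruby, for example, this version only matches when some < follows the stray >, and the match runs up to that first <. The sample strings are made up.

```ruby
# Stray ">" followed (eventually) by an opening "<".
with_open = /^[^<>]*>[^<]*</

p("a > b <c>"[with_open])  # prints "a > b <"
p("a > b"[with_open])      # prints nil (no "<" after the ">")
```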
If there may be tags before the first >:
^[^<>]*(?:<[^<>]+>[^<>]*)*>
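The (?:<[^<>]+>[^<>]*)* group lets the pattern step over complete tags such as <b> and </b> before looking for the unmatched >. A Ruby comparison with the plain ^[^<>]*> pattern, using made-up sample lines:

```ruby
plain     = /^[^<>]*>/
skip_tags = /^[^<>]*(?:<[^<>]+>[^<>]*)*>/

line = "<b>x</b> y > z"  # made-up sample with a stray ">" after a complete tag
p line[plain]            # prints nil: the line starts with "<", so the plain pattern fails
p line[skip_tags]        # prints "<b>x</b> y >"
p "<b>x</b>"[skip_tags]  # prints nil: every ">" closes a tag
```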
Note that it can give false positives, e.g.
<!-- > -->
is valid HTML, but the RegEx will flag it.
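Reproducing that false positive in Ruby, plus one partial mitigation that is my own assumption rather than part of the answer: delete <!-- ... --> comments before running the check. It still does nothing about > inside attribute values, scripts, or CDATA sections, which is the general problem with regex-based HTML checks.

```ruby
skip_tags = /^[^<>]*(?:<[^<>]+>[^<>]*)*>/

comment = "<!-- > -->"
p comment[skip_tags]  # prints "<!-- > -->": valid HTML, but it is flagged

# Hypothetical pre-pass: strip comments first (the m flag lets . span newlines).
without_comments = comment.gsub(/<!--.*?-->/m, "")
p without_comments[skip_tags]  # prints nil
```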