I have a problem that I need to solve with a regex.
If I am using Firefox or IE8, JavaScript generates this markup, which is what I want:
<div style="visibility: hidden;" id="wizardId1">1001</div><div style="visibility: hidden;" id="wizardId2">1002</div>
However, IE7 generates it differently:
<DIV id=wizardId1 style="VISIBILITY: hidden;">1001</DIV><DIV id=wizardId2 style="VISIBILITY: hidden;" >1002</DIV>
Here the id attribute is placed before the style attribute.
In my Java program, I have a regex that only supports the first form (Firefox and IE8):
<(?:DIV|div)\s+style=(?:["\'])*(?:[\w\d:; ]+)*(?:["\'])*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*>([\w\d]+)</(?:DIV|div)>
Because IE7 places the id before the style, I cannot get the result I want from its output.
The result should be:
Match 1: <div style="visibility: hidden;" id="wizardId1">1001</div>
Subgroups:
1: wizardId1
2: 1001
Match 2: <div style="visibility: hidden;" id="wizardId2">1002</div>
Subgroups:
1: wizardId2
2: 1002
I tried this regex (with the style part taken out), but it returns only the last id:
<(?:DIV|div).*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*>([\w\d]+)</(?:DIV|div)>
Unwanted result:
Match 1: <div style="visibility: hidden;" id="wizardId1">1001</div><div style="visibility: hidden;" id="wizardId2">1002</div>
Subgroups:
1: wizardId2
2: 1002
Question
How can I produce the same result as the first one with a regex that ignores the style attribute ( style="visibility: hidden;" ), without using .* and without adding an extra group?
Thanks for your help.
Previously, the greedy .* was matching everything from the end of the first <div through and including the second <div, so only the last id was captured.
You can try using a minimal (non-greedy) match.
So use
<(?:DIV|div).*?\s+id=(?:["\'])*([\w\d]+)(?:["\'])*[^>]*>([\w\d]+)</(?:DIV|div)>
instead of
<(?:DIV|div).*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*[^>]*>([\w\d]+)</(?:DIV|div)>
Note that the ? after the .* means "match as few characters as possible".
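To make the difference concrete, here is a small sketch that runs both patterns against the markup from the question. It is written in JavaScript only for convenience; the helper name extractPairs is made up for this example, and the quantifiers used here should behave the same way in Java's regex engine.

```javascript
// Sample markup from the question (Firefox / IE8 form).
const sampleHtml =
  '<div style="visibility: hidden;" id="wizardId1">1001</div>' +
  '<div style="visibility: hidden;" id="wizardId2">1002</div>';

// The two patterns differ only in .* (greedy) versus .*? (lazy).
const greedy =
  /<(?:DIV|div).*\s+id=(?:["'])*([\w\d]+)(?:["'])*[^>]*>([\w\d]+)<\/(?:DIV|div)>/g;
const lazy =
  /<(?:DIV|div).*?\s+id=(?:["'])*([\w\d]+)(?:["'])*[^>]*>([\w\d]+)<\/(?:DIV|div)>/g;

// Hypothetical helper: collect [id, text] pairs from every match.
function extractPairs(regex, text) {
  return Array.from(text.matchAll(regex), (m) => [m[1], m[2]]);
}

const greedyPairs = extractPairs(greedy, sampleHtml);
const lazyPairs = extractPairs(lazy, sampleHtml);

console.log(greedyPairs); // one match spanning both divs: wizardId2 / 1002
console.log(lazyPairs);   // two matches: wizardId1 / 1001 and wizardId2 / 1002
```

One caveat: the lazy .*? can still run past the end of a tag if an earlier div has no id at all, so this relies on every div carrying an id attribute.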
I would recommend against trying to parse HTML with regexes, though.
Maybe you could try a SAX-style parser such as makeSaxParser
in http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js
Does this work for you?
<(?:DIV|div)(?:(?:\s+style=(?:["\'])*(?:[\w\d:; ]+)*(?:["\'])*)|(?:\s+id=(?:["\'])*([\w\d]+)(?:["\'])*))*>([\w\d]+)</(?:DIV|div)>
This works OK and is pretty general (I assumed you don't need to check for the style attribute):
<div.+?id="([^"]+).+?>([^<]+)
Don't forget to turn case insensitivity on; in JavaScript it should look like:
/<div.+?id="([^"]+).+?>([^<]+)/i
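Note that the pattern above expects the id value in double quotes, while the IE7 sample in the question writes id=wizardId1 without quotes. Below is a rough sketch of a variant that tolerates both forms. The pattern and the names divIdPattern and extractIds are my own assumptions for illustration, not something from the answers above.

```javascript
// Both serializations from the question.
const firefoxHtml =
  '<div style="visibility: hidden;" id="wizardId1">1001</div>' +
  '<div style="visibility: hidden;" id="wizardId2">1002</div>';
const ie7Html =
  '<DIV id=wizardId1 style="VISIBILITY: hidden;">1001</DIV>' +
  '<DIV id=wizardId2 style="VISIBILITY: hidden;" >1002</DIV>';

// Hypothetical variant:
//   [^>]*?  skips other attributes but cannot leave the opening tag
//   ["']?   makes the quotes around the id value optional
//   i flag  matches DIV and div alike; g flag is required by matchAll
const divIdPattern = /<div\b[^>]*?\sid=["']?(\w+)["']?[^>]*>([^<]+)<\/div>/gi;

// Hypothetical helper: return one {id, value} object per matched div.
function extractIds(markup) {
  return Array.from(markup.matchAll(divIdPattern), (m) => ({
    id: m[1],
    value: m[2],
  }));
}

console.log(extractIds(firefoxHtml));
console.log(extractIds(ie7Html));
```

Both calls should give the same two id/value pairs. Using [^>]*? instead of .+? keeps the match inside a single opening tag, which avoids the "only the last id" problem without relying on lazy matching alone.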