Regex to match attributes that are not in a fixed order

开发者 (Developer) https://www.devze.com 2023-01-19 04:35 Source: web
I have a problem that I need to solve with a regex.

If I am using Firefox or IE8, JavaScript generates this markup, which is what I really want.

<div style="visibility: hidden;" id="wizardId1">1001</div><div style="visibility: hidden;" id="wizardId2">1002</div>

However, IE7 generates it differently:

<DIV id=wizardId1 style="VISIBILITY: hidden;">1001</DIV><DIV id=wizardId2 style="VISIBILITY: hidden;" >1002</DIV>

Here the id of the div is placed before the style attribute.

In my Java program, I have a regex that only supports the first form (Firefox and IE8). The regex is:

<(?:DIV|div)\s+style=(?:["\'])*(?:[\w\d:; ]+)*(?:["\'])*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*>([\w\d]+)</(?:DIV|div)>

Because IE7 places the id before the style, that regex does not match its output, so I cannot get the result I want.

The result should look like this:

Match 1: <div style="visibility: hidden;" id="wizardId1">1001</div>
    Subgroups:
    1: wizardId1
    2: 1001
Match 2: <div style="visibility: hidden;" id="wizardId2">1002</div>
    Subgroups:
    1: wizardId2
    2: 1002

I tried this regex (with the style part taken out), but it only returns the last id.

<(?:DIV|div).*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*>([\w\d]+)</(?:DIV|div)>

Unwanted result:

Match 1: <div style="visibility: hidden;" id="wizardId1">1001</div><div style="visibility: hidden;" id="wizardId2">1002</div>
    Subgroups:
    1: wizardId2
    2: 1002

Question

How can I get the same result as the first one with a regex that does not depend on the ( style="visibility: hidden;" ) part? (Without using .* and without adding an extra group.)

Thanks for helping me.


Previously, the .* was matching everything from the end of the first <div through and including the second <div.

You can try using a minimal match.

So

<(?:DIV|div).*?\s+id=(?:["\'])*([\w\d]+)(?:["\'])*[^>]*>([\w\d]+)</(?:DIV|div)>

instead of

<(?:DIV|div).*\s+id=(?:["\'])*([\w\d]+)(?:["\'])*[^>]*>([\w\d]+)</(?:DIV|div)>

Note that the ? after the .* makes it match as few characters as possible.
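Since the question mentions a Java program, here is a minimal sketch of the greedy versus minimal match on the Java side. The class name LazyDivMatch and the extract helper are made up for illustration, and \' is written as a plain ' inside the Java string; the sample markup is copied from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the name and the helper are not from the original posts.
class LazyDivMatch {
    // Sample markup copied from the question (Firefox/IE8 form and IE7 form).
    static final String FIREFOX =
        "<div style=\"visibility: hidden;\" id=\"wizardId1\">1001</div>"
        + "<div style=\"visibility: hidden;\" id=\"wizardId2\">1002</div>";
    static final String IE7 =
        "<DIV id=wizardId1 style=\"VISIBILITY: hidden;\">1001</DIV>"
        + "<DIV id=wizardId2 style=\"VISIBILITY: hidden;\" >1002</DIV>";

    // Minimal (reluctant) match: .*? stops at the first possible id attribute.
    static final Pattern LAZY = Pattern.compile(
        "<(?:DIV|div).*?\\s+id=(?:[\"'])*([\\w\\d]+)(?:[\"'])*[^>]*>([\\w\\d]+)</(?:DIV|div)>");

    // Greedy match: .* runs to the end of the line and backtracks to the last id attribute.
    static final Pattern GREEDY = Pattern.compile(
        "<(?:DIV|div).*\\s+id=(?:[\"'])*([\\w\\d]+)(?:[\"'])*[^>]*>([\\w\\d]+)</(?:DIV|div)>");

    // Collects "id=text" for every match of the pattern in the input.
    static List<String> extract(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract(LAZY, FIREFOX));   // [wizardId1=1001, wizardId2=1002]
        System.out.println(extract(LAZY, IE7));       // [wizardId1=1001, wizardId2=1002]
        System.out.println(extract(GREEDY, FIREFOX)); // [wizardId2=1002]
    }
}
```

The greedy version yields a single match that spans both divs, which is the unwanted result from the question; the minimal version yields one match per div for both attribute orders.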

I would recommend against trying to parse HTML with regexes though. Maybe you could try a SAX style parser like makeSaxParser in http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js


Does this work for you?

<(?:DIV|div)(?:(?:\s+style=(?:["\'])*(?:[\w\d:; ]+)*(?:["\'])*)|(?:\s+id=(?:["\'])*([\w\d]+)(?:["\'])*))*>([\w\d]+)</(?:DIV|div)>
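A rough way to try the alternation above from Java might look like the sketch below. The class name AnyOrderDivMatch and the extract helper are invented for illustration; the pattern is the one above with \' written as a plain ', and the inputs are the Firefox markup and the first IE7 div from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the name and the helper are not from the original posts.
class AnyOrderDivMatch {
    // Firefox/IE8 form (style first) and the first IE7 div (id first), from the question.
    static final String FIREFOX =
        "<div style=\"visibility: hidden;\" id=\"wizardId1\">1001</div>"
        + "<div style=\"visibility: hidden;\" id=\"wizardId2\">1002</div>";
    static final String IE7_FIRST_DIV =
        "<DIV id=wizardId1 style=\"VISIBILITY: hidden;\">1001</DIV>";

    // Each attribute is one branch of a repeated alternation, so style and id may come in either order.
    static final Pattern ANY_ORDER = Pattern.compile(
        "<(?:DIV|div)(?:(?:\\s+style=(?:[\"'])*(?:[\\w\\d:; ]+)*(?:[\"'])*)"
        + "|(?:\\s+id=(?:[\"'])*([\\w\\d]+)(?:[\"'])*))*>([\\w\\d]+)</(?:DIV|div)>");

    // Collects "id=text" for every match in the input.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = ANY_ORDER.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract(FIREFOX));       // [wizardId1=1001, wizardId2=1002]
        System.out.println(extract(IE7_FIRST_DIV)); // [wizardId1=1001]
    }
}
```

One thing to watch: as written, the pattern leaves no room for whitespace between the last attribute and the closing >, so the second IE7 div in the question (which has a space before >) would not match unless something like \s* is added before the >.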


This works ok and is pretty general (I assumed you don't need to check for the style attribute):

<div.+?id="([^"]+).+?>([^<]+)

Don't forget to turn case insensitivity on. In JavaScript it should look like:

/<div.+?id="([^"]+).+?>([^<]+)/i
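In Java, the counterpart of the /i flag is Pattern.CASE_INSENSITIVE. A minimal sketch, assuming the same pattern is used unchanged; the class name CaseInsensitiveDivMatch and the extract helper are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the name and the helper are not from the original posts.
class CaseInsensitiveDivMatch {
    // Same pattern as above; CASE_INSENSITIVE plays the role of JavaScript's /i flag.
    static final Pattern DIV = Pattern.compile(
        "<div.+?id=\"([^\"]+).+?>([^<]+)", Pattern.CASE_INSENSITIVE);

    // Collects "id=text" for every match in the input.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = DIV.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Mixed-case variant of the Firefox markup, to exercise the flag.
        String mixedCase =
            "<DIV STYLE=\"visibility: hidden;\" ID=\"wizardId1\">1001</DIV>"
            + "<div style=\"visibility: hidden;\" id=\"wizardId2\">1002</div>";
        System.out.println(extract(mixedCase)); // [wizardId1=1001, wizardId2=1002]
    }
}
```

Note that this pattern requires a double-quoted id value, so the unquoted id=wizardId1 form that IE7 produces in the question would not be matched by it.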
