Regular expression replacing only if contained within a regular expression match?

Developer https://www.devze.com 2022-12-31 18:23 Source: the web
I have the following:

[list]
[*] test
[*] test
[*] test
[/list]

and I would like to create a regular expression that turns that into:

<ul>
<li>test</li>
<li>test</li>
<li>test</li>
</ul>

I know enough regex to replace simple tags, but in this case I need to replace the [*] items (the future li tags) only if they are contained inside [list] (the future ul). Is there a way to check that before replacing?

I am using JavaScript if that matters.


Given the text:

[*] test1

[list]
[*] test2
[*] test3
[*] test4
[/list]

[*] test5

the regex:

\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[/list])

matches only [*] test2, [*] test3 and [*] test4. But if the [list] blocks can be nested, or a broader subset of a BB-like language needs to be parsed, I would opt for a proper parser.

To do the replacements, replace the regex I suggested with:

<li>$1</li>

and then replace [list] with <ul> and [/list] with </ul> (assuming [list] and [/list] are only used for lists and are not present in comments or string literals or something).

When running the following snippet:

var text = "[*] test1\n"+
    "\n"+
    "[list]\n"+
    "[*] test2\n"+
    "[*] test3\n"+
    "[*] test4\n"+
    "[/list]\n"+
    "\n"+
    "[*] test5\n"+
    "\n"+
    "[list]\n"+
    "[*] test6\n"+
    "[*] test7\n"+
    "[/list]\n"+
    "\n"+
    "[*] test8";

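// Show the input, then a separator, before converting.
console.log(text + "\n============================");
// 1. Convert only the [*] items that have a [/list] ahead with no [list] in between.
text = text.replace(/\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[\/list])/g, "<li>$1</li>");
// 2. Convert the list delimiters themselves.
text = text.replace(/\[list]/g, "<ul>");
text = text.replace(/\[\/list]/g, "</ul>");
console.log(text);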

the following is printed:

[*] test1

[list]
[*] test2
[*] test3
[*] test4
[/list]

[*] test5

[list]
[*] test6
[*] test7
[/list]

[*] test8
============================
[*] test1

<ul>
<li>test2</li>
<li>test3</li>
<li>test4</li>
</ul>

[*] test5

<ul>
<li>test6</li>
<li>test7</li>
</ul>

[*] test8

A small explanation might be in order:

  • \[\*]\s* matches the substring [*] followed by zero or more whitespace characters;
  • ([^\r\n]+) gobbles up the rest of the line and saves it in match group 1;
  • (?=((?!\[list])[\s\S])*\[/list]) ensures that every match is followed, somewhere ahead, by the substring [/list] without a [list] being encountered in between.
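
To see the lookahead at work, a quick check along the following lines may help. It is only a sketch: the sample string and the variable names (itemInList, sample, matched) are made up for illustration, and String.prototype.matchAll needs a reasonably recent JavaScript engine.

```javascript
// Same regex as above, with the "/" escaped for a JavaScript regex literal.
const itemInList = /\[\*]\s*([^\r\n]+)(?=((?!\[list])[\s\S])*\[\/list])/g;

// Illustrative input: one item before the list, one inside, one after.
const sample = "[*] outside\n[list]\n[*] inside\n[/list]\n[*] after";

// Collect match group 1 (the item text) of every match.
const matched = Array.from(sample.matchAll(itemInList), function (m) {
  return m[1];
});

console.log(matched.join(", ")); // prints "inside"
```

The item before the list is rejected because the lookahead runs into [list] before it can reach a [/list]; the item after the list is rejected because no [/list] follows it at all.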

EDIT

Or better yet, do as Gumbo suggests in the comment to this answer: match all [list] ... [/list] and then replace all [*] ... in those.
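
For the nested case mentioned earlier, a regex alone becomes fragile. As a rough illustration of the parser route, here is a minimal line-based sketch. It is not a full BB-code parser: the function name convertLists is hypothetical, and it assumes every [list], [/list] and [*] item sits on its own line, as in the examples above.

```javascript
// Hypothetical helper (not part of the original answer): walks the text line
// by line and keeps a counter of how many [list] blocks are currently open.
function convertLists(text) {
  let depth = 0; // number of open [list] blocks at the current line
  return text
    .split("\n")
    .map(function (line) {
      const trimmed = line.trim();
      if (trimmed === "[list]") {
        depth++;
        return "<ul>";
      }
      if (trimmed === "[/list]" && depth > 0) {
        depth--;
        return "</ul>";
      }
      const item = /^\[\*\]\s*(.*)$/.exec(trimmed);
      if (item && depth > 0) {
        return "<li>" + item[1] + "</li>"; // only convert items inside a list
      }
      return line; // anything outside a list is left untouched
    })
    .join("\n");
}

const nested = "[*] a\n[list]\n[*] b\n[list]\n[*] c\n[/list]\n[/list]\n[*] d";
console.log(convertLists(nested));
```

Note that this emits the inner <ul> directly inside the outer one. Strictly valid HTML would wrap a nested list in an <li>, and an unclosed [list] would leave an unclosed <ul>, so a real implementation would need more bookkeeping than this sketch does.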


Here’s a better approach to Bart K.’s suggestion:

  • find all [list] … [/list]
  • for each match, find and replace all [*] in it

This will ensure that only [*] in [list] … [/list] will be replaced.

The code:

str.replace(/\[list]([\s\S]*?)\[\/list]/g, function($0, $1) {
    return "<ul>" + $1.replace(/^ *\[\*] *(.*)/gm, "<li>$1</li>") + "</ul>";
})
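
As a usage sketch, the same replacement can be wrapped in a function and run on a small sample. The function name bbListsToHtml and the input string are assumptions made for illustration only.

```javascript
// Hypothetical wrapper around the replacement shown above.
function bbListsToHtml(str) {
  // Outer pass: find each [list] ... [/list] block (non-greedy, so blocks
  // are matched one at a time).
  return str.replace(/\[list]([\s\S]*?)\[\/list]/g, function (whole, body) {
    // Inner pass: convert [*] items, but only within the block's body.
    return "<ul>" + body.replace(/^ *\[\*] *(.*)/gm, "<li>$1</li>") + "</ul>";
  });
}

const input = "[*] test1\n[list]\n[*] test2\n[*] test3\n[/list]\n[*] test4";
const output = bbListsToHtml(input);
console.log(output);
// [*] test1
// <ul>
// <li>test2</li>
// <li>test3</li>
// </ul>
// [*] test4
```

Because the outer pattern is non-greedy and has no notion of depth, this version also assumes that [list] blocks are not nested.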