Regexp even number of backslashes (PHP)

开发者 https://www.devze.com 2023-01-29 14:27 Source: Internet
I have a rather hard time getting my head around regular expressions, especially the more complex ones.

Currently I am writing my own markup language and am stumped by escaping. I want each special character to be "escapable": that is, if *bold* gives me <b>bold</b>, then \*bold\* should be left as-is, so that I can strip the backslashes later. However, I can't think of a regular expression that conveys this idea.

How can I select three groups:

  1. Left asterisk, if the number of backslashes preceding it is even;
  2. Content between the asterisks;
  3. Right asterisk, if the number of backslashes preceding it is even;

with one regular expression? I need it to be compliant with PHP's preg_replace.

This \\*(\*)\S(.)+?\S\\*(\*) would select both asterisks and the content as three groups, but it does not check whether the number of preceding backslashes is even.

UPDATE:

The second paragraph has been changed to better illustrate what I meant (please don't modify it anymore because the change that was made completely missed the point).

Plus, if that makes things easier, I can first replace every double backslash with some other character, so there is only a need to check for ONE backslash before an asterisk.
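
One possible way to express the "even number of backslashes" condition in a single pattern is to capture the backslashes in pairs and use a lookbehind to make sure the run does not continue further to the left. The following is only a sketch: the function name bold_even_backslashes() is hypothetical, and the groups hold the backslash pairs and the content (rather than the asterisks themselves) so that the pairs can be put back into the output.

```php
<?php
// Sketch only: bold_even_backslashes() is a hypothetical name, and the
// <b>...</b> output format is taken from the question.
function bold_even_backslashes(string $text): string
{
    // Nowdoc syntax keeps the backslashes literal, so no PHP-level escaping.
    $rx = <<<'RX'
/
(?<!\\)              # the backslash run below must not continue to the left
((?:\\\\)*)          # group 1: zero or more backslash PAIRS (an even count)
\*                   # opening asterisk
(\S(?:.*?\S)??)      # group 2: content, non-whitespace at both ends
(?<!\\)              # content must not end inside a backslash run
((?:\\\\)*)          # group 3: backslash pairs before the closing asterisk
\*                   # closing asterisk
/x
RX;
    // Re-emit the captured backslash pairs so they can be stripped later.
    // Fall back to the input if preg_replace reports an error (null).
    return preg_replace($rx, '$1<b>$2$3</b>', $text) ?? $text;
}
```

With this sketch, *bold* becomes <b>bold</b>, while \*bold\* is returned unchanged because each asterisk is preceded by an odd number of backslashes.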


How about:

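// Inside a single-quoted PHP string, a literal backslash in the regex
// has to be written as four backslashes.
$rx = '/
([^\\\\]*|^)   # no backslash or beginning of line
\\\\           # one backslash
\*             # an asterisk

([^*\\\\]+)    # one or more characters not being asterisks or backslashes

\\\\           # one backslash
\*             # one asterisk
               # "mx" = multiline, extended regex
/mx';

// Note: this turns \*text\* into text (escaped asterisks are dropped).
$content = preg_replace($rx, '$1$2', $content);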


Well, I guess I found the answer to my own question.

First I will have to replace each \\ (double backslash) with some other character, and then use an expression like this (in extended mode, i.e. with the x modifier):

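(?<!\\)      # There is no backslash before...
\*           # ...the opening asterisk

(            # Non-whitespace after the first and before the second asterisk
  \S         # a single character is tried first, so *a* and *b* are not merged
  |
  \S .*? \S
)

(?<!\\)      # There is no backslash before...
\*           # ...the closing asterisk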

From here on I can tweak it however I wish. Thanks to everyone for the input anyway :).
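
For reference, the whole approach (hide double backslashes, match unescaped asterisks, restore the backslashes) could be sketched as follows. The function name render_bold() and the placeholder byte "\x01" are assumptions made for illustration; the placeholder only works if that byte never occurs in the input. The single-character alternative is tried first here so that short spans such as *a* and *b* are matched separately.

```php
<?php
// Sketch only: render_bold() and the "\x01" placeholder are hypothetical.
function render_bold(string $text): string
{
    // Assumption: the byte 0x01 never appears in the markup source.
    $placeholder = "\x01";

    // Step 1: hide every escaped backslash (two backslashes in a row).
    $text = str_replace('\\\\', $placeholder, $text);

    // Step 2: any backslash still directly before an asterisk escapes it.
    $rx = <<<'RX'
/
(?<!\\) \*             # opening asterisk with no backslash before it
( \S | \S .*? \S )     # content, non-whitespace at both ends
(?<!\\) \*             # closing asterisk with no backslash before it
/x
RX;
    $text = preg_replace($rx, '<b>$1</b>', $text) ?? $text;

    // Step 3: put the double backslashes back for later stripping.
    return str_replace($placeholder, '\\\\', $text);
}
```

With this sketch, render_bold('*bold*') returns <b>bold</b>, and render_bold('\*bold\*') returns the input unchanged.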
