Preg match on php display code

Developer https://www.devze.com 2022-12-09 10:39 Source: Internet
Hi all, I'm currently writing a "display PHP code" function (output can be seen at http://www.actwebdesigns.co.uk/web-design-mansfield/php-functions/display-code-function.php).

I'm having trouble with the color scheme, which is done with regular expressions. The two in particular are:

strings:

$line = preg_replace("#(\s|\()(\"[^\"]*\")(\,|\))#is", "\\1<span class=\"string\">\\2</span>\\3", $line);

(and I'm also trying)

#\"((?!(?:\"\s*;)|(?:\"\s*,)).)*#is

and functions:

$line = preg_replace("#(\s*)(@?|!?[a-z]+(?:[a-z]|[0-9]|_)*)(\s*)\(([^\)]*)\)#is", "\\1<span class=\"function\">\\2\\3</span>(\\4)", $line);

(If a function is inside another function, the inner one does not change color.)


About your string regex: you say it is a string if and only if it is preceded by a whitespace character or a ( and is directly followed by a , or ). Needless to say, that is not correct. You'd miss strings like:

$s = "123";     // ends with a ;
$s = "ab\"cd";  // contains an escaped double quote
$t = 'efg' ;    // is surrounded by single quotes

to name just three (there are many more; and what about heredocs?).
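To see this concretely, here is a minimal sketch that runs the string pattern from the question against the three examples above. The fourth input, `foo("bar")`, is an illustrative call added here to show the one shape the pattern does accept: a literal directly between `(` and `)`.

```php
<?php
// The string pattern from the question, copied unchanged.
$asker = "#(\s|\()(\"[^\"]*\")(\,|\))#is";

$samples = [
    '$s = "123";',      // ends with a ;
    '$s = "ab\"cd";',   // contains an escaped double quote
    "\$t = 'efg' ;",    // uses single quotes
    'foo("bar")',       // preceded by ( and followed by ): a shape it does accept
];

$results = [];
foreach ($samples as $code) {
    $hit = preg_match($asker, $code);   // 1 if a match was found, 0 if not
    $results[] = $hit;
    printf("%-16s %s\n", $code, $hit ? 'matched' : 'no match');
}
```

Only the last sample is reported as matched.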

To fix the cases above, try something like this:

$line = 's = "123"; t = "ab\\\\\\"cd"; u = \'efg\' ; v = \'ef\\\'g\' ';
echo $line . "\n";
echo preg_replace('/((["\'])(?:\\\\.|(?:(?!\2).|[^\\\\"\'\r\n]))*\2)/', '<span class="string">$1</span>', $line);
/* output:
s = "123"; t = "ab\\\"cd"; u = 'efg' ; v = 'ef\'g'
s = <span class="string">"123"</span>; t = <span class="string">"ab\\\"cd"</span>; u = <span class="string">'efg'</span> ; v = <span class="string">'ef\'g'</span>
*/

A short explanation:

(                        # start group 1
  (["\'])                #   match a single or double quote and store it in group 2
  (?:                    #     start non-capturing group 1
    \\\\.                #     match a backslash followed by any character (except line breaks), i.e. an escape sequence
    |                    #     OR
    (?:                  #     start non-capturing group 2
      (?!\2).            #       a character other than what is captured in group 2
      |                  #       OR
      [^\\\\"\'\r\n]     #       any character except a backslash, double quote, single quote or line break
    )                    #     end non-capturing group 2
  )*                     #   end non-capturing group 1 and match it zero or more times
  \2                     #   the quote captured in group 2
)                        # end group 1
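If you prefer to keep this explanation next to the code, PCRE's `x` (extended) modifier ignores whitespace in the pattern and treats `#` as the start of a comment, so the breakdown above can serve as the pattern itself. This is a sketch: it uses a nowdoc string (`<<<'REGEX'`), in which PHP performs no escape processing, so each `\\\\` from the single-quoted version becomes `\\` and each `\'` becomes `'`. The variable names are illustrative.

```php
<?php
// The same pattern as the one-liner above, written in extended (x) mode.
// Nowdoc: PHP passes every backslash to PCRE unchanged.
$pattern = <<<'REGEX'
/
(                        # group 1: the whole string literal
  (["'])                 # group 2: the opening quote
  (?:
    \\.                  # a backslash followed by any character except a line break
    |
    (?:
      (?!\2).            # a character other than the quote in group 2
      |
      [^\\"'\r\n]        # any character except a backslash, quote or line break
    )
  )*                     # zero or more times
  \2                     # the same quote again, closing the literal
)
/x
REGEX;

$line = 's = "123"; t = "ab\\\\\\"cd"; u = \'efg\' ; v = \'ef\\\'g\' ';
preg_match_all($pattern, $line, $m);
print_r($m[1]);   // the four literals, in source order
```

The matches should be identical to those of the one-line version; only the layout of the pattern differs.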

Then some comments about your second regex: you first try to match zero or more whitespace characters. This can safely be omitted, because if there is no whitespace you'd still have a match. You could use a \b (word boundary) before matching the function name instead. Also, (?:[a-z]|[0-9]|_) can be replaced by the character class [a-z0-9_]. And take this part of your regex: (@?|!?[a-z]+(?:[a-z]|[0-9]|_)*), which is the same as:

(
  @?
  |
  !?
  [a-z]+
  (?:
    [a-z]
    |
    [0-9]
    |
    _
  )*
)

only better indented, so you can see what it actually does. If you look closely, you will see that its first alternative is just @?, and since the ? makes the @ optional, that part of your regex will match an empty string as well. Not what you expected, eh? After that, I must confess I stopped looking at that regex; better to throw it away.
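The following sketch makes both problems visible with `preg_match()`: the name group accepts an empty string, so a plain parenthesised expression is treated as a "call" with an empty name, and `[^\)]*` swallows a nested call, which is why inner functions never change color. The inputs are illustrative.

```php
<?php
// The function pattern from the question, copied unchanged.
$asker = "#(\s*)(@?|!?[a-z]+(?:[a-z]|[0-9]|_)*)(\s*)\(([^\)]*)\)#is";

// 1. The name group on its own accepts the empty string via the bare @? branch.
$emptyOk = preg_match('#^(@?|!?[a-z]+(?:[a-z]|[0-9]|_)*)$#i', '');

// 2. So a parenthesised expression matches with an empty function name.
preg_match($asker, '$x = (1 + 2);', $m1);

// 3. In a nested call, [^\)]* consumes the inner call, so it is never highlighted.
preg_match($asker, 'foo(bar(1))', $m2);

var_dump($emptyOk);        // int(1)
var_dump($m1[2]);          // string(0) ""
var_dump($m2[2], $m2[4]);  // string(3) "foo", string(5) "bar(1"
```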

Try something like this to match function names:

'/\b[a-z_][a-z0-9_]*(?=\s*\()/i'

Which means:

\b           # a word boundary (the position between a \w and a \W character, or the start/end of the string)
[a-z_]       # a letter or an underscore
[a-z0-9_]*   # a letter, digit or underscore, zero or more times
(?=          # start positive look-ahead
  \s*        #   zero or more whitespace characters
  \(         #   an opening parenthesis
)            # end positive look-ahead

This last one is not tested at all; I'll leave that to you. Also note that I know very little PHP, so I may be oversimplifying it, in which case it would help if you provided a couple of example code snippets you want to match as functions.
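As a quick check of that pattern, here is a sketch that wraps each match in the `function` span from the question. The look-ahead `(?=\s*\()` tests for the parenthesis without consuming it, so the nested `trim` call is found as well. The sample line is made up for illustration.

```php
<?php
// The function-name pattern suggested above.
$funcPattern = '/\b[a-z_][a-z0-9_]*(?=\s*\()/i';

$line = 'echo strtoupper(trim($name));';

// $0 is the whole match, i.e. just the function name (the "(" is not consumed).
$html = preg_replace($funcPattern, '<span class="function">$0</span>', $line);
echo $html, "\n";
// echo <span class="function">strtoupper</span>(<span class="function">trim</span>($name));
```

One limitation: language constructs written like calls, for example `if ($a)` or `while ($x)`, are matched too, so keywords would have to be filtered out separately.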

Furthermore, a word of caution: parsing code with regexes can be tricky, but if you're only using them to highlight small snippets of code, you should be fine. When the source files get larger, you might see a drop in performance, and you should make some parts of your regexes "possessive", which can reduce the running time of your matching considerably (especially on larger source files).
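As an illustration of the possessive advice, here is a sketch of a string pattern with a possessive quantifier (`*+`), assuming single-line literals. In the string pattern from this answer, `(?!\2).` and `[^\\\\"\'\r\n]` can both match an ordinary character, so when a literal has no closing quote the engine may retry many ways of splitting it before giving up. In the version below the two alternatives never match the same text, so forbidding backtracking into the loop loses nothing for well-formed literals. It returns the whole literal as match 0 instead of using capture groups.

```php
<?php
// Alternative 1: one character that is not the quote, a backslash or a line break.
// Alternative 2: a backslash plus the escaped character.
// They are disjoint, so the possessive *+ never needs to give characters back.
$possessive = '/"(?:[^"\\\\\r\n]|\\\\.)*+"|\'(?:[^\'\\\\\r\n]|\\\\.)*+\'/';

$line = 's = "123"; t = "ab\\\\\\"cd"; u = \'efg\' ; v = \'ef\\\'g\' ';
preg_match_all($possessive, $line, $m);
print_r($m[0]);   // the same four literals as the backtracking version
```

It is not a perfect drop-in: an unterminated input such as `"a\"` (the last quote is escaped) no longer matches, whereas the backtracking version accepts it by treating the backslash as an ordinary character.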

Lastly, you're probably reinventing the wheel. There exist numerous (well tested) code-highlighters you can use. I suspect you already know this, but I thought it would still be worth mentioning.

FYI, I've had good experience with this one: http://shjs.sourceforge.net/doc/documentation.html


Why so complicated? Use highlight_string() ... and output buffering and ini_set() if you need to change its output.
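A minimal sketch of that suggestion: `highlight_string()` returns the HTML instead of printing it when its second argument is `true`, which can stand in for output buffering, and the `highlight.*` ini settings change the colors it emits. The color values below are arbitrary examples. The exact HTML wrapper differs between PHP versions, so the sketch relies only on the configured colors appearing in the output.

```php
<?php
// The highlight.* ini settings control the colors highlight_string() emits;
// ini_set() changes them for the current script only. Values are examples.
ini_set('highlight.string', '#008000');
ini_set('highlight.comment', '#808080');

$code = '<?php $s = "123"; // a comment';

// With true as the second argument, the HTML is returned instead of echoed,
// so no ob_start()/ob_get_clean() wrapper is needed.
$html = highlight_string($code, true);
echo $html, "\n";
```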