Regular expression to find parent::

Developer https://www.devze.com 2023-03-28 17:15 Source: web
I want to find all occurrences of parent::, along with the called function and its parameters.

For example:

parent::test( new ReflectionClass($this) );

But the following regular expression doesn't match the outer brackets - only the inner ones:

parent::(.*)\((.*)\);
Array /* output */
(
    [0] => parent::test( new ReflectionClass($this) );
    [1] => test( new ReflectionClass
    [2] => $this) 
)

How do I have to modify the pattern?

This is for a PHP script, so I can use other string functions, too.


What you are trying to do is generally not possible with regular expressions. To match parentheses that can be nested to any depth, you have to count the opening and closing ones, which is something classic regular expressions can't do.
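To illustrate what "counting" means here, this is a minimal sketch of a plain string scan. The helper name findClosingParen is made up for this example, and it assumes no parentheses appear inside string literals or comments:

```php
<?php
// Hypothetical helper (not from the thread): returns the offset of the ')'
// that closes the '(' located at $open, or null if it is never closed.
function findClosingParen(string $text, int $open): ?int
{
    $depth = 0;
    for ($i = $open, $len = strlen($text); $i < $len; $i++) {
        if ($text[$i] === '(') {
            $depth++;            // one level deeper
        } elseif ($text[$i] === ')') {
            $depth--;            // one level back up
            if ($depth === 0) {
                return $i;       // this ')' balances the '(' at $open
            }
        }
    }
    return null;
}

$line  = 'parent::test( new ReflectionClass($this) );';
$open  = strpos($line, '(');
$close = findClosingParen($line, $open);
// Everything between the outer parentheses, inner call included.
echo trim(substr($line, $open + 1, $close - $open - 1)), "\n";
```

Because it only looks at characters, it would be fooled by something like parent::test(')'), which is one reason a tokenizer is the safer route.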

Making the matching greedy will eventually lead to matching too much, especially when you support multi-line input.

To replace every occurrence of parent::, you probably don't have to match the method call exactly; it may be enough to match something like this:

parent::(.*);

Then you can replace the parent:: with something else and use the first matching group to put whatever was in the document at this position.
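As a rough sketch of that replacement with preg_replace: the function name replaceParentPrefix and the replacement prefix Base:: are only placeholders for this example, and it assumes each call ends on the line where it starts:

```php
<?php
// Hypothetical helper: swaps the parent:: prefix for Base:: (an arbitrary
// example prefix) and puts the captured rest of the statement back via $1.
function replaceParentPrefix(string $code): string
{
    // Without the s modifier, (.*) cannot cross a newline, so each match
    // stays on a single line and ends at the last ';' on that line.
    return preg_replace('/parent::(.*);/', 'Base::$1;', $code);
}

echo replaceParentPrefix('parent::test( new ReflectionClass($this) );'), "\n";
```

The captured group is copied back unchanged, so the arguments never need to be parsed.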


Using regexes to parse code is a really bad idea. Take a look at PHP's Tokenizer, which you can use to split PHP code into an array of tokens. You can then use that array to find the information you need.

You can also look at PHP-Token-Reflection's source code as an example of how to get meaningful information from those tokens.

Basically, you would need to find T_STRING tokens with 'parent' as the string contents, followed by T_DOUBLE_COLON, followed by another T_STRING that contains the method name. Then go forward and start counting the depth of the parentheses: whenever you get to a '(', increase the counter by one; whenever you get to a ')', decrease it by one. Keep a record of everything you find in the process until the counter gets back to 0.

Something like this should work (not actually tested):

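<?php
// $source is a placeholder for the PHP code to scan, including its opening <?php tag.
$tokens = token_get_all($source);
for ($i = 0, $size = count($tokens); $i < $size - 2; $i++) {
    // 'parent' is tokenized as T_STRING; '(' and ')' come back as plain one-character strings.
    if ($tokens[$i][0] === T_STRING && $tokens[$i][1] === 'parent'
        && $tokens[$i + 1][0] === T_DOUBLE_COLON && $tokens[$i + 2][0] === T_STRING) {
        $i += 2; // now points at the method name
        $method = $tokens[$i][1];
        $depth = 0;
        $contents = array();
        do {
            if (++$i >= $size) {
                break; // ran out of tokens before the parentheses balanced
            }
            $contents[] = $token = $tokens[$i];
            if ($token === '(') {
                $depth++;
            } elseif ($token === ')') {
                $depth--;
            }
        } while ($depth > 0);
        echo "Call to $method with contents:\n";
        print_r(array_slice($contents, 1, -1)); // slices off the opening '(' and closing ')'
    }
}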
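If you want the arguments back as source text rather than as a token array, the same idea can be wrapped in a function. This is only a sketch: findParentCalls is a made-up name, and it assumes the '(' follows the method name directly, with no whitespace in between:

```php
<?php
// Hypothetical helper: returns every parent:: call in $source as
// ['method' => ..., 'arguments' => ...], using token_get_all().
function findParentCalls(string $source): array
{
    $tokens = token_get_all($source);
    $size = count($tokens);
    $calls = [];
    for ($i = 0; $i < $size - 3; $i++) {
        // Look for: T_STRING 'parent', T_DOUBLE_COLON, T_STRING method, '('
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING
            || strtolower($tokens[$i][1]) !== 'parent'
            || !is_array($tokens[$i + 1]) || $tokens[$i + 1][0] !== T_DOUBLE_COLON
            || !is_array($tokens[$i + 2]) || $tokens[$i + 2][0] !== T_STRING
            || $tokens[$i + 3] !== '(') {
            continue;
        }
        $method = $tokens[$i + 2][1];
        $depth = 0;
        $text = '';
        for ($j = $i + 3; $j < $size; $j++) {
            $token = $tokens[$j];
            if ($token === '(') {
                $depth++;
                if ($depth === 1) {
                    continue; // do not record the outer '('
                }
            } elseif ($token === ')') {
                $depth--;
                if ($depth === 0) {
                    break;    // outer ')' reached, call is complete
                }
            }
            // Array tokens carry their text at index 1; others are the text itself.
            $text .= is_array($token) ? $token[1] : $token;
        }
        $calls[] = ['method' => $method, 'arguments' => trim($text)];
        $i = $j; // continue scanning after this call
    }
    return $calls;
}

print_r(findParentCalls('<?php parent::test( new ReflectionClass($this) );'));
```

Since the tokenizer already treats string literals and comments as single tokens, a ')' inside a string does not upset the counter.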


Here is an example which is not really robust, but it would match the case in your question.

(parent::)([^\(]*)\(([^\(]*)\(([^()]*)\)

Here is a live regex test to experiment with: http://rubular.com/r/WwRsRTf7E6 (Note: rubular.com targets Ruby, but its regex syntax should be similar enough to PHP's).

The matched elements would be in this case:

parent::
test
new ReflectionClass
$this
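In PHP, the same pattern could be tried with preg_match, roughly like this. The wrapper function matchNestedCall is just a name chosen for this sketch, and the groups are trimmed because the third one captures the space after the outer bracket:

```php
<?php
// Hypothetical wrapper: applies the pattern above and returns the four
// captured groups (trimmed), or null if the line does not match.
function matchNestedCall(string $line): ?array
{
    $pattern = '/(parent::)([^\(]*)\(([^\(]*)\(([^()]*)\)/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // $m[0] is the whole match; groups 1..4 are the pieces listed above.
    return array_map('trim', array_slice($m, 1));
}

print_r(matchNestedCall('parent::test( new ReflectionClass($this) );'));
```

It only handles exactly one nested call: a line such as parent::test($a) has no second '(' and does not match at all.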

If you want something more robust, you might want to look into parsing tools (e.g. write a short grammar that matches PHP function calls) or static code analysis tools, as these often include AST generators etc. I have no personal experience with this one, but it sounds quite comprehensive:

  • https://github.com/facebook/pfff

pfff is a set of tools and APIs to perform some static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code. For now the effort is focused on PHP ...


If you are only interested in the function and whatever is inside the round brackets,
and most parent:: calls fit on a single line, this may work for you.

parent::(.*?)\((.*)\);

The first capture stops at the first ( it encounters, because it is not greedy.
The second capture is greedy and runs up to the last ); on the same line.

Note: Do not use the s modifier, as it lets . match newlines, so the second capture would run up to the last ); across multiple lines of your code.
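A quick way to check this against several lines is preg_match_all. The sample input and the function name listParentCalls are invented for this sketch, and it assumes one parent:: call per line:

```php
<?php
// Hypothetical helper: returns [method, arguments] pairs for each
// single-line parent:: call found in $code.
function listParentCalls(string $code): array
{
    // No s modifier: '.' stops at newlines, so each match stays on its line.
    preg_match_all('/parent::(.*?)\((.*)\);/', $code, $matches, PREG_SET_ORDER);
    $calls = [];
    foreach ($matches as $m) {
        $calls[] = [$m[1], trim($m[2])];
    }
    return $calls;
}

$code = "parent::test( new ReflectionClass(\$this) );\n"
      . "parent::other(1, foo(2));\n";
print_r(listParentCalls($code));
```

Two calls on the same line would be merged into one match, because the greedy second group runs to the last ); on that line.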
