I'm trying to write a PHP regex to extract functions from PHP source code. Until now I used a recursive regex to extract everything between {}, but that also matches things like if statements. When I use something like:
preg_match_all("/(function .*\(.*\))({([^{}]+|(?R))*})/", $data, $matches);
It doesn't work when there is more than one function in the file (probably because it uses the 'function' part in the recursion too).
Is there any way to do this?
Example file:
<?php
if($useless)
{
echo "i don't want this";
}
function bla($wut)
{
echo "i do want this";
}
?>
Thanks
Regexes are the wrong tool for this. Consider the tokenizer or reflection.
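To illustrate the tokenizer suggestion, here is a minimal sketch built on PHP's token_get_all(). The helper name extract_functions() and the sample source are made up for illustration. It only handles named functions and matches braces by counting tokens, so treat it as a starting point, not a complete parser.

```php
<?php
// Hypothetical helper (not from the thread): returns [name => body source]
// for every named function found in $source.
function extract_functions(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $functions = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The name is the first T_STRING before "("; closures have none.
        $name = null;
        for ($i++; $i < $count; $i++) {
            if (is_array($tokens[$i]) && $tokens[$i][0] === T_STRING) {
                $name = $tokens[$i][1];
                break;
            }
            if ($tokens[$i] === '(') {
                break;
            }
        }
        if ($name === null) {
            continue;
        }
        // Skip the parameter list up to the opening brace of the body.
        while ($i < $count && $tokens[$i] !== '{') {
            if ($tokens[$i] === ';') {
                continue 2; // declaration without a body (abstract/interface)
            }
            $i++;
        }
        // Collect tokens until the braces balance again. Braces inside string
        // literals are part of a string token, so they are not counted.
        $depth = 0;
        $body = '';
        for (; $i < $count; $i++) {
            $token = $tokens[$i];
            $body .= is_array($token) ? $token[1] : $token;
            $opensInterpolation = is_array($token)
                && in_array($token[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true);
            if ($token === '{' || $opensInterpolation) {
                $depth++;
            } elseif ($token === '}' && --$depth === 0) {
                break;
            }
        }
        $functions[$name] = $body;
    }
    return $functions;
}

// Sample input modelled on the question's example file.
$source = <<<'SOURCE'
<?php
if ($useless)
{
    echo "i don't want this";
}
function bla($wut)
{
    echo "i do want this: }";
}
SOURCE;

print_r(extract_functions($source));
```

The if block is ignored because it is never preceded by a T_FUNCTION token, and the "}" inside the string does not end the body early.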
Moved here from duplicate question: PHP, Regex and new lines
Regex solution:
$regex = '~
function #function keyword
\s+ #one or more whitespace characters
(?P<function_name>.*?) #function name itself
\s* #optional white spaces
(?P<parameters>\(.*?\)) #function parameters
\s* #optional white spaces
(?P<body>\{.*?\}) #body of a function (lazy: stops at the first closing brace)
~six';
if (preg_match_all($regex, $input, $matches)) {
print_r($matches);
}
P.S. As suggested above, the tokenizer is the preferable way to go.
Regex accepting recursive curly brackets in body
I know there is a selected answer, but in case the tokenizer cannot be used, this is a simple regex to extract a function (name, parameters and body) from PHP code.
The main difference from Ioseb's answer above is that this regex accepts nested curly brackets in the body, meaning it won't stop at the first closing curly bracket.
/function\s+(?<name>\w+)\s*\((?<param>[^\)]*)\)\s*(?<body>\{(?:[^{}]+|(?&body))*\})/
Explanation
/ # delimiter
function # function keyword
\s+ # at least one whitespace
(?<name>\w+) # function name (a word) => group "name"
\s* # optional whitespace(s)
\((?<param>[^\)]*)\) # function parameters => group "param"
\s* # optional whitespace(s)
(?<body>\{(?:[^{}]+|(?&body))*\}) # function body (nested curly brackets allowed) => group "body"
/ # delimiter
Example
$data = '
<?php
function my_function($param){
if($param === true){
// This is true
}else if($param === false){
// This is false
}else{
// This is not
}
}
?>
';
preg_match_all("/function\s+(?<name>\w+)\s*\((?<param>[^\)]*)\)\s*(?<body>\{(?:[^{}]+|(?&body))*\})/", $data, $matches);
print_r($matches['body']);
/*
Array
(
[0] => {
if($param === true){
// This is true
}else if($param === false){
// This is false
}else{
// This is not
}
}
)
*/
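The original question was about files with more than one function. Since (?&body) recurses only into the "body" group, whereas (?R) recurses into the whole pattern, the same regex should cope with several functions and skip plain if blocks. A small sketch follows. The sample file is made up: it combines the question's example with a second, hypothetical function.

```php
<?php
// Made-up sample: the question's file plus a second function with a nested block.
$data = <<<'SOURCE'
<?php
if ($useless)
{
    echo "i don't want this";
}
function bla($wut)
{
    echo "i do want this";
}
function blub($x)
{
    if ($x) {
        echo "nested";
    }
}
SOURCE;

// Same regex as above, in a single-quoted string so PHP leaves the backslashes alone.
$pattern = '/function\s+(?<name>\w+)\s*\((?<param>[^\)]*)\)\s*(?<body>\{(?:[^{}]+|(?&body))*\})/';

// PREG_SET_ORDER groups the results per function instead of per capture group.
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match['name'], '(', $match['param'], ') => ', $match['body'], "\n";
}
```

Only bla and blub are reported. The if block at the top has no function keyword in front of it, so the pattern never starts matching there.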
Limitation
Curly brackets have to be balanced, including those inside string literals. For example, this body will only be partially extracted:
function my_function(){
echo "A curly bracket : }";
echo "Another curly bracket : {";
}
/*
Array
(
[0] => {
echo "A curly bracket : }
)
*/
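One possible way to soften this limitation is to let the body also consume quoted strings as a unit, so braces inside them are not counted. The pattern below is a variant written for illustration, not part of the answer above. It assumes plain single- and double-quoted strings, and it still fails on braces or quotes inside comments and heredocs.

```php
<?php
// Assumption: a variant of the regex above that treats quoted strings as
// opaque units. Comments and heredocs are NOT handled.
$pattern = <<<'REGEX'
~function\s+(?<name>\w+)\s*\((?<param>[^)]*)\)\s*
(?<body>\{(?:
    [^{}"']++            # plain code without braces or quotes
  | "(?:[^"\\]|\\.)*+"   # double-quoted string, with escapes
  | '(?:[^'\\]|\\.)*+'   # single-quoted string, with escapes
  | (?&body)             # nested balanced block
)*\})~xs
REGEX;

$data = <<<'SOURCE'
function my_function(){
    echo "A curly bracket : }";
    echo "Another curly bracket : {";
}
SOURCE;

preg_match($pattern, $data, $match);
echo $match['body'], "\n";
```

With this variant the whole body is captured, up to the final closing bracket. For anything beyond simple cases the tokenizer remains the more reliable option.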