
Wildcard replace in PHP

Developer https://www.devze.com 2022-12-12 15:53 Source: web
I have no experience using regular expressions in PHP, so I usually write some convoluted function using a series of str_replace(), substr(), strpos(), strstr(), etc. (you get the idea).

This time I want to do it properly. I know I need to use a regex for this, but I am confused about which to use (ereg or preg) and what exactly the syntax should be.

NOTE: I am NOT parsing HTML or XML, and sometimes I will be using delimiters other than tags (for example, |, ~, [tag] or ::). I am looking for a generic way to do a wildcard replace between two known delimiters using regex; I am not building an HTML or XML parser.

What I need is a regex that replaces this:

<sometag>everything in here</sometag>

with this:

<sometag>new contents</sometag>

I have read the documentation online for a bit, but I am confused, and am hoping one of you regex experts can pop in a simple solution. I suspect I will pass the values to a function, something like this:

$new_text = swapText ( "<sometag>", $the_new_text_to_go_into_the_tag );

function swapText ( $in_tag_with_brackets_to_update, $in_new_text ) {
 // define tags
 $starting_tag  = $in_tag_with_brackets_to_update;
 $ending_tag    = str_replace( "<", "</", $in_tag_with_brackets_to_update );

 // not sure if this is the proper regex match string or not
 // and/or if any escaping needs to be done on the tags
 $find_string         = "{$starting_tag}.*{$ending_tag}";
 $replace_with_string = "{$starting_tag}{$in_new_text}{$ending_tag}";

 // after some regex, this function should return new version of <tag>data</tag>
}

Thanks.


You say that you are not going to parse XML, and then go on to show an XML example. That's a bit confusing.

Now, the reason why you can't use regular expressions to parse XML is that they aren't contextual: a regular expression has no way of keeping track of how deeply tags are nested. Therefore there is a whole class of problems that regular expressions can't be used for. This includes nested tags (whether they are XML or not), so keep that in mind.
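To see the nesting problem concretely, here is a small sketch using the lazy .*? pattern explained further down in this answer; the <foo> input is made up for illustration. The match starts at the outer opening tag but stops at the first closing tag it finds, which belongs to the inner element:

```php
<?php
// Nested input: an inner <foo> sits inside an outer <foo>
$input = '<foo>a<foo>b</foo>c</foo>';

// The lazy pattern pairs the OUTER opening tag with the INNER closing tag,
// leaving "c</foo>" stranded after the replacement
$output = preg_replace('~<foo>.*?</foo>~s', '<foo>X</foo>', $input);

echo $output, "\n"; // <foo>X</foo>c</foo>
```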

That out of the way, you should be using preg, not ereg. ereg is a less used, slower and now deprecated family of regular expression functions (deprecated in PHP 5.3 and removed entirely in PHP 7.0). Just forget about it.

In PCRE (Perl Compatible Regular Expressions), which is the regex dialect the preg functions use, a . (dot) is a wildcard that matches any single character (except newline). You can put a quantifier after a match. A quantifier can be an explicit range of numbers, such as {1,3} (meaning at least one, but at most three), or one of the shorthand symbols, such as + (short for {1,}, meaning at least one) or * (meaning any number, including zero). With this knowledge, you can match anything with .*.
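A quick way to check each of these quantifiers is to run preg_match() on a few sample strings (the strings below are arbitrary examples). preg_match() stores the leftmost match in $m[0] and returns 1 on a match and 0 otherwise:

```php
<?php
// {1,3}: between one and three "a"s; greedy, so it takes three of the four
preg_match('~a{1,3}~', 'aaaa', $m);
$range = $m[0]; // "aaa"

// +: "a" followed by one or more "b"s
preg_match('~ab+~', 'abbbc', $m);
$plus = $m[0]; // "abbb"

// *: "a" followed by zero or more "b"s, so a lone "a" is enough
preg_match('~ab*~', 'ac', $m);
$star = $m[0]; // "a"

// "." matches any single character except a newline by default
$dotAny     = preg_match('~a.c~', 'abc');   // 1
$dotNewline = preg_match('~a.c~', "a\nc");  // 0
```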

By default, quantifiers match as much as possible (this is known as being greedy). You can change this by appending ? to the quantifier. Thus .*? will match anything, but as little as possible. This can then be used to match any delimited value, like this:
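The difference between greedy and lazy matching shows up as soon as the input contains two delimited values. A minimal sketch, with a made-up input string:

```php
<?php
$input = '<foo>a</foo> and <foo>b</foo>';

// Greedy: .* runs to the LAST </foo> in the string
preg_match('~<foo>.*</foo>~', $input, $m);
$greedy = $m[0]; // "<foo>a</foo> and <foo>b</foo>"

// Lazy: .*? stops at the FIRST </foo>
preg_match('~<foo>.*?</foo>~', $input, $m);
$lazy = $m[0]; // "<foo>a</foo>"
```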

~<foo>.*?</foo>~

Note that I'm using ~ as the delimiter here to avoid having to escape / in the expression. The standard is to use / as delimiter, in which case the expression would have looked like this:

/<foo>.*?<\/foo>/
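As a sketch (sample input assumed), both spellings of the pattern behave the same. preg_quote() with a second argument also escapes that delimiter, which is handy when parts of the pattern come from variables:

```php
<?php
$input = 'x <foo>bar</foo> y';

// Same pattern, written with "~" and with "/" as the delimiter
$tilde = preg_match('~<foo>(.*?)</foo>~', $input, $m1);
$slash = preg_match('/<foo>(.*?)<\/foo>/', $input, $m2);

// preg_quote() escapes the delimiter given as its second argument
$quoted = preg_quote('a/b', '/'); // a\/b
```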

In general, the above is bad practice, since it's usually better to match a negated character class (such as [^<]*) than a dot, but to keep things simple, just ignore this until you have the basics down. It'll work in most cases.

In particular, since the . doesn't match newlines, this won't work if the content contains a newline character. If you need that, you can do one of two things: either add the s (dotall) modifier to the expression, or replace the . with a character class that includes newlines, for example [\s\S] (meaning a whitespace character or a non-whitespace character, which is the same as any character). This is how the expression would look then:

~<foo>.*?</foo>~s

Or:

~<foo>[\s\S]*?</foo>~
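The following sketch compares these variants on a made-up input that contains a newline, plus the negated class [^<]* mentioned above, which also crosses newlines but stops at any <:

```php
<?php
$input = "<foo>line one\nline two</foo>";

// Plain ".": cannot cross the "\n", so nothing matches and the input is unchanged
$plain = preg_replace('~<foo>.*?</foo>~', '<foo>X</foo>', $input);

// "s" modifier: "." now matches "\n" as well
$dotall = preg_replace('~<foo>.*?</foo>~s', '<foo>X</foo>', $input);

// [\s\S]: a class that matches any character, newline included
$class = preg_replace('~<foo>[\s\S]*?</foo>~', '<foo>X</foo>', $input);

// Negated class: matches newlines too, but never a "<"
$negated = preg_replace('~<foo>[^<]*</foo>~', '<foo>X</foo>', $input);
```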

To put all this to work, let's pass it to the preg_replace function:

echo preg_replace('~<foo>.*?</foo>~s', '<foo>Lorem Ipsum</foo>', $input);
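preg_replace() replaces every match by default. Its optional fourth and fifth arguments limit the number of replacements and report how many were made, as in this sketch with an assumed two-tag input:

```php
<?php
$input = '<foo>one</foo> <foo>two</foo>';

// Limit -1 (the default) replaces all matches; $count receives how many
$all = preg_replace('~<foo>.*?</foo>~s', '<foo>Lorem Ipsum</foo>', $input, -1, $count);

// A limit of 1 replaces only the first match
$first = preg_replace('~<foo>.*?</foo>~s', '<foo>Lorem Ipsum</foo>', $input, 1);
```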

If your tag-names are variable, you can build the expression up like you would with an SQL query. Just like SQL, you need to escape certain characters. Use preg_quote for that:

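function swapText($tagname, $replacement_text, $input) {
  // Escape regex metacharacters (and the ~ delimiter) in the tag name
  $tagname_escaped = preg_quote($tagname, '~');
  // Escape \ and $ so the replacement text is not read as backreferences
  $replacement_escaped = addcslashes($replacement_text, '\\$');
  return preg_replace(
    '~<' . $tagname_escaped . '>.*?</' . $tagname_escaped . '>~s',
    '<' . $tagname . '>' . $replacement_escaped . '</' . $tagname . '>',
    $input);
}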
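The question also asks about delimiters other than tags (|, ~, [tag], ::). A possible generalization, sketched below, takes the opening and closing delimiters as separate strings; swapBetween is a hypothetical name, not part of any library. It swaps in preg_replace_callback() so that the replacement text is inserted literally, even if it contains $ or \:

```php
<?php
// Hypothetical helper: replace whatever lies between $open and $close
// (shortest match, newlines included) in every occurrence in $input.
function swapBetween(string $open, string $close, string $replacement, string $input): string
{
    // preg_quote() escapes regex metacharacters such as [ ] | : and the ~ delimiter
    $pattern = '~' . preg_quote($open, '~') . '.*?' . preg_quote($close, '~') . '~s';

    // The callback's return value is used verbatim, so "$1" stays "$1"
    return preg_replace_callback($pattern, fn(array $m) => $open . $replacement . $close, $input);
}

echo swapBetween('[tag]', '[/tag]', 'new', 'a [tag]old[/tag] b'), "\n"; // a [tag]new[/tag] b
```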


@OP, there's no need to use a complicated regex or a parser if your task is very simple. Here is an example using just the normal substring functions:

$mystr='<sometag>everything in here</sometag>';
$start=strpos($mystr,"<sometag>");
$end=strpos($mystr,"</sometag>");
print substr($mystr,0,$start+strlen("<sometag>") ) . "new value" . substr($mystr,$end);
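One caveat with the plain substring approach: if the opening and closing delimiters are the same (for example ::), the search for the closing one must start after the opening one, and a missing delimiter should leave the string alone. A sketch with a hypothetical helper, replaceBetween(), that handles both:

```php
<?php
// Hypothetical helper: replace the text between the first $open and the
// next $close after it. Returns $str unchanged if either is missing.
function replaceBetween(string $str, string $open, string $close, string $new): string
{
    $start = strpos($str, $open);
    if ($start === false) {
        return $str;
    }
    $contentStart = $start + strlen($open);

    // Look for the closing delimiter only after the opening one
    $end = strpos($str, $close, $contentStart);
    if ($end === false) {
        return $str;
    }
    return substr($str, 0, $contentStart) . $new . substr($str, $end);
}

print replaceBetween('<sometag>everything in here</sometag>', '<sometag>', '</sometag>', 'new value');
```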


First, if it is HTML you are replacing, use something like Simple HTML DOM. If the format is exactly what you say (as in, <sometag> can't be <sometag >), then a regex may be OK to use.
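As a sketch of the parser route, here is the same replacement done with PHP's bundled DOM extension (DOMDocument) rather than Simple HTML DOM. The wrapping <p> is just sample input, and the sketch assumes the dom extension is enabled:

```php
<?php
// <sometag> is not a known HTML tag; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<p><sometag>everything in here</sometag></p>');

foreach ($doc->getElementsByTagName('sometag') as $node) {
    // Setting nodeValue on an element replaces its children with a text node
    $node->nodeValue = 'new contents';
}

$text = $doc->getElementsByTagName('sometag')->item(0)->textContent;
```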

Don't use the ereg-based functions, as they are deprecated; use the preg functions instead.

preg_replace('%(<sometag>)[^<]*(</sometag>)%i', '$1something else$2', $str);
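Two properties of this pattern are worth checking: the i modifier makes the tag names match case-insensitively, and [^<]* cannot cross a <, so content containing one is left untouched. A sketch with made-up inputs:

```php
<?php
$pattern = '%(<sometag>)[^<]*(</sometag>)%i';

// "i": upper-case tags still match; $1 and $2 keep their original case
$upper = preg_replace($pattern, '$1something else$2', '<SOMETAG>old</SOMETAG>');

// [^<]* stops at the "<" inside the content, so there is no match
$withLt = preg_replace($pattern, '$1something else$2', '<sometag>a < b</sometag>');
```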

EDIT
A slightly better version of the above, which also supports a < inside the text:

preg_replace('%(<sometag>).*?(</sometag>)%i', '$1something else$2', $str);

The $1 and $2 refer to the text captured by the two parenthesized groups. As these always match the same fixed tags, they can be replaced with the literal tags:

preg_replace('%<sometag>.*?</sometag>%i', '<sometag>something else</sometag>', $str);
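One pitfall with the backreference form: if the replacement text starts with a digit, $1 followed by that digit is read as a two-digit group number. Wrapping the group number in braces, ${1}, avoids this. A sketch with an assumed replacement value of 2024:

```php
<?php
$str     = '<sometag>old</sometag>';
$pattern = '%(<sometag>).*?(</sometag>)%i';

// "$12024$2" is read as group 12 (which does not exist), then "024",
// so the opening tag is lost
$wrong = preg_replace($pattern, '$12024$2', $str);

// "${1}" ends the group number explicitly
$right = preg_replace($pattern, '${1}2024$2', $str); // <sometag>2024</sometag>
```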


I've written the following function to replace parts of a string by wildcard:

function wildcardReplace($String,$Search,$Filler,$Wildcard = '???'){

        // Split the search pattern into the text before and after the wildcard
        list($startStr,$endStr) = explode($Wildcard,$Search,2);

        // Nothing to replace without a start string, or if it does not occur
        if($startStr === '') return $String;
        $start = strpos($String,$startStr);
        if($start === false) return $String;

        // Make sure the end point is the first closest match after the start string.
        $endofstarter = $start + strlen($startStr);
        $startofender = strpos($String,$endStr,$endofstarter);
        if($startofender === false) return $String;

        $Result = substr($String,0,$endofstarter) . $Filler;

        // Replace any remaining matches in the rest of the string
        // (pass $Wildcard along so a custom wildcard keeps working)
        $RemainingString = substr($String,$startofender);

        return $Result . wildcardReplace($RemainingString,$Search,$Filler,$Wildcard);
}

Example use (changes width="500" to width="350"): $Output = wildcardReplace('<a href="http://www.youtube.com/watch?v=dQw4w9WgXcQ"><img src="rickroll.png" width="500"></a>','width="???"',350,'???');
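The same wildcard idea can also be expressed with a single regex, like the earlier answers. The sketch below is a hypothetical regex counterpart (wildcardReplaceRegex is not from the original answer): \K drops the part before the wildcard from the match, and the lookahead (?=...) stops at the part after it without consuming it. It assumes $search contains the wildcard exactly once:

```php
<?php
// Hypothetical regex version of wildcardReplace(): keeps the text around the
// wildcard and replaces the shortest run between them, for every occurrence.
function wildcardReplaceRegex(string $subject, string $search, string $filler, string $wildcard = '???'): string
{
    [$before, $after] = explode($wildcard, $search, 2);

    // \K: start the replaced part after $before; (?=...): stop before $after
    $pattern = '~' . preg_quote($before, '~') . '\K.*?(?=' . preg_quote($after, '~') . ')~s';

    // The callback result is inserted literally, so "$" or "\" in $filler are safe
    return preg_replace_callback($pattern, fn(array $m) => $filler, $subject);
}

echo wildcardReplaceRegex('<img width="500"><img width="20">', 'width="???"', '350'), "\n";
```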

