Optimize "tagging" regex

Developer https://www.devze.com 2023-01-05 21:38 Source: Internet
I currently use this piece of code to reduce a given text to a valid "tagging" format (only lowercase a-z and minus allowed) by removing or replacing invalid characters:

        $zip_filename = strtolower($original);
        $zip_filename = preg_replace("/[^a-zA-Z\-]/", '-', $zip_filename); // replace invalid chars
        $zip_filename = preg_replace("/-+/", '-', $zip_filename); // reduce consecutive minus to only one
        $zip_filename = preg_replace("/^-/", '', $zip_filename); // remove leading minus
        $zip_filename = preg_replace("/-$/", '', $zip_filename); // remove trailing minus

Any hints on how to combine at least the regex calls into a single one?

Thanks for any advice!


$zip_filename = trim(preg_replace("/[^a-z]+/", '-', $zip_filename),'-');

Explanation:

  1. A-Z is unnecessary since the string has already been lowercased
  2. Adding + after the right bracket replaces one or more consecutive invalid chars with a single hyphen
  3. Using trim() with a second parameter (the character to strip from the beginning and end) replaces two regex calls and will speed up the code
  4. Removing \- from the character class also takes care of hyphens between invalid chars and of multiple consecutive hyphens, replacing each run with a single one.
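
As a minimal sketch of the points above, the one-liner can be wrapped in a small helper. The function name `to_tag` is hypothetical (it is not from the original post), and the lowercasing step is folded in so the helper accepts raw input.

```php
<?php
// Hypothetical helper (name not from the original post): reduce text to a tag.
function to_tag(string $original): string
{
    // Lowercase first so that [^a-z] treats former capitals as valid letters.
    $lower = strtolower($original);
    // Replace every run of characters other than a-z (hyphens included) with one hyphen.
    $tag = preg_replace('/[^a-z]+/', '-', $lower);
    // Strip the hyphens left at either end.
    return trim($tag, '-');
}

echo to_tag('  Hello,  World -- 2023! '), "\n"; // hello-world
```

Because the hyphen is not in the allowed class, runs such as ` -- ` are swallowed together with the surrounding invalid characters, so no separate "collapse" step is needed.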


You'll only make the code more difficult to understand/maintain by combining these four operations into one.

You also don't need the complexity and performance hit of regular expression-based operations to achieve what you need.

The reduction of double to single minus signs may be more readily achieved with a looped call to str_replace:

while (substr_count($zip_filename, '--')) {
    $zip_filename = str_replace('--', '-', $zip_filename);
}

Wrapping this up in a well-named class method will abstract away any apparent complexity and aid the readability of the code.
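
One way such a method might look is sketched here. The class and method names (`TagFormatter`, `collapseHyphens`) are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical class and method names, for illustration only.
final class TagFormatter
{
    // Collapse every run of consecutive hyphens into a single hyphen.
    public static function collapseHyphens(string $text): string
    {
        // A single str_replace pass only halves long runs ("----" becomes "--"),
        // so repeat until no double hyphen is left.
        while (substr_count($text, '--') > 0) {
            $text = str_replace('--', '-', $text);
        }
        return $text;
    }
}

echo TagFormatter::collapseHyphens('a----b--c'), "\n"; // a-b-c
```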

The last two operations can be handled by the trim() function:

$zip_filename = trim($zip_filename, '-');

You can then replace most of your regular expression-based operations with something less CPU-intensive and arguably easier for others to understand:

//replace invalid chars
$zip_filename = preg_replace("/[^a-zA-Z\-]/", '-', strtolower($original));

// reduce consecutive minus to only one
while (substr_count($zip_filename, '--')) {
    $zip_filename = str_replace('--', '-', $zip_filename);
}

// remove leading and trailing minus
$zip_filename = trim($zip_filename, '-'); 


This should simplify it considerably...

$zip_filename = trim(strtolower($original));
$zip_filename = preg_replace("/\s\s+|--+|[^a-zA-Z-]/", '-', $zip_filename);

The trim will take care of the spaces before and after the string. Also note the \s\s+ and --+. These are more efficient at finding duplicates. They'll only match those characters if there are 2 or more in succession, therefore avoiding unnecessary replacement operations.

But technically it'd still be possible to have leading or trailing dashes. And for that you'd still need this...

$zip_filename = preg_replace("/^-|-$/", '', $zip_filename);

(This last operation couldn't really be combined with the other since you're using a different replacement string.)
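
Put together, the two calls might be sketched as a single helper. The function name `tag_two_step` is hypothetical. The patterns are the ones from this answer, written without a `g` modifier since PHP's `preg_replace` already replaces every match.

```php
<?php
// Hypothetical helper name; the two patterns are the ones from the answer above.
function tag_two_step(string $original): string
{
    $tag = trim(strtolower($original));
    // Runs of whitespace, runs of hyphens, or any single other invalid
    // character each become one hyphen.
    $tag = preg_replace('/\s\s+|--+|[^a-zA-Z-]/', '-', $tag);
    // Remove one leading and/or one trailing hyphen.
    return preg_replace('/^-|-$/', '', $tag);
}

echo tag_two_step('  Foo   Bar  '), "\n"; // foo-bar
echo tag_two_step('a - b'), "\n";          // a---b
```

As the second call shows, input that mixes single spaces and hyphens can still come out with consecutive hyphens, because each single invalid character is replaced on its own. A `-+` collapse step or the `[^a-z]+` pattern avoids that.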
