PHP code explanation question.

Developer https://www.devze.com 2023-03-23 08:52 Source: Internet

I don't know if this is the place to ask this question, so be kind if I am wrong.

I was wondering if someone can explain to me in detail what the following 3 code snippets do.

Snippet 1

if($str !== mb_convert_encoding(mb_convert_encoding($str, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32')){
    $str = mb_convert_encoding($str, 'UTF-8');
}

Snippet 2

$str = preg_replace('`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i', '\\1', $str);

Snippet 3

$str = preg_replace(array('`[^a-z0-9]`i','`[-]+`'), '-', $str);

Here is the full code below for reference.

function to_permalink($str){
    if($str !== mb_convert_encoding(mb_convert_encoding($str, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32')){
        $str = mb_convert_encoding($str, 'UTF-8');
    }
    $str = htmlentities($str, ENT_NOQUOTES, 'UTF-8');
    $str = preg_replace('`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i', '\\1', $str);
    $str = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
    $str = preg_replace(array('`[^a-z0-9]`i','`[-]+`'), '-', $str);
    $str = strtolower(trim($str, '-'));
        return $str;
}

Snippet 1 makes sure the string is in UTF-8 encoding.

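Snippet 2 converts accented characters to their base form (e.g., 'é' -> 'e'). It works on the HTML entities produced by the htmlentities() call just before it in the full function, so 'é' first becomes '&eacute;' and then 'e'.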

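Snippet 3 will convert spaces, and every other character that is not a letter or digit, to hyphens (-), then collapse each run of hyphens into a single one.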

All in all, taking into account the function's name and content, I'd say it is used to make URL-friendly links, for example, converting

I discovered a new french word: église

into

i-discovered-a-new-french-word-eglise

Usually used for SEO.
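To make the example above concrete, here is a rough step-by-step trace of the function body on that sentence. It is only a sketch: the intermediate variable names are made up for illustration, and the UTF-8 check from Snippet 1 is skipped because the input is already written as valid UTF-8.

```php
<?php
// "\xC3\xA9" is the UTF-8 byte sequence for é, written out so the file's own encoding does not matter.
$str = "I discovered a new french word: \xC3\xA9glise";

// Step 1: turn accented characters into named HTML entities (é -> &eacute;).
$entities = htmlentities($str, ENT_NOQUOTES, 'UTF-8');

// Step 2 (Snippet 2): keep only the base letter(s) of each matching entity.
$stripped = preg_replace('`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i', '\\1', $entities);

// Step 3: decode any entities that Step 2 did not match.
$decoded = html_entity_decode($stripped, ENT_NOQUOTES, 'UTF-8');

// Step 4 (Snippet 3): non-alphanumerics become hyphens, then runs of hyphens collapse.
$hyphenated = preg_replace(array('`[^a-z0-9]`i', '`[-]+`'), '-', $decoded);

// Step 5: drop leading/trailing hyphens and lowercase the result.
$slug = strtolower(trim($hyphenated, '-'));

echo $entities, "\n";   // I discovered a new french word: &eacute;glise
echo $stripped, "\n";   // I discovered a new french word: eglise
echo $hyphenated, "\n"; // I-discovered-a-new-french-word-eglise
echo $slug, "\n";       // i-discovered-a-new-french-word-eglise
```

Note that ": " produces two hyphens in a row after the first pattern of Step 4; the second pattern is what reduces them to one.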

Many of your questions can be answered by looking up what the functions do in your code.

Go here to get started: http://php.net/docs.php

Snippet #1: Checks whether the string is valid UTF-8 by round-trip converting it from UTF-8 -> UTF-32 -> UTF-8. If the result is NOT the same as the input, it calls mb_convert_encoding() without a source encoding, which converts from mbstring's configured internal encoding (it does not auto-detect the input encoding), and outputs UTF-8 regardless. Seems to be rather much work for little gain.
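If the goal of Snippet 1 is only "leave valid UTF-8 alone, convert everything else", a shorter way to express it might look like the sketch below. This is a different technique from the round trip in the question: it uses mb_check_encoding() for the validity test. The helper name ensure_utf8() and the ISO-8859-1 fallback are assumptions made for this example, not part of the original code; pick whatever source encoding your data actually uses.

```php
<?php
// Hypothetical helper, not from the original code. Requires the mbstring extension.
// $fallback is an ASSUMED source encoding for input that is not valid UTF-8.
function ensure_utf8($str, $fallback = 'ISO-8859-1') {
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid UTF-8, leave it untouched
    }
    // Convert explicitly from the assumed encoding instead of relying on defaults.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}

$latin1 = "\xE9glise";     // "église" encoded as ISO-8859-1 (not valid UTF-8)
$utf8   = "\xC3\xA9glise"; // "église" encoded as UTF-8

var_dump(ensure_utf8($latin1) === $utf8); // bool(true)
var_dump(ensure_utf8($utf8) === $utf8);   // bool(true)
```

Passing the source encoding explicitly avoids depending on the mbstring configuration, which is what the two-argument call in Snippet 1 falls back to.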

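Snippet #2: Looks for a series of potential character entities (accented characters, in this case) and replaces each match with just its leading letter or letters. The '\\1' replacement is a backreference to the first capture group, not a literal backslash, so the leading &, the suffix (acute, uml, lig, etc.) and the trailing ; are all dropped. So &AElig; becomes AE and &eacute; becomes e.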
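A quick way to see what the '\\1' replacement in Snippet 2 produces is to run the pattern over a few entity strings. The sample entities and the $results array below are chosen just for illustration.

```php
<?php
$pattern = '`&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig);`i';

// In a single-quoted PHP string, '\\1' is the two characters \1,
// which preg_replace treats as "the text matched by capture group 1".
$samples = array('&eacute;', '&AElig;', '&szlig;', '&Oslash;', '&amp;');

$results = array();
foreach ($samples as $entity) {
    $results[$entity] = preg_replace($pattern, '\\1', $entity);
    echo $entity, ' => ', $results[$entity], "\n";
}
// &eacute; => e
// &AElig; => AE
// &szlig; => sz
// &Oslash; => O
// &amp; => &amp;   (no listed suffix, so the pattern does not match)
```

Entities the pattern does not recognise, such as &amp;, pass through unchanged and are turned back into characters by the html_entity_decode() call that follows in the full function.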

Snippet #3: Runs two replacements in order. First, every character which is NOT a-z or 0-9 (case-insensitive) becomes a -; then any sequence of 1 or more - is collapsed into a single -.
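Snippet 3 passes both patterns to preg_replace() as an array, and they are applied one after the other to the same string. Splitting the call in two, as in this small sketch with a made-up input, shows what each pattern contributes.

```php
<?php
$str = 'Hello,  World!';

// Pattern 1: every character that is not a letter or digit becomes a hyphen.
$afterFirst = preg_replace('`[^a-z0-9]`i', '-', $str);

// Pattern 2: each run of one or more hyphens collapses to a single hyphen.
$afterSecond = preg_replace('`[-]+`', '-', $afterFirst);

// The array form from Snippet 3 gives the same result in one call.
$combined = preg_replace(array('`[^a-z0-9]`i', '`[-]+`'), '-', $str);

echo $afterFirst, "\n";  // Hello---World-
echo $afterSecond, "\n"; // Hello-World-
echo $combined, "\n";    // Hello-World-
```

The trailing hyphen left here is what the trim($str, '-') call on the next line of the full function removes.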