
� appears using character_limiter() with strip_tags() and utf-8 charset

Developer https://www.devze.com 2023-04-12 19:51 Source: Internet

I'm getting � characters when I combine CodeIgniter's character_limiter() with PHP's native strip_tags(). Here is the code I'm using:

<?php echo character_limiter(strip_tags($block->body), 60); ?>

$block->body is an HTML string stored in the database. I do not get this unexpected output if I use only one of the functions. The output looks like this:

[Screenshot of the output, not preserved]

This is what the HTML looks like:

[Screenshot of the HTML source editor, not preserved]

I didn't paste the actual HTML because posting it here would modify the string; see the update below.

Here is CodeIgniter's character_limiter() function:

function character_limiter($str, $n = 500, $end_char = '&#8230;')
{
    if (strlen($str) < $n)
    {
        return $str;
    }

    $str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));

    if (strlen($str) <= $n)
    {
        return $str;
    }

    $out = "";
    foreach (explode(' ', trim($str)) as $val)
    {
        $out .= $val.' ';

        if (strlen($out) >= $n)
        {
            $out = trim($out);
            return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
        }
    }
}
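The function above measures length with strlen(), which counts bytes, not characters. Because its loop only ever breaks the string at ASCII spaces (byte 0x20, which never occurs inside a multibyte UTF-8 sequence), this byte counting shifts where the cut lands but should not by itself split a character. A minimal sketch of the byte/character difference, assuming the mbstring extension is loaded; the sample strings are made up:

```php
<?php
// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$word = "caf\u{00E9}";   // "café": é is 2 bytes in UTF-8 (0xC3 0xA9)
$nbsp = "a\u{00A0}b";    // U+00A0 NO-BREAK SPACE is 2 bytes (0xC2 0xA0)

printf("%d bytes, %d characters\n", strlen($word), mb_strlen($word, 'UTF-8')); // 5 bytes, 4 characters
printf("%d bytes, %d characters\n", strlen($nbsp), mb_strlen($nbsp, 'UTF-8')); // 4 bytes, 3 characters
```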

I suspect an invisible character is causing this, because when I pasted the HTML into a text editor, then back into the "HTML source editor" shown in the screenshot (which is just TinyMCE), and saved it, the weird characters disappeared.
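One way to check for such an invisible character is to list the code point of every non-ASCII character in the stripped text. A minimal sketch, assuming PHP 7.2+ (for mb_ord()) with mbstring; the helper name `list_non_ascii` is hypothetical, and U+00A0 in the example is only a guess at the kind of character involved:

```php
<?php
// Hypothetical helper: list every non-ASCII character in a UTF-8 string as a
// code point, so invisible characters (e.g. U+00A0 NO-BREAK SPACE) show up.
// Returns null if the string is not valid UTF-8 at all.
function list_non_ascii(string $s): ?array
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        return null;
    }
    $found = [];
    // '//u' splits the string into characters rather than bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        if ($cp > 0x7F) {
            $found[] = sprintf('U+%04X', $cp);
        }
    }
    return $found;
}

// In practice you would pass something like strip_tags($block->body).
var_dump(list_non_ascii("Hello\u{00A0}world"));
```

Running it on the stripped body before and after the TinyMCE round trip should show which character the round trip removed.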

I am using the UTF-8 character set everywhere possible. The original data came from a dump of an unknown database and was imported with an SQL client. However, re-saving the existing string in the CMS changed nothing.

I can't see why these two functions produce this output only when used together; I don't get the � characters otherwise. I only see this output when I use:

character_limiter(strip_tags($html))

What could be causing this, and how can I prevent it?

Note: I definitely want to use the character_limiter function, or a variation of it. If the string is longer than the second parameter, it truncates it at a word boundary and appends an ellipsis. Using it alone (without strip_tags) works perfectly fine (no weird characters).

Update: For anyone who can't reproduce this, I put an SQL file online that demonstrates the issue. I am importing it with MySQL Query Browser. It seems I only get this output when the HTML comes from the database. Here is the link (ignore the content, it's the client's fault): http://wesleymurch.com/test/test1.sql


� is the replacement character, displayed in place of an unknown or unprintable character. In PHP we usually solve this with the multibyte string functions. Use mb_substr() together with strip_tags(), like this:

mb_substr(strip_tags($text), 0, 300, 'UTF-8'); // or whatever your charset is
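To illustrate why a byte-based cut can produce �, here is a minimal comparison of substr() and mb_substr() on a made-up string whose byte cut point falls inside a two-byte character (assumes mbstring is loaded):

```php
<?php
$text = "caf\u{00E9} au lait";   // "café au lait"; é is the bytes 0xC3 0xA9

// substr() counts bytes: 4 bytes ends halfway through "é".
$bytes = substr($text, 0, 4);              // "caf\xC3", invalid UTF-8
// mb_substr() counts characters: 4 characters ends after "é".
$chars = mb_substr($text, 0, 4, 'UTF-8');  // "café"

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): a UTF-8 page typically renders the stray byte as �
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)
```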

Or you can modify the CodeIgniter function to use the multibyte string functions.

UPDATE

function character_limiter($str, $n = 500, $end_char = '&#8230;')
{
    if (mb_strlen($str) < $n)
    {
        return $str;
    }

    $str = mb_ereg_replace("\s+", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));

    if (mb_strlen($str) <= $n)
    {
        return $str;
    }

    $out = "";
    foreach (explode(' ', trim($str)) as $val)
    {
        $out .= $val.' ';

        if (mb_strlen($out) >= $n)
        {
            $out = trim($out);
            return (mb_strlen($out) == mb_strlen($str)) ? $out : $out.$end_char;
        }
    }
}
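A possibly important change in this version is the whitespace replacement. The original preg_replace("/\s+/", ...) has no /u modifier, so it works on bytes, and whether \s can then match a byte inside a multibyte character (for example the 0xA0 in a UTF-8 no-break space) depends on PCRE's character tables. A further variant, sketched below under the assumption that the input is valid UTF-8, keeps preg_replace() but adds /u and passes 'UTF-8' to mb_strlen() explicitly instead of relying on the internal encoding. The name `character_limiter_utf8` is hypothetical, not part of CodeIgniter:

```php
<?php
// Hypothetical UTF-8-aware variant of CodeIgniter's character_limiter().
function character_limiter_utf8(string $str, int $n = 500, string $end_char = '&#8230;'): string
{
    if (mb_strlen($str, 'UTF-8') < $n) {
        return $str;
    }

    // /u makes the pattern operate on whole UTF-8 characters, not bytes.
    // \s already covers \r and \n, so no separate str_replace() is needed.
    $clean = preg_replace('/\s+/u', ' ', $str);
    if ($clean === null) {
        return $str; // preg_replace() fails on invalid UTF-8: leave input untouched
    }
    $str = $clean;

    if (mb_strlen($str, 'UTF-8') <= $n) {
        return $str;
    }

    // Add whole words until the character count reaches $n.
    $out = '';
    foreach (explode(' ', trim($str)) as $val) {
        $out .= $val . ' ';
        if (mb_strlen($out, 'UTF-8') >= $n) {
            $out = trim($out);
            return (mb_strlen($out, 'UTF-8') === mb_strlen($str, 'UTF-8')) ? $out : $out . $end_char;
        }
    }
    return $str;
}

echo character_limiter_utf8(strip_tags("<p>caf\u{00E9} au lait caf\u{00E9} au lait</p>"), 10);
```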