I have something like this:
$mytext = "that&#39;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);
The output in this case is: that&#
instead of: that's
I want each HTML entity to count as one character when calling substr, because I always end up with broken HTML or stray characters at the end of the text.
Please don't suggest decoding the HTML, calling substr and then encoding it again; I want a clean method :)
Thanks
There are two ways of doing this:
(1) You can decode the HTML entities, substr() and then encode; or
(2) You can use a regular expression.
(1) uses html_entity_decode() and htmlentities():
$s = html_entity_decode($mytext);
$sub = substr($s, 0, 6);
echo htmlentities($sub);
(2) might be something like:
if (preg_match('!^([^&]|&(?:.*?;)){0,5}!s', $mytext, $match)) {
echo $match[0];
}
What this is saying is: find me up to 5 occurrences of the preceding expression from the beginning of the string. The preceding expression is either:
any character that isn't an ampersand; or
an ampersand, followed by anything up to and including a semi-colon (ie an HTML entity).
This isn't perfect (for example, a bare ampersand followed later by a semicolon is treated as one long entity), so I would favour (1).
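A tighter version of the regular-expression approach might look like the sketch below. It is only an illustration: the function name entity_substr is made up here, and it assumes the input is valid UTF-8 and that an entity is either a named reference (&quot;), a decimal reference (&#39;) or a hex reference (&#x27;). Anything else, including a stray ampersand, is counted as an ordinary character.

```php
<?php
// Hypothetical helper (not from the answers above): returns the first
// $length "characters" of $text, counting each HTML entity as one character.
function entity_substr(string $text, int $length): string
{
    $length = max(0, $length); // a negative count would produce an invalid pattern

    // Try a full entity first; otherwise consume one UTF-8 character.
    // The "s" flag lets "." match newlines, "u" makes "." match whole code points.
    $entity  = '&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);';
    $pattern = '/^(?:' . $entity . '|.){0,' . $length . '}/su';

    // preg_match() returns false on invalid UTF-8; fall back to an empty string.
    return preg_match($pattern, $text, $match) ? $match[0] : '';
}

echo entity_substr('that&#39;s really &quot;confusing&quot;', 6), "\n";
```

Note that PCRE caps the repeat count in {0,n} at 65535, so this sketch only suits short excerpts.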
function encoded_substr($string, $param, $param2){
$s = html_entity_decode($string);
$sub = substr($s, $param, $param2);
return htmlentities($sub);
}
There, I copypasted cletus' code into a function for you. Now you can call a very straightforward 3 line function with 1 line of code. If this isn't "clean" then I'm confused what "clean" means.
Be advised that some characters break the proposed decoding + encoding approach if you use substr().
Example
$string=html_entity_decode("Workin’ on my Fitness…In the Backyard.");
echo $string;
echo substr($string,0,25);
echo htmlentities(substr($string,0,25));
Will output:
- Workin’ on my Fitness…In the Backyard.
- Workin’ on my Fitness�
The solution
Use mb_substr().
echo mb_substr($string,0,25);
echo htmlentities(mb_substr($string,0,25));
Will output:
- Workin’ on my Fitness…In
- Workin&rsquo; on my Fitness&hellip;In
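The reason for the difference is that substr() counts bytes while mb_substr() counts characters. A small sketch, assuming UTF-8 text and the mbstring extension, makes this visible (the "\u{...}" escapes need PHP 7 or later and are used here only so the snippet does not depend on the file's own encoding):

```php
<?php
// U+2019 (right single quote) and U+2026 (ellipsis) each take 3 bytes in UTF-8.
$string = "Workin\u{2019} on my Fitness\u{2026}In the Backyard.";

var_dump(strlen("\u{2019}"));              // byte length of one character
var_dump(mb_strlen("\u{2019}", 'UTF-8'));  // character length

// Byte 25 falls in the middle of the ellipsis, so substr() leaves a broken sequence.
$byteCut = substr($string, 0, 25);
$charCut = mb_substr($string, 0, 25, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // false: truncated mid-character
var_dump(mb_check_encoding($charCut, 'UTF-8')); // true: cut on a character boundary
```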
Please try the following function.
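<?php
$mytext = "that&#39;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo limit_text($mytext, 6);

// Extends $limit by the extra bytes each entity occupies, cuts the string
// there, then trims back to the last space so no word is left half-cut.
function limit_text($text, $limit) {
    // Collect every "&...;" sequence (ungreedy, so each entity matches separately).
    preg_match_all("/&(.*)\;/U", $text, $pat_array);
    $additional = 0;
    foreach ($pat_array[0] as $key => $value) {
        // Each of the first $limit entities counts as one character.
        if ($key < $limit) { $additional += (strlen($value) - 1); }
    }
    $limit += $additional;
    if (strlen($text) > $limit) {
        $text = substr($text, 0, $limit);
        // Drop the trailing partial word; keep the text as-is if it has no space.
        $lastSpace = strrpos($text, ' ');
        if ($lastSpace !== false) { $text = substr($text, 0, $lastSpace); }
    }
    return $text;
}
?>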
Well, there is only one clean method:
Don't use entities at all.
There is not a single reason to substr an entity-encoded string. Encoded text should be used for output only.
So, first substr, then encode.
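As a rough sketch of that order of operations, assuming the raw (unencoded) text is still available and is UTF-8, and that the mbstring extension is loaded:

```php
<?php
// Raw text as it would be stored, before any HTML encoding.
$raw = 'that\'s really "confusing" and <absolutly> silly';

// 1. Cut the raw text by characters, 2. encode only at output time.
$excerpt = mb_substr($raw, 0, 6, 'UTF-8');
$html    = htmlspecialchars($excerpt, ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // that&#039;s
```

Because the cut happens before encoding, an entity can never be split in half.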
Here is a corrected version of the function above. Use mb_substr to avoid surprises such as a decoded entity taking more than one byte, or character counting not working the way it should (in my case, Sábado became Sá):
function encoded_substr($string, $param, $param2){
$s = html_entity_decode($string);
$sub = mb_substr($s, $param, $param2);
return htmlentities($sub);
}
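One further caveat: before PHP 8.1 the default flags of html_entity_decode() and htmlentities() were ENT_COMPAT, which leaves single-quote entities such as &#39; undecoded, so the string from the question would still be cut in the wrong place. A hedged variant that passes the flags and charset explicitly could look like this (the name encoded_substr_utf8 is hypothetical, and UTF-8 input is an assumption):

```php
<?php
// Hypothetical variant of encoded_substr() with explicit flags and charset,
// so the result does not depend on the PHP version's defaults.
function encoded_substr_utf8(string $string, int $start, ?int $length = null): string
{
    $flags = ENT_QUOTES | ENT_HTML401; // decode and encode both quote styles

    $decoded = html_entity_decode($string, $flags, 'UTF-8');
    $sub     = mb_substr($decoded, $start, $length, 'UTF-8'); // counts characters, not bytes

    return htmlentities($sub, $flags, 'UTF-8');
}

echo encoded_substr_utf8('that&#39;s really &quot;confusing&quot;', 0, 6), "\n"; // that&#039;s
echo encoded_substr_utf8('S&aacute;bado', 0, 2), "\n";                          // S&aacute;
```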