Does \w depend on the encoding?

Developer https://www.devze.com 2023-03-22 06:26 Source: the web

I've been trying to find out which bytes count as word characters in different encodings with:

<?php
header('Content-Type: text/plain; charset="ISO-8859-7"'); // Changing the charset attribute
for ($i = 0; $i <= 255; $i++) {
    $char = chr($i);
    if (preg_match('/^\w$/', $char, $m)) {
        echo "[" . ord($m[0]) . "]";
    }
}
?>

I don't know if this is wrong, but it always gives me the same positions, no matter which charset is specified. It seems that, regardless of the encoding, '\w' always matches the bytes that are word characters in ISO-8859-1.

Yes! \w and \b are affected by the character set! In my code, I use:

setlocale(LC_CTYPE, "cs_CZ");

to handle it. This affects the behaviour of \w and \b in regexps, and also strtoupper() (in PHP versions before 8.2, where it was still locale-sensitive). If you also need sorting and comparing of strings to work well, you would use (depending on your country/locale) something like:

setlocale(LC_COLLATE, "cs_CZ");
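
As a rough sketch of how that collation setting gets picked up when sorting: strcoll() compares two strings using the current LC_COLLATE locale, so it can be passed to usort(). The helper name sortWithLocale() and the sample words are made up for illustration, and the locale names are assumptions, since what is installed differs from system to system (setlocale() returns false if none of them is available).

```php
<?php
// Sort strings with the collation rules of the current LC_COLLATE locale.
// sortWithLocale() is a hypothetical helper, not a built-in function.
function sortWithLocale(array $words): array
{
    usort($words, 'strcoll'); // strcoll() honours LC_COLLATE
    return $words;
}

// Assumed locale names; setlocale() tries them in order and returns
// false when none is installed on this machine.
$locale = setlocale(LC_COLLATE, 'cs_CZ.UTF-8', 'cs_CZ');
if ($locale === false) {
    echo "Czech locale not available, the current locale is used instead\n";
}

print_r(sortWithLocale(['zebra', 'chata', 'cesta', 'dum']));
```

With a Czech locale installed, "chata" would be expected to move after "dum", because Czech collation treats "ch" as a letter of its own that follows "h". Without that locale the order simply follows whatever collation is currently active.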

I also found this out the hard way, when it didn't work... :)

So, to answer your original question: you cannot affect this with the header() function, because that only tells the browser which encoding the response uses. What you need is to change the behaviour of PHP on the server, which is accomplished by the commands above.
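
To see the difference for yourself, something along these lines should work. It is only a sketch: wordBytes() is a made-up helper that repeats the loop from the question, and the Greek locale names are guesses that may be spelled differently, or be missing entirely, on your system.

```php
<?php
// Collect the byte values (0-255) that \w matches under the current
// LC_CTYPE locale. wordBytes() is a hypothetical helper.
function wordBytes(): array
{
    $matched = [];
    for ($i = 0; $i <= 255; $i++) {
        if (preg_match('/^\w$/', chr($i)) === 1) {
            $matched[] = $i;
        }
    }
    return $matched;
}

// Baseline: the plain "C" locale.
setlocale(LC_CTYPE, 'C');
$base = wordBytes();
echo count($base), " bytes match \\w in the C locale\n";

// Assumed names for a Greek ISO-8859-7 locale; setlocale() returns
// false if none of them is installed.
$locale = setlocale(LC_CTYPE, 'el_GR.ISO-8859-7', 'el_GR.iso88597');
if ($locale === false) {
    echo "No Greek ISO-8859-7 locale installed, nothing to compare\n";
} else {
    $greek = wordBytes();
    echo count($greek), " bytes match \\w in $locale\n";
    echo "Additional bytes: ", implode(' ', array_diff($greek, $base)), "\n";
}
```

If the locale is available, the second count should come out larger than the first, and changing the charset in header() will not alter either number.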