Norwegian characters problem

开发者 https://www.devze.com 2022-12-15 11:27 Source: Internet
I create a folder as follows.

function create(){
    if ($this->input->post('name')){
        ...
        ...
        $folder = $this->input->post('name');
        $folder = strtolower($folder);
        $forbidden = array(" ", "å", "ø", "æ", "Å", "Ø", "Æ");
        $folder = str_replace($forbidden, "_", $folder);
        $folder = 'images/'.$folder;
        $this->_create_path($folder);
        ...

However, it does not replace the Norwegian characters with _ (underscore).

For example, Åtest øre will create a folder called ã…test_ã¸re.

I have

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

in a header.

I am using PHP/codeigniter on XAMPP/Windows Vista.

How can I solve this problem?


You have to remember to save your PHP file in the correct encoding. Try saving it as ISO-8859-1 or UTF-8. Also remember to reopen it after saving, so that you can see whether it was saved correctly or whether the characters were converted. Your IDE may silently convert them to different bytes (which show up as weird characters) without displaying the change in the editor.

  1. When you save your file, choose Save As...
  2. Below the file name (filename.php) there should be an Encoding option. Choose ISO-8859-1 (Latin-1) or UTF-8 there. Notepad's choices here are limited (and its UTF-8 option writes a byte order mark), so you are better off with a proper editor.
  3. Apply the same encoding to all other PHP files in that application. I think ISO-8859-1 will do, but UTF-8 is a good default, so choose it if it works here.
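
To check which encoding the editor actually wrote, one option is to look at the raw bytes of a literal in the saved file. The sketch below is only an illustration: describe_literal_encoding is a hypothetical helper name, and it relies on å being the bytes C3 A5 in UTF-8 and the single byte E5 in ISO-8859-1.

```php
<?php
// Hypothetical helper: reports how a literal "å" ended up being stored.
function describe_literal_encoding(string $literal): string
{
    $hex = bin2hex($literal);
    if ($hex === 'c3a5') {
        return 'UTF-8';
    }
    if ($hex === 'e5') {
        return 'ISO-8859-1 / Windows-1252';
    }
    return 'unknown (' . $hex . ')';
}

// The result depends on the encoding this very file was saved in.
echo describe_literal_encoding('å'), "\n";
```

If this reports anything other than the encoding your pages use (UTF-8 in the question), the literals in $forbidden cannot match the posted input.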


Try explicitly setting the internal encoding used by PHP's multibyte (mb_*) string functions:

mb_internal_encoding('UTF-8');
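
Here is one possible explanation for the ã…test_ã¸re result, offered as an assumption rather than a confirmed diagnosis. strtolower works on single bytes, and under a Western European locale it may lowercase the UTF-8 lead byte 0xC3 to 0xE3 before str_replace runs, so the forbidden characters no longer match. The sketch below uses mb_strtolower instead. normalize_folder_name is a hypothetical helper, the mbstring extension is assumed to be loaded, and the characters are written as byte escapes so the example does not depend on how the file is saved.

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical helper, not part of CodeIgniter.
function normalize_folder_name(string $name): string
{
    // Lowercase whole characters (not single bytes) using the internal encoding.
    $name = mb_strtolower($name);

    // After lowercasing, only the lowercase forms need to be listed:
    // space, å, ø, æ (as UTF-8 byte sequences).
    $forbidden = array(' ', "\xC3\xA5", "\xC3\xB8", "\xC3\xA6");

    return str_replace($forbidden, '_', $name);
}

// "Åtest øre" in UTF-8
echo normalize_folder_name("\xC3\x85test \xC3\xB8re"), "\n"; // _test__re
```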

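Edit: actually, now that I think about it... I'd advise using strtr with a replacement array. The array form replaces whole substrings, so it is safe for multibyte characters; the three-argument form maps byte to byte and would mangle UTF-8 input:

// Map every forbidden character to an underscore.
$map   = array_fill_keys(array(' ', 'å', 'ø', 'æ', 'Å', 'Ø', 'Æ'), '_');
$fixed = strtr($string, $map);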


Most of PHP's normal string functions work on bytes and don't handle multibyte Unicode characters well, if at all.

In this situation, you could use a regular expression to work around that.

<?php
// Save this file as UTF-8: the /u modifier treats pattern and subject as UTF-8.
$string = 'Åtest øre';
$regexp = '/( |å|ø|æ)/iu';
$replace_char = '_';

echo preg_replace($regexp, $replace_char, $string);
?>

Returns:

_test__re


The interface you get to the Windows filesystem from PHP is the C standard library one. Windows maps its Unicode filesystem naming scheme into bytes for PHP using the system default codepage. Probably your system default codepage is 1252 Western European if you are in Norway, but that's a deployment detail that can change when you move to put it on a live server and it's not something that's easy to fix.

Your page/site encoding is UTF-8. Unfortunately whilst modern Linux servers typically use UTF-8 as their filesystem access encoding, Windows can't because the default code page is never UTF-8. You can convert a UTF-8 string into cp1252 using iconv; naturally all characters that don't fit in this code page will be lost or mangled. The alternative would be to make the whole site use charset=iso-8859-1, which can (for most cases) be stored in cp1252. It's a bit backwards to be using a non-UTF-8 charset though and of course it'll still break if you deploy it to a machine using a different default code page.
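
As a rough illustration of the iconv conversion mentioned above, the sketch below converts a UTF-8 name to code page 1252 bytes. utf8_to_cp1252 is a hypothetical helper name, and the assumption is that the iconv extension is available and knows the encoding name CP1252.

```php
<?php
// Hypothetical helper: converts a UTF-8 name to Windows-1252 bytes,
// or returns null when iconv reports a failure.
function utf8_to_cp1252(string $name): ?string
{
    // @ suppresses the notice iconv may raise for unconvertible characters.
    $converted = @iconv('UTF-8', 'CP1252', $name);
    return $converted === false ? null : $converted;
}

$bytes = utf8_to_cp1252("\xC3\x85test \xC3\xB8re"); // "Åtest øre" in UTF-8
echo bin2hex($bytes), "\n"; // c57465737420f87265
```

What happens to characters outside code page 1252 depends on the iconv implementation: many report a failure (null in this sketch), while others substitute a placeholder character.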

For this reason and others, filenames are hard. You should do everything you can to avoid making a filename out of an arbitrary string. There are many more characters you would need to block to make a string fit in a filename on Windows and avoid directory traversal attacks. Much better to store an ID like 123.jpeg on the filesystem, and use scripted-access or URL rewriting if you want to make it appear under a different string name.

If you must make a Windows-friendly filename from an arbitrary string, it would be easiest to do something similar to slug generation: preg_replace away all characters (Unicode or otherwise) that don't fit known-safe ones like [A-Za-z0-9_-], check the result isn't empty and doesn't match one of the bad filenames (if so, prepend an underscore) and finally add the extension.
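
The steps above could be sketched roughly as follows. make_safe_filename is a hypothetical helper, and the list of reserved Windows device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is an assumption about which "bad filenames" are meant.

```php
<?php
// Hypothetical helper, not part of CodeIgniter.
function make_safe_filename(string $name, string $extension): string
{
    // Replace every run of characters outside the known-safe set with "_".
    // Without the /u modifier each byte of a multibyte character simply
    // fails to match the safe set, so the whole run collapses into one "_".
    $slug = preg_replace('/[^A-Za-z0-9_-]+/', '_', $name);

    // Never return an empty base name.
    if ($slug === null || $slug === '') {
        $slug = '_';
    }

    // Prepend an underscore if the result is a reserved device name.
    if (preg_match('/^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i', $slug)) {
        $slug = '_' . $slug;
    }

    return $slug . '.' . $extension;
}

// "Åtest øre" in UTF-8
echo make_safe_filename("\xC3\x85test \xC3\xB8re", 'jpeg'), "\n"; // _test_re.jpeg
```

Because dots and slashes are outside the safe set, this also removes the pieces a directory traversal attempt would need. Different inputs can still collapse to the same name, which is one more reason to prefer an ID-based filename.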


Use this.

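$string = $this->input->post('name');
// Matches the Norwegian characters and their mis-decoded (mojibake) forms.
$regexp = '/( |å|ø|æ|Å|Ø|Æ|Ã¥|ø|æ|Ã…|Ø|Æ)/iU';
$replace_char = '_';
$folder = preg_replace($regexp, $replace_char, $string);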