On OS X (PHP 5.2.11) I have a file, siësta.doc (and a thousand others with Unicode filenames), and I want to convert the file names to a web-consumable format (a-zA-Z0-9.). If I hardcode the file name above, I get the right conversion:
<?php
$file = 'siësta.doc';
echo preg_replace("/[^a-zA-Z0-9.]/u", '_', $file);
// Output: si_sta.doc
?>
But if I read the file names with scandir, I get strange conversions:
<?php
$files = scandir(DIRNAME);
foreach ($files as $file) {
    echo preg_replace("/[^a-zA-Z0-9.]/u", '_', $file);
    // Output for the file above: sie_sta.doc
}
?>
I tried detecting the encoding, setting the encoding, and converting it with the iconv functions. I also tried the mb_ functions, but that only made it worse. What am I doing wrong?
Thanks in advance
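Interesting. After a bit of research I found that OS X stores filenames as "decomposed Unicode" (see http://developer.apple.com/mac/library/qa/qa2001/qa1173.html). That is, "ë" is represented as a plain "e" followed by a combining diaeresis (U+0308, the bytes 0xCC 0x88 in UTF-8). The "e" passes your character class and only the combining mark is replaced, which is why you get sie_sta.doc.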
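One way to act on this, as a minimal sketch rather than a definitive fix: strip the combining marks before applying the original replacement. The helper name web_safe_name is made up for illustration. The sketch assumes PCRE was built with Unicode property support (needed for \p{Mn}), and it only uses the Normalizer class when the intl extension happens to be loaded, which is not a given on PHP 5.2.

```php
<?php
// Hypothetical helper, not part of the original question.
function web_safe_name($name)
{
    // Optional step: with the intl extension loaded, decompose first so that
    // precomposed and decomposed input are treated the same way.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($name, Normalizer::FORM_D);
        if ($decomposed !== false) {
            $name = $decomposed;
        }
    }
    // Drop combining marks (Unicode category Mn), such as U+0308.
    $name = preg_replace('/\p{Mn}+/u', '', $name);
    // Replace everything that is still outside the allowed set.
    return preg_replace('/[^a-zA-Z0-9.]/u', '_', $name);
}

// "e" followed by U+0308, the form OS X hands back from scandir.
echo web_safe_name("sie\xCC\x88sta.doc"), "\n"; // siesta.doc
```

Inside the scandir loop from the question this would be called as echo web_safe_name($file);. Without intl, a precomposed "ë" still falls through to the underscore replacement, as in the hardcoded example.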
Did you try utf8_encode? (It works on Windows, at least.)
<?php
$files = scandir(DIRNAME);
foreach ($files as $file) {
    echo preg_replace("/[^a-zA-Z0-9.]/u", '_', utf8_encode($file));
    // Output for the file above: sie_sta.doc
}
?>
On Windows, the problem is the communication between Windows and PHP. It is not possible to get Unicode filenames, because they depend on the Windows language setting for non-Unicode applications.
The best solution is to run a dir command and process its output, but you must run it through cmd and ask for the Windows short names (the /x switch):
chcp 65001
dir /x c:\test\ > myinfo.txt
It returns:
El volumen de la unidad C es Windows8_OS
El número de serie del volumen es: 14A3-025F
Directorio de C:\test
22/12/2015 22:11 <DIR> .
22/12/2015 22:11 <DIR> ..
22/12/2015 22:12 0 a.txt
22/12/2015 22:10 <DIR> English
22/12/2015 22:10 <DIR> ESPAOL~1 Español
22/12/2015 22:11 <DIR> 8311~1 ру́сский язы́к
22/12/2015 22:10 <DIR> _0B41~1 عربي ,عربى
22/12/2015 22:10 <DIR> 8F4C~1 北方話
1 archivos 0 bytes
7 dirs 839.672.786.944 bytes libres
Then you can read myinfo.txt to get the mapping between each original name and its Windows short name.
Some PHP functions work fine with short names, so you can build an array that keeps the original name in case you need to display it:
$array['short_name']= $original_name;
For example, is_dir and is_file work fine. However, scandir and is_readable fail with short names too. The workaround for these functions is to rerun the dir command recursively.
To extract the info from the text file, you can use a regular expression or substr, discarding the first five lines and the last two. For example:
for($k=6;$k<(count($array)-2);$k++) ...
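As a rough sketch of the regular-expression route: the function below builds the short name to original name array from the dir /x lines. The name parse_dir_x is hypothetical. The pattern assumes the dd/mm/yyyy date and hh:mm time layout shown in the listing above, which depends on the Windows locale. Keying entries that have no short name by their own name is my assumption, not something stated above.

```php
<?php
// Hypothetical helper: maps Windows short names to original names.
function parse_dir_x(array $lines)
{
    // Assumed entry layout: date, time, <DIR> or size, optional short name, long name.
    $pattern = '#^\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}\s+(?:<DIR>|[\d.,]+)\s+(?:(\S+~\d+)\s+)?(.+)$#u';
    $map = array();
    foreach ($lines as $line) {
        if (!preg_match($pattern, trim($line), $m)) {
            continue; // header, footer, or blank line
        }
        $long = $m[2];
        if ($long === '.' || $long === '..') {
            continue; // skip the current and parent directory entries
        }
        // Entries without a short name are keyed by their own name (assumption).
        $short = ($m[1] !== '') ? $m[1] : $long;
        $map[$short] = $long;
    }
    return $map;
}

// A few lines in the shape of the listing above.
$sample = array(
    ' Directorio de C:\test',
    '22/12/2015 22:11 <DIR> .',
    '22/12/2015 22:12 0 a.txt',
    '22/12/2015 22:10 <DIR> ESPAOL~1 Español',
    ' 1 archivos 0 bytes',
);
print_r(parse_dir_x($sample));
```

For the real file, the lines could come from file('myinfo.txt', FILE_IGNORE_NEW_LINES). Because header and footer lines do not match the pattern, they are skipped without counting lines. A long name that itself looks like "NAME~1 rest" and has no short name would be misread, so treat the result with some care.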