Problem writing UTF-8 encoded file in PHP

开发者 https://www.devze.com 2023-01-12 05:31 Source: web
I have a large file that contains world countries/regions that I'm separating into smaller files based on individual countries/regions. The original file contains entries like:

  EE.04 Järvamaa
  EE.05 Jõgevamaa
  EE.07 Läänemaa

However, when I extract that and write it to a new file, the text becomes:

  EE.04  Järvamaa
  EE.05  Jõgevamaa
  EE.07  Läänemaa

To save my files I'm using the following code:

mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
$fp = fopen(MY_LOCATION,'wb');
fwrite($fp,$text);
fclose($fp);

I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about saving the original encoding (which is UTF-8)?

Thank you!


First off, don't depend on mb_detect_encoding. It's not great at figuring out what the encoding is unless the text contains a bunch of encoding-specific byte sequences (meaning sequences that are invalid in other encodings).

Try just getting rid of the mb_detect_encoding line altogether.

Oh, and utf8_encode turns a Latin-1 string into a UTF-8 string (not from an arbitrary charset to UTF-8, which is what you really want)... You want iconv, but you need to know the source encoding (and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way).

Or you can try using iconv with an empty input encoding: $str = iconv('', 'UTF-8', $str); (which may or may not work)...
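
If the source encoding is known (or can reasonably be assumed), the conversion can be made conditional, so that text that is already valid UTF-8 is never encoded a second time. A minimal sketch, assuming any non-UTF-8 input is ISO-8859-1; toUtf8 and its $assumedSource parameter are hypothetical names, not part of the original code:

```php
<?php
// Hypothetical helper: returns $text as UTF-8, converting only when needed.
// $assumedSource is a guess supplied by the caller; nothing here detects it.
function toUtf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    // With the /u modifier PCRE rejects subjects that are not valid UTF-8,
    // so an empty pattern doubles as a validity check.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8: leave the bytes untouched
    }
    $converted = iconv($assumedSource, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedSource");
    }
    return $converted;
}

$utf8   = "EE.04\tJ\xC3\xA4rvamaa"; // "Järvamaa" stored as UTF-8 bytes
$latin1 = "EE.04\tJ\xE4rvamaa";     // "Järvamaa" stored as ISO-8859-1 bytes
```

With this in place the write becomes fwrite($fp, toUtf8($text)). The validity check cannot tell which single-byte encoding was used, so the assumed source still has to come from what you know about the data.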


It doesn't work like that. Even if you call utf8_encode($theString), you will not CREATE a UTF-8 file.

The correct answer has something to do with the UTF-8 byte-order mark.

Read these to understand the issue:
- http://en.wikipedia.org/wiki/Byte_order_mark
- http://unicode.org/faq/utf_bom.html

The solution is the following: since the UTF-8 byte-order mark is "\xEF\xBB\xBF", we should add it to the very start of the file.

<?php
function writeStringToFile($file, $string){
    $f = fopen($file, "wb");
    // Prepend the UTF-8 BOM to the content (not to the file name).
    $string = "\xEF\xBB\xBF" . $string;
    fputs($f, $string);
    fclose($f);
}
?>

$file can be any file, text or XML... $string is your UTF-8 encoded string.

Try it now and it will write a UTF-8 encoded file with your UTF-8 content (string).

writeStringToFile('test.xml', 'éèàç');
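
To see whether a given file actually starts with the byte-order mark, you can read back its first three bytes. A rough sketch; hasUtf8Bom is a hypothetical helper name and the temporary file exists only for the demonstration. Note that the BOM is optional in UTF-8: both files written below are valid UTF-8, and the mark only helps programs that use it to guess the encoding.

```php
<?php
// Hypothetical helper: true if the file starts with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $path): bool
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $head = fread($f, 3); // the BOM, if present, is exactly the first 3 bytes
    fclose($f);
    return $head === "\xEF\xBB\xBF";
}

$path = tempnam(sys_get_temp_dir(), 'bom'); // throwaway file for the demo

file_put_contents($path, "\xEF\xBB\xBF" . "\xC3\xA9\xC3\xA8"); // BOM + "éè"
$withBom = hasUtf8Bom($path);

file_put_contents($path, "\xC3\xA9\xC3\xA8"); // same text, no BOM
$withoutBom = hasUtf8Bom($path);

unlink($path);
```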


Maybe you want to call htmlentities($text) before writing it to the file and html_entity_decode($fetchedData) before output. It'll work with Scandinavian letters.


It appears that your source file is not, in fact, in UTF-8. You might want to try using the same approach you've been using, but with a different encoding, such as UTF-16 perhaps.
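
One way to check this rather than guess is to look at the raw bytes of a line you know contains "ä", such as the EE.04 entry. A small sketch, assuming the data is in a single-byte or UTF-8 encoding (it does not handle UTF-16); describeAUmlaut is a hypothetical helper name:

```php
<?php
// Hypothetical helper: reports how the letter "ä" appears to be stored in $bytes.
function describeAUmlaut(string $bytes): string
{
    // C3 83 C2 A4 is what you get when UTF-8 "ä" (C3 A4) is treated as
    // ISO-8859-1 and encoded to UTF-8 a second time.
    if (strpos($bytes, "\xC3\x83\xC2\xA4") !== false) {
        return 'double-encoded UTF-8';
    }
    if (strpos($bytes, "\xC3\xA4") !== false) {
        return 'UTF-8';
    }
    // ISO-8859-1 and Windows-1252 both store "ä" as the single byte E4.
    if (strpos($bytes, "\xE4") !== false) {
        return 'single-byte (ISO-8859-1 or similar)';
    }
    return 'no ä found';
}

$sample = "EE.04\tJ\xC3\xA4rvamaa"; // stand-in for a line read from the source file
$hex = bin2hex($sample);            // raw bytes as hex, for inspecting by eye
$verdict = describeAUmlaut($sample);
```

If the source file reports UTF-8 and the output file reports double-encoded UTF-8, the text was encoded a second time on the way out, and the fix is to write the bytes unchanged.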


You can do it as follows:

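<?php
$s = "This is a string éèàç and it is in utf-8";
$f = fopen('myFile', "w");
// utf8_encode() converts ISO-8859-1 to UTF-8, so it is only right when this
// script is saved as ISO-8859-1. If $s is already UTF-8, write $s as-is;
// encoding it again would garble the accented characters.
fwrite($f, utf8_encode($s));
fclose($f);
?>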