开发者

Is there any downside to save all my source code files in UTF-8?

开发者 https://www.devze.com 2023-02-22 18:25 出处:网络
If that\'s relevant (it very开发者_JS百科 well could be), they are PHP source code files.There are a few pitfalls to take care of:

If that's relevant (it very开发者_JS百科 well could be), they are PHP source code files.


There are a few pitfalls to take care of:

  1. PHP is not aware of the BOM character certain editors or IDEs like to put at the very beginning of UTF-8 files. This character indicates the file is UTF-8, but it is not necessary, and it is invisible. This can cause "headers already sent out" warnings from functions that deal with HTTP headers because PHP will output the BOM to the browser if it sees one, and that will prevent you from sending any header. Make sure your text editor has a UTF-8 (No BOM) encoding; if you're not sure, simply do the test. If <?php header('Content-Type: text/html') ?> at the beginning of an otherwise empty file doesn't trigger a warning, you're fine.
  2. Default string functions are not multibyte encodings-aware. This means that strlen really returns the number of bytes in the string, not the actual number of characters. This isn't too much of a problem until you start splicing strings of non-ASCII characters with functions like substr: when you do, indices you pass to it refer to byte indices rather than character indices, and this can cause your script to break non-ASCII characters in two. For instance, echo substr("é", 0, 1) will return an invalid UTF-8 character because in UTF-8, é actually takes two bytes and substr will return only the first one. (The solution is to use the mb_ string functions, which are aware of multibyte encodings.)
  3. You must ensure that your data sources (like external text files or databases) return UTF-8 strings too, because PHP makes no automagic conversion. To that end, you may use implementation-specific means (for instance, MySQL has a special query that lets you specify in which encoding you expect the result: SET CHARACTER SET UTF8 or something along these lines), or if you couldn't find a better way, mb_convert_encoding or iconv will convert one string into another encoding.


It's actually usually recommended that you keep all sources in UTF8. It won't matter size of regular code with latin characters at all, but will prevent glitches with any special characters.


If you are using any special chars in e.g string values, the size is a little bit bigger, but that shouldn't matter.

Nevertheless my suggestion is, to always leave the default format. I spent so many hours because there was an error with the format saving and all characters changed.

From a technical point of few, there isn't a difference!


Very relevant, the PHP parser may start to output spurious characters, like a funky unside-down questionmark. Just stick to the norm, much preferred.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号