
php remove/identify this symbol �

Developer https://www.devze.com 2022-12-21 03:43 Source: Internet

EDIT:

OK, I have some data (a ton of data) being pulled from a MySQL DB table; there is nothing special about how the data is entered. When parsing the data and re-displaying it in Firefox, this symbol � shows up. When I compare it to the DB entry, it looks like a space (nothing special). I'm using all the default PHP/MySQL settings.

Doing a var_dump or print_r is no help either.

Any thoughts?

The Symbol: �

UPDATE:

OK, I did find the character that is causing the problem: the em dash (U+2014), not to be confused with - (the hyphen).


The character is the REPLACEMENT CHARACTER (U+FFFD). It is used when a decoder hits a byte sequence that is not valid in the Unicode encoding it expects:

FFFD � REPLACEMENT CHARACTER

  • used to replace an incoming character whose value is unknown or unrepresentable in Unicode

In most cases it means that the data is being interpreted as a UTF encoding (typically UTF-8) although it was actually stored in a different encoding.
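
A minimal PHP sketch of that mismatch. It assumes the stored text is Windows-1252 (the question does not say which encoding the data really uses) and that the mbstring extension is loaded. The byte 0x97 is an em dash in Windows-1252 but is not a valid UTF-8 sequence on its own, so a UTF-8 page shows U+FFFD in its place.

```php
<?php
// Assumed sample: "a", an em dash as the single Windows-1252 byte 0x97, "b".
$raw = "a\x97b";

// The bytes are not valid UTF-8, so a UTF-8 decoder cannot map them to text.
$isValidUtf8 = mb_check_encoding($raw, 'UTF-8');

// htmlspecialchars() with ENT_SUBSTITUTE swaps each invalid sequence for
// U+FFFD, much like a browser does when it renders the page as UTF-8.
$substituted = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Naming the real source encoding during conversion recovers the em dash.
$fixed = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

var_dump($isValidUtf8);            // bool(false)
echo bin2hex($substituted), "\n";  // 61efbfbd62 (ef bf bd is U+FFFD)
echo bin2hex($fixed), "\n";        // 61e2809462 (e2 80 94 is U+2014)
```

If the real source encoding is something else (ISO-8859-1, for example), the third argument to `mb_convert_encoding()` would need to change accordingly.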


It means a character that isn't available in the character set of the current font. You'll need to encode it with an HTML entity, once you've found where it's coming from.


That character means there is a codepoint that your browser does not know how to display. Somewhere you're setting a character value to something outside the normal printable character range, and your browser is telling you by displaying the standard 'unknown' character.

The only way to tackle the problem is to find the bug that put the invalid character into your string in the first place.
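
Since `var_dump` and `print_r` reportedly show nothing useful, one way to hunt for the offending bytes is to list every non-ASCII byte with its offset. This is only a sketch: `find_non_ascii_bytes` is a hypothetical helper name and the sample string is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: list every byte above 0x7F with its offset,
 * so bytes that look like a space or a dash on screen can be identified.
 *
 * @return array<int, string> byte offset => two-digit hex value
 */
function find_non_ascii_bytes(string $text): array
{
    $found = [];
    $length = strlen($text); // strlen() counts bytes, not characters
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($text[$i]);
        if ($byte > 0x7F) {
            $found[$i] = sprintf('%02x', $byte);
        }
    }
    return $found;
}

// Assumed sample row: the byte 0xA0 is a non-breaking space in ISO-8859-1
// and Windows-1252, and it looks like a plain space in most database tools.
print_r(find_non_ascii_bytes("price\xA0list"));
```

Looking up the reported hex values in the code page you suspect the data was written in usually reveals which character was intended.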


This is a common problem when pasting text from Microsoft Office products into HTML or into a database. The largest offenders seem to be the em dash (as you found) and smart quotes. One solution I have found, when users insist on using a non-compliant text editor like that, is to have them paste the text into something like Notepad first to strip the proprietary symbols.

Obviously the best solution is to simply not use Word for textual data that is intended for web display.

Added just to provide some info to future readers.

Regards, Jc


You can look into iconv() and mb_* functions if you're just trying to sanitize the data.

The most likely cause, as observed elsewhere, is a problem with character encodings. PHP is not very good at dealing with character encodings: it treats strings as byte arrays and leaves encoding issues more or less up to the developer. Native Unicode support was planned for version 6, which was never released.

Make sure you're displaying the page in the same character encoding as your database, and make sure that you convert all user input into that same character encoding (iconv() and mb_detect_encoding() will help) before sticking it in the database.
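
As a rough sketch of that normalization step, the helper below converts input to UTF-8 before it is stored. `to_utf8` is a hypothetical name, and the Windows-1252 fallback is an assumption that has to be adjusted to wherever the data really comes from; the mbstring and iconv extensions are assumed to be loaded.

```php
<?php
/**
 * Hypothetical helper: return $input as UTF-8.
 * Assumes anything that is not already valid UTF-8 is Windows-1252,
 * which is a guess that must match the real data source.
 */
function to_utf8(string $input, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave untouched
    }
    $converted = iconv($assumedSource, 'UTF-8', $input);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert input from $assumedSource");
    }
    return $converted;
}

// Curly quotes as pasted from Word in Windows-1252 (bytes 0x93 and 0x94).
echo bin2hex(to_utf8("\x93hi\x94")), "\n"; // e2809c6869e2809d
echo to_utf8("plain ASCII"), "\n";         // plain ASCII
```

The validity check is used here instead of `mb_detect_encoding()` because detection is heuristic; when the source encoding is known, naming it explicitly is the more predictable choice.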


What are you talking about? Where have you seen this? If it's on the rendered page in the browser, then you might have saved the file with an improper encoding. Use UTF-8 when saving the page/source file.


A really vague question. Check your website's encoding, your database's data encoding, and so on.

EDIT: It IS an answer, because the flaw is a mismatch between the DB data encoding (probably UTF-8) and the webapp encoding (probably ISO-8859-1). So the solution is one of the following:

1.) Back up and wipe out the DB, AND THEN load it with the proper encoding.
2.) Change the webapp's encoding, so the chars are properly displayed.
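
A sketch of making the webapp side match, assuming the mysqli extension is used and UTF-8 is the target (the question names neither). `use_utf8_for_page_and_db` is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: declare UTF-8 for both the HTTP response and the
 * MySQL connection so neither side has to guess the other's encoding.
 * Must be called before any output is sent.
 */
function use_utf8_for_page_and_db(mysqli $db): void
{
    // Tell the browser how to decode the page.
    header('Content-Type: text/html; charset=utf-8');

    // Tell MySQL which encoding to use for data sent over this connection.
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
}
```

This only aligns the page and the connection; rows that were already stored in the wrong encoding would still need to be converted.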

Regards,


Why not try a regex in javascript against what Gumbo identified as "... character � ... the REPLACEMENT CHARACTER (U+FFFD)" after rendering the webpage - this way you will not have to mess with the DB (which you seem very reluctant to do) and whatever minor performance penalty is offloaded to the client side.
