Setting encoding in XML files

开发者 https://www.devze.com 2023-01-20 22:32 Source: Internet

Which are the valid xml encoding strings? For instance, what is the way of specifying UTF-8:

  • encoding="utf8"
  • etc

Or Windows 1251:

  • encoding="windows-1251"
  • encoding="windows1251"
  • encoding="cp-1251"
  • etc.

I am writing a character decoder as well as an XML parser. Thus, I need to be able to set the encoding of my StreamReader based on the value of the encoding attribute.

Any ideas where I could find a list of the official encoding strings?

The best I could find is this, but it seems to be IE specific.

Thanks!


If all else fails, read the spec :-).

4.3.3 Character Encoding in Entities

Each external parsed entity in an XML document may use a different encoding for its characters.

[...]

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997.

It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix.

Source: http://www.w3.org/TR/REC-xml/

So UTF-8 is written as encoding="UTF-8".

For other character sets not listed above, use the names given in the IANA character set list.
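Before looking a name up anywhere, a parser can at least check that it is syntactically a legal encoding name. Section 4.3.3 of the spec defines this with the EncName production; the following is a minimal Python sketch of that check. The function name is_valid_enc_name is made up for illustration, and passing this check does not mean the name is registered with IANA or supported by your decoder.

```python
import re

# EncName production from section 4.3.3 of the XML 1.0 spec:
#   EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
# A name must start with an ASCII letter and may continue with
# ASCII letters, digits, '.', '_' or '-'.
_ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_enc_name(name: str) -> bool:
    """Return True if `name` is syntactically a legal XML encoding name.

    Hypothetical helper: this is a syntax check only and says nothing
    about whether a decoder for the name exists.
    """
    return _ENC_NAME.fullmatch(name) is not None
```

So "windows-1251" and "Shift_JIS" are well-formed names, while a name starting with a digit or containing a space is not.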

Case of the letters in the character set name is not significant: "However, no distinction is made between use of upper and lower case letters." (IANA character set list). So you could also write encoding="uTf-8" if you feel like it ;-).
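Since case is not significant, a decoder should compare names case-insensitively rather than as raw strings. One way to sketch this in Python is to lean on the standard codecs registry, which already ignores case and knows a number of aliases. Note the assumptions: resolve_codec and same_encoding are hypothetical helper names, and Python's alias table is its own list, not the IANA registry, so it may accept spellings a strict parser would reject (and vice versa).

```python
import codecs


def resolve_codec(declared_name: str):
    """Map a declared encoding name to Python's canonical codec name.

    Returns None when Python has no codec for the name. The lookup is
    case-insensitive, but it uses Python's alias table, which is an
    assumption here and not identical to the IANA charset list.
    """
    try:
        return codecs.lookup(declared_name).name
    except LookupError:
        return None


def same_encoding(a: str, b: str) -> bool:
    """True if both names resolve to the same known codec."""
    resolved_a = resolve_codec(a)
    return resolved_a is not None and resolved_a == resolve_codec(b)
```

With this, same_encoding("uTf-8", "UTF-8") is true, and an unknown name simply resolves to None so the caller can report an unsupported encoding.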

BTW: Are you really, really certain you want to write your own XML parser? This sounds suspiciously like reinventing the wheel.


<?xml version="1.0" encoding="utf-8"?>

should be fine for utf-8.
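For the decoder the question describes, the declared name has to be read from the raw bytes before the text can be decoded. Below is a rough Python sketch of that step. It assumes an ASCII-compatible encoding (UTF-8, ISO-8859-n, windows-125x and similar) so the declaration itself can be matched as bytes; UTF-16 or other non-ASCII-compatible input needs the byte-order-mark detection described in Appendix F of the spec, which is not shown. The names declared_encoding and decode_document are made up for illustration.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute, if one is present.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, else `default`.

    Assumption: without a declaration (and without a byte order mark)
    the document is treated as UTF-8.
    """
    match = _ENCODING_DECL.match(head)
    return match.group(1).decode("ascii") if match else default


def decode_document(data: bytes) -> str:
    """Decode a whole document using the encoding it declares.

    Raises LookupError if Python has no codec for the declared name.
    """
    name = declared_encoding(data[:200])  # the declaration sits at the start
    return data.decode(name)
```

A production parser would also cross-check the declared name against any byte order mark and against encoding information supplied by the transport, such as an HTTP Content-Type header.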


Use the command locale -a to list the locales, and the encodings they use, that are installed on the system: http://dwbitechguru.blogspot.ca/2014/07/check-foreign-characters-support-on.html

Option A: To add encoding using the below tags:

You can edit the encoding attribute in the DTD using XMLSpy.

Related links: http://dwbitechguru.blogspot.ca/2014/07/issue-xml-reader-error.html
