LibXML internal and output encodings

Developer https://www.devze.com 2023-01-06 06:51 Source: web
I'm trying to write XML files with libxml2 in ISO-8859-1. But from the documentation it seems that for each text node that I create I'll have to convert to UTF-8 which is libxml's internal encoding. Then when calling xmlSaveFormatFileEnc() libxml converts to the target encoding and adds the encoding attribute to the document.

Is this assumption correct? For now my code goes roughly like this:

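/* needs <stdlib.h>, <string.h>, <libxml/tree.h>, <libxml/encoding.h> */
xmlNode *root_element = NULL, *node4 = NULL;
xmlDoc *doc = NULL;

doc = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);
root_element = xmlNewDocNode(doc, NULL, BAD_CAST "root", NULL);
xmlDocSetRootElement(doc, root_element);  /* attach the root so it gets serialized */

char *input_str = getLatin1Data();
int inlen = (int) strlen(input_str);
int outlen = 2 * inlen;                   /* a Latin-1 byte needs at most 2 bytes in UTF-8 */
unsigned char *utf8_str = malloc(outlen + 1);
isolat1ToUTF8(utf8_str, &outlen, (const unsigned char *) input_str, &inlen);
utf8_str[outlen] = '\0';                  /* isolat1ToUTF8 does not NUL-terminate */

node4 = xmlNewCDataBlock(doc, utf8_str, outlen);
xmlAddChild(root_element, node4);

/* libxml converts from UTF-8 to the requested output encoding on save */
xmlSaveFormatFileEnc("test_file.xml", doc, "ISO-8859-1", 1);
xmlFreeDoc(doc);
free(utf8_str);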


Your assumption is right. Wherever an xmlChar * is expected, as in xmlNewCDataBlock or xmlNewText, the string must be UTF-8:

From include/libxml/xmlstring.h (libxml 2.8.0):

/**
 * xmlChar:
 *
 * This is a basic byte in an UTF-8 encoded string.
 * It's unsigned allowing to pinpoint case where char * are assigned
 * to xmlChar * (possibly making serialization back impossible).
 */
typedef unsigned char xmlChar;
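
To make the conversion step concrete, here is a minimal standalone sketch of what happens to the bytes when ISO-8859-1 text is turned into UTF-8. The helper name latin1_to_utf8 is hypothetical and is not part of libxml2; in real code libxml2's own isolat1ToUTF8 does this job. The sketch assumes a NUL-terminated input string.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2: converts a NUL-terminated
 * ISO-8859-1 string into a newly allocated, NUL-terminated UTF-8 string.
 * Returns NULL if allocation fails; the caller frees the result. */
unsigned char *latin1_to_utf8(const char *latin1)
{
    size_t len = strlen(latin1);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) latin1[i];
        if (c < 0x80) {
            *p++ = c;                                     /* ASCII is unchanged */
        } else {
            *p++ = (unsigned char) (0xC0 | (c >> 6));     /* lead byte: 110000xx */
            *p++ = (unsigned char) (0x80 | (c & 0x3F));   /* continuation: 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}
```

The returned buffer has the same underlying type as xmlChar *, so it could be handed to functions such as xmlNewText or xmlNewCDataBlock. The target encoding only comes into play when the document is saved.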
