
Losing whitespace around escaped symbols in CDATA using Expat XML parser in C++

Developer, https://www.devze.com, 2022-12-14 10:36, source: Internet

I'm using XML to send project information between applications. One of the pieces of information is the project description. So I have:

<ProjectDescription>Test &amp; spaces around&amp;some  &amp;  amps!</ProjectDescription>

Or: "Test & spaces around&some & amps!" <-- GOOD!

When I then use Expat to parse it, my data handler receives the string in pieces rather than all at once: "Test", then "&", then "spaces around", then the next "&", and so on. When I try to reconstruct the original string, all the spacing around the &'s is dropped, apparently because the data handler never gets to see it. When I then re-write the XML I get:

<ProjectDescription>Test&amp;spaces around&amp;some&amp;amps!</ProjectDescription>

Or: "Test&spaces around&some&amps!" <-- BAD!

Is this a known problem with existing workarounds? Is there some setting I can give Expat to control its behavior around escaped symbols?

My attempts at Googling an answer have met with dismal failure.

EDIT: In response to a question in the comments: I have my own handler, which I register with the parser:

parser=XML_ParserCreate(NULL); 
XML_SetUserData(parser,&depth);
XML_SetElementHandler(parser,startElement,endElement); 
XML_SetCharacterDataHandler(parser,dataHandler); 

The handler is declared as follows:

static void dataHandler(void *userData,const XML_Char *s,int l) 

Here "s" contains the data in the element. When the text has no escaped & characters, as in "a string with spaces", the handler receives the entire string between the open and close tags in a single call.


I have just run a test with my own library that uses expat. My handler looks like this, with debug statements to display what is going on:

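void CharDataHandler( void * parser, 
                       const XML_Char *s,
                       int len ) {
    // Note: s is not NUL-terminated, so streaming it directly
    // prints past the len characters that belong to this event.
    std::cerr << "[" << s << "]\n";
    std::cerr << len << "\n";
    // my own processing here - not important 
}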

I don't see the behaviour you are talking about. For the input data:

XXX &amp; YYY

I get three events with the char * and length data set as follows:

char * = "XXX &amp; YYY"
length = 4

char * = "&"
length = 1

char * = " YYY"
length = 4

So the spaces are retained. As far as I know I am not using any special settings. What version and platform of Expat are you using?
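
Since the handler is called several times and each call only owns `len` characters of `s`, one way to rebuild the full text is to append each chunk, using its length, to a buffer. The following is only a minimal sketch under some assumptions: `appendChars` and `simulateCallbacks` are hypothetical names, `XML_Char` is replaced by a `char` typedef so the example compiles without expat.h, and the three callbacks are simulated by hand rather than produced by a real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for Expat's XML_Char so the sketch compiles without expat.h
// (assumption: a narrow-character build, where XML_Char is char).
typedef char XML_Char;

// Hypothetical handler with the same shape as an Expat character data
// handler. userData is assumed to point at a std::string accumulator.
static void appendChars(void *userData, const XML_Char *s, int len) {
    assert(len >= 0);
    std::string *text = static_cast<std::string *>(userData);
    // Append exactly len characters: s is not NUL-terminated, so
    // treating it as a C string would read past this chunk.
    text->append(s, static_cast<std::size_t>(len));
}

// Simulates the three callbacks reported above for "XXX &amp; YYY".
// Each chunk is a pointer into a larger buffer plus a length.
static std::string simulateCallbacks() {
    const char raw[] = "XXX &amp; YYY";  // text as it appears in the input
    const char amp[] = "&";              // the decoded entity
    std::string text;
    appendChars(&text, raw, 4);          // "XXX " (trailing space included)
    appendChars(&text, amp, 1);          // "&"
    appendChars(&text, raw + 9, 4);      // " YYY" (leading space included)
    return text;
}
```

With a real parser, the same handler could presumably be hooked up with XML_SetUserData(parser, &text) and XML_SetCharacterDataHandler(parser, appendChars), clearing the buffer in the start element handler and reading it in the end element handler. If the spaces still go missing with this approach, the code that consumes `s` (for example, something that trims or tokenizes each chunk) would be the next place to look.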
