rapidxml parse error with url attribute

devze.com https://www.devze.com 2023-02-26 17:52 Source: web
I'm getting a strange error with rapidxml when parsing an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />

It throws "expected >". I'm using code like the following to parse the data:

std::fstream file("./test.xml");
std::istream_iterator<char> eos;
std::istream_iterator<char> iit(file);

std::vector<char> xml(iit, eos);
xml.push_back('\0');

xml_document<> doc;
doc.parse<0>(&xml[0]);

The "/" symbol in the IMG tag seems to be the problem. Is this a rapidxml bug, or am I doing something wrong?


The way you load the XML data into the vector is wrong. C++ streams have the "skipws" flag set by default, so formatted extraction (operator>>, which std::istream_iterator<char> uses) skips all whitespace in the input. You can verify this by examining the contents of your vector: all spaces and line endings will be missing. With the whitespace gone the markup is no longer well-formed, so the parser complains.
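To see the effect without touching the file system, here is a minimal sketch that reads from a std::istringstream in the same way the question reads from std::fstream. The function name read_like_the_question is made up for illustration, and the string stream is only a stand-in for the real file.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `text` the way the question's code reads the file.
// istream_iterator<char> extracts with operator>>, and because skipws is set
// by default, every space, tab and newline is dropped.
std::string read_like_the_question(const std::string& text) {
    std::istringstream in(text);  // stand-in for std::fstream file("./test.xml")
    std::istream_iterator<char> first(in), last;
    std::vector<char> xml(first, last);
    xml.push_back('\0');
    assert(xml.back() == '\0');   // buffer is zero-terminated, as in the question
    return std::string(xml.data());
}
```

Passing the IMG element from the question through this helper gives back the same characters with every space and line break removed, which is the text the parser actually receives.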

Unset the skipws flag on the stream before reading to get the correct behaviour:

file.unsetf(std::ios::skipws);
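Put together, the loading code might look like the sketch below. The function names load_preserving_whitespace and load_raw are assumptions for illustration, not part of rapidxml. The second function swaps in a different technique, std::istreambuf_iterator, which reads unformatted characters and is therefore not affected by skipws at all.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the question's loading code with skipws cleared first.
std::vector<char> load_preserving_whitespace(std::istream& in) {
    in.unsetf(std::ios::skipws);  // operator>> now returns whitespace too
    std::istream_iterator<char> first(in), last;
    std::vector<char> buffer(first, last);
    buffer.push_back('\0');       // the parser is handed a zero-terminated buffer
    assert(buffer.back() == '\0');
    return buffer;
}

// Alternative sketch: istreambuf_iterator bypasses formatted extraction,
// so the stream's skipws flag does not matter.
std::vector<char> load_raw(std::istream& in) {
    std::istreambuf_iterator<char> first(in), last;
    std::vector<char> buffer(first, last);
    buffer.push_back('\0');
    return buffer;
}

// Convenience wrapper so the behaviour can be checked against a plain string.
std::string round_trip(const std::string& text, bool raw) {
    std::istringstream in(text);  // stand-in for the file stream
    std::vector<char> buffer = raw ? load_raw(in) : load_preserving_whitespace(in);
    return std::string(buffer.data());
}
```

Either buffer can then be passed to the parser as &buffer[0], exactly as in the question.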

Alternatively, you can use the file class from rapidxml_utils.hpp to load the file:

using namespace rapidxml;
file<> file("test.xml");
xml_document<> doc;
doc.parse<0>(file.data());

Sadly, loading text files with C++ streams is very tricky and full of traps.

As for sehe's tests in the other answer, the "incorrectly accepted" cases are by design (I don't have enough reputation to comment on that answer). You need to use the "parse_validate_closing_tags" parse flag to make the parser check that each end tag name matches its start tag name:

doc.parse<parse_validate_closing_tags>(...);

See parse_validate_closing_tags in the rapidxml manual. The rationale for this behaviour is performance: verifying end tags is time-consuming and in most cases not needed.


I just tried it out of curiosity. RapidXml might be fast, but it sure isn't very good.

#include "rapidxml.hpp"

int main(int argc, char* args[])
{
        using namespace rapidxml;
        if (argc < 2) return 1;    // the XML text is expected as the first argument
        xml_document<> doc;    // character type defaults to char
        doc.parse<0>(args[1]);    // 0 means default parse flags; parses args[1] in place
}

Invoking it results in all kinds of funny business:

Correctly accepted:

$ ./test.exe "<hello>world</hello>"

$ ./test.exe '<?xml version="1.0" encoding="UTF-8"?> <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />'

Correctly rejected:

$ ./test.exe '<hello we="" / >'
terminate called after throwing an instance of 'rapidxml::parse_error'
  what():  expected >
Aborted (core dumped)

Incorrectly accepted:

$ ./test.exe '<hello we="close">world</die><zellq></die>'

$ ./test.exe '<hello we="close/">world</die><we horrible=""></don'\''t>'

YMMV


Your XML is valid. If the code and the XML are exactly as you posted, it must be a rapidxml bug. I guess it either doesn't support breaking an attribute list across multiple lines or, less likely, doesn't support /> at the end of a tag.
