A legacy system that I am working on has a piece of XML with data embedded in one of its nodes, and that data uses its own format for some reason. I need to extract the information and then reformat the dates.
This is the piece of XML:
<Information>
[OB]LGW|Sun 23, May 2010|11:15|MCO|Sun 23, May 2010|15:25[/OB]
</Information>
I need it transformed to look like this:
<Flight
ArrivalDateTime="2010-05-16T15:35:00"
DepartureDateTime="2010-05-16T11:30:00"
DirectionInd="Outbound"
RPH="1"
TravelCode="24"
Type="Charter"
>
Whenever I see an XML document like this, my first impulse is to make the person who created it do his job. Seriously, it's 2010. It is long past time to expect that if you're required to produce XML, that means that you produce usable XML, not data in your seekrit private format that's had tags wrapped around it. Emitting stuff like that is lazy and contemptuous.
Of course, it's not always possible to get people to behave like responsible professionals. My second approach is to preprocess the XML and repair it before it gets to XSLT, or to any other code that's expecting rationally-constructed XML. This saves me from having to figure out how to make XSLT do things that it was never designed to do. It also means that any non-XSLT code that processes this data downstream can be simpler.
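As a rough illustration of that preprocessing step, here is a minimal Python sketch that rewrites the Information node into child elements before anything else sees it. The element names (Leg, DepartureAirport and so on) are made up for the example, and the field order (airport, date, time for departure, then the same for arrival) is my assumption based on the one sample in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the assumed field order is inferred from the sample.
FIELD_NAMES = [
    "DepartureAirport", "DepartureDate", "DepartureTime",
    "ArrivalAirport", "ArrivalDate", "ArrivalTime",
]

def repair_information(xml_text):
    """Replace the pipe-delimited payload of each <Information> with child elements."""
    root = ET.fromstring(xml_text)
    # Copy to a list first so the tree is not modified while it is being iterated.
    for info in list(root.iter("Information")):
        payload = (info.text or "").strip()
        # Only the outbound [OB]...[/OB] form shown in the question is handled.
        if not (payload.startswith("[OB]") and payload.endswith("[/OB]")):
            continue
        fields = payload[len("[OB]"):-len("[/OB]")].split("|")
        if len(fields) != len(FIELD_NAMES):
            continue  # leave anything unexpected untouched
        info.text = None
        leg = ET.SubElement(info, "Leg", Direction="Outbound")
        for name, value in zip(FIELD_NAMES, fields):
            ET.SubElement(leg, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Once the data looks like this, the XSLT (or whatever sits downstream) only has to select elements instead of picking strings apart.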
Since the dates are different, I assume the example raw & cooked formats are not actually the 'same' data. Also, your question relates only to dates: are you expecting to parse the origin & destination airport codes as well?
Either way, since the data is in a non-XML format, you aren't going to get an XML parser to parse it. That is, the XML parser will recognise the data as a Text node child of the Information Element node, but no XML tool can know how to pick the text apart. You will need to write your own parser for this purpose.
The XPath 2.0 function tokenize() might be of some use to you, as might substring-after() and substring-before(), which are also available in XPath 1.0.
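If you end up writing the parser outside XSLT, a sketch in Python might look like the following. It assumes the six fields are departure airport, date, time, then arrival airport, date, time, and that dates always look like "Sun 23, May 2010" with an English month name. The DepartureAirport and ArrivalAirport keys are names I made up; RPH, TravelCode and Type are left out because nothing in the payload says where they come from.

```python
import re
from datetime import datetime

OB_BLOCK = re.compile(r"\[OB\](.*?)\[/OB\]", re.DOTALL)
# Weekday name, day, month name (first three letters captured), year.
DATE = re.compile(r"^[A-Za-z]+ (\d{1,2}), ([A-Za-z]{3})[A-Za-z]* (\d{4})$")
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

def to_iso(date_part, time_part):
    """Turn 'Sun 23, May 2010' plus '11:15' into '2010-05-23T11:15:00'."""
    match = DATE.match(date_part.strip())
    if match is None:
        raise ValueError(f"unrecognised date: {date_part!r}")
    day, month_name, year = match.groups()
    hour, minute = (int(part) for part in time_part.strip().split(":"))
    stamp = datetime(int(year), MONTHS[month_name.title()], int(day), hour, minute)
    return stamp.isoformat()

def parse_outbound(text):
    """Parse the [OB]...[/OB] payload into a dict of attribute values."""
    block = OB_BLOCK.search(text)
    if block is None:
        raise ValueError("no [OB]...[/OB] block found")
    fields = block.group(1).strip().split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    dep_code, dep_date, dep_time, arr_code, arr_date, arr_time = fields
    return {
        "DepartureAirport": dep_code,   # hypothetical attribute name
        "ArrivalAirport": arr_code,     # hypothetical attribute name
        "DepartureDateTime": to_iso(dep_date, dep_time),
        "ArrivalDateTime": to_iso(arr_date, arr_time),
        "DirectionInd": "Outbound",
    }
```

Fed the sample from the question, this gives a DepartureDateTime of 2010-05-23T11:15:00, which is what the raw data says rather than the 2010-05-16 values shown in the desired output.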