XML Mapping - XSLT or Code?

I have had a number of discussions recently over whether to use XSLT or code to write mapping functionality from one XML format to another or even when converting to something other than XML. Now I am of the mindset that XSLT's purpose is exactly for this type of thing and would be the most suitable option.

However, other people are suggesting it wouldn't be appropriate when you need something a little more complex, such as when you need to start looking up data from external repositories. They were also suggesting that XSLT can be as complex as writing the code, so that negates that argument. And testing would be easier with a code solution by utilizing TDD and CI practices.

The basis for this discussion is the design of a common transformation service that WCF services should use whenever mapping is required, for example when converting an incoming message to a canonical form. I thought it would be best to write this service to match the XML message against an XSLT map. You can then easily drop these maps in and out without recompiling code, and it is far easier to get at these maps and understand what is going on outside of the code.
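To make the "match the message against a map" idea concrete, here is a minimal sketch in Python of how such a lookup might work. It only selects a stylesheet; it does not run the transform. The namespaces, file names, the `maps` directory and the `MAP_REGISTRY` table are all made up for illustration, and a real service would probably load the registry from configuration so maps can be swapped without a redeploy.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical registry: root element name (Clark notation) -> stylesheet file.
# The namespaces and file names below are invented for this example.
MAP_REGISTRY = {
    "{urn:example:orders:v1}Order": "order-v1-to-canonical.xslt",
    "{urn:example:orders:v2}Order": "order-v2-to-canonical.xslt",
}


def select_map(message_xml: str, map_dir: str = "maps") -> Path:
    """Return the path of the XSLT map registered for the message's root element."""
    root = ET.fromstring(message_xml)
    try:
        return Path(map_dir) / MAP_REGISTRY[root.tag]
    except KeyError:
        raise LookupError(
            f"no XSLT map registered for root element {root.tag!r}"
        ) from None
```

Keying on the namespace-qualified root element is only one possible choice; matching on a message header or a schema version attribute would work along the same lines.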

I was wondering what you guys thought and whether anyone had any experience writing something similar? I know I could go out and buy a product, but would rather hear about bespoke solutions.

Thanks

First, just to be clear, XSLT is code. ;) It's a Turing-complete, functional programming language.

When both input and output are XML, I generally prefer XSLT; there is a lot of boilerplate code involved when transforming in a general purpose language. Exceptions are when the input needs to be processed sequentially due to its size (XSLT processors typically build a full in-memory tree structure for both input and output). Also, emitting e.g. plain text via XSLT is a pain (mainly due to whitespace/line feed issues).

However, a valid concern is whether the skills to read and maintain the XSLT program are generally available on the team. If people are not used to the functional paradigm, learning XSLT can be challenging (i.e. won't happen on Project/Company Time), and often the solution has to be immediately readable to the other developers/maintainers on the project. In that case, I'd (grudgingly ;) go for the general purpose language solution.

XSLT does not prevent unit testing either; transformations can be tested by asserting on the output of transformation test cases.
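As a rough illustration of asserting on transformation output, the Python sketch below compares an actual result against an expected fixture after canonicalizing both, so that indentation and attribute order do not cause false failures. It assumes the XSLT has already been run by whatever processor you use; the helper names are my own, and `xml.etree.ElementTree.canonicalize` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def normalize_xml(xml_text: str) -> str:
    """Canonicalize XML (C14N 2.0) and strip surrounding whitespace from text nodes."""
    return ET.canonicalize(xml_text, strip_text=True)


def assert_transform_output(actual_xml: str, expected_xml: str) -> None:
    """Raise AssertionError, showing both canonical forms, if the outputs differ."""
    actual = normalize_xml(actual_xml)
    expected = normalize_xml(expected_xml)
    assert actual == expected, f"expected:\n{expected}\nactual:\n{actual}"
```

A test case then boils down to one input document, one expected output document, and a call to `assert_transform_output`, which fits into an ordinary CI run.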

We've written an event-driven database integration system that uses XSLT extensively to transform XML messages. There are frameworks for performing unit tests of XSLT transformations, and we also find it helpful to be able to transform messages directly in TextMate using the TeXSLMate palette without having to recompile anything.

I think the advantage of using XSLT is that it's a general purpose technology that can be used in lots of contexts - for example we also use it as a templating system for web applications - and for which good tools exist.

I agree with you - XSLT is designed to map from one XML format to another and if you need to map from one format to several others, that's the way to go.

Things are a little less defined when you have to go the other way - several different formats that need to be transformed to a canonical one... I have seen attempts that tried to do it all in one set of XSL files, and it was horrible to read or understand. I would have gone with a single XSLT per transformation.

One issue with XSLT vs Code is that with XSLT you need the whole XML and XSL files in memory for the transformation, so if you have large files (hundreds of megs and above) this may not be the best way forward.
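For comparison, this is roughly what the sequential, code-based alternative looks like in Python using the standard library's incremental parser. It is a sketch that assumes the records are direct children of the root element; the function name and the `record_tag` parameter are my own.

```python
import xml.etree.ElementTree as ET


def stream_records(source, record_tag: str):
    """Yield each record element in turn without keeping the whole tree in memory.

    Assumes the records are direct children of the document's root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield elem
            # Runs once the caller asks for the next record: drop the
            # children already handled so memory use stays bounded.
            root.clear()
```

Each yielded element is only valid until the next one is requested, so the caller should extract what it needs before continuing the loop.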

The other answers cover the reasons for using XSLT pretty well; I want to address your question about database lookups.

Database lookups really fall outside of the scope of transformation. That is, you're no longer transforming one XML format to another; you're including information from an external source that's not part of your input. (I'm assuming that your database is large and/or volatile enough that it can't be represented by an XML document that gets read into the transform via the document() function, which is the simplest way to do it.)

In my experience, this is best accomplished by preprocessing the source XML, running it through a process that decorates it with the additional information from the database before passing it on to the transform.
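A minimal sketch of such a decorating step, in Python, might look like the following. The `order` element, the `customerId` attribute, the `customerName` child and the `customers` table are all hypothetical; SQLite stands in for whatever repository you actually query.

```python
import sqlite3
import xml.etree.ElementTree as ET


def decorate_with_customer_names(order_xml: str, conn: sqlite3.Connection) -> str:
    """Add a <customerName> child to every <order> whose customerId is in the database."""
    root = ET.fromstring(order_xml)
    # Materialize the list first so the tree is not modified while iterating it.
    for order in list(root.iter("order")):
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?",
            (order.get("customerId"),),
        ).fetchone()
        if row is not None:
            ET.SubElement(order, "customerName").text = row[0]
    return ET.tostring(root, encoding="unicode")
```

The decorated document is then handed to the transform, which can treat the looked-up value like any other element in its input and stays free of database access.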

It can also be worth preprocessing source XML to handle situations that XSLT doesn't handle well. For instance, if the XML contains string data that for some reason hasn't been parsed into a nice usable XML form, it can be worth parsing the data in a preprocessor and updating the XML before handing it on to the transform. A good example is when the source format was designed by someone who thought dates should be represented as MM/DD/YYYY or booleans as Y and N. You can work around this in XSLT, certainly, but it often simplifies things if you massage the input to convert values to canonical representations before turning it over to XSLT.
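As an illustration of that kind of massaging, here is a small Python sketch that rewrites MM/DD/YYYY dates as ISO 8601 dates and Y/N flags as true/false before the document reaches the transform. Which elements hold dates or booleans is something the caller has to supply; the function and parameter names are my own.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

BOOLEAN_VALUES = {"Y": "true", "N": "false"}


def normalize_values(xml_text: str, date_tags: set, boolean_tags: set) -> str:
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD and Y/N flags as true/false."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if elem.tag in date_tags and text:
            # strptime raises ValueError on malformed dates, which surfaces
            # bad input here rather than deep inside the transform.
            elem.text = datetime.strptime(text, "%m/%d/%Y").date().isoformat()
        elif elem.tag in boolean_tags and text in BOOLEAN_VALUES:
            elem.text = BOOLEAN_VALUES[text]
    return ET.tostring(root, encoding="unicode")
```

Elements not named in either set are left untouched, so a stray "Y" in free text is not converted by accident.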