XSLT Performance Considerations

https://www.devze.com 2023-04-03 17:25 Source: web
I am working on a project which uses the following technologies: Java, XML, and XSLT.

There's heavy use of XML. Quite often I need to:
- convert one XML document into another
- convert one XML document into another after applying some business logic

Everything will be built into an EAR and deployed on an application server. As the number of users is huge, I need to take performance into consideration before defining coding standards.

I am not a very big fan of XSLT, but I am trying to understand whether using XSLT is the better option in this scenario or whether I should stick to Java only. Note that I only have requirements to convert XML into XML; I don't have requirements to convert XML into some other format like HTML.

From a performance and maintainability point of view, isn't Java a better option than using XSLT for XML-to-XML transformations?


From my previous experience of this kind of application, if you have a performance bottleneck, then it won't be the XSLT processing. (The only exception might be if the processing is very complex and the programmer very inexperienced in XSLT.) There may be performance bottlenecks in XML parsing or serialisation if you are dealing with large documents, but these will apply whatever technology you use for the transformation.

Simple transformations are much simpler to code in XSLT than in Java. Complex transformations are also usually simpler to code in XSLT, unless they make heavy use of functionality available for free in the Java class library (an example might be date parsing). Of course, that's only true for people equally comfortable with coding in both languages.
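To make the comparison concrete, here is a sketch of how little Java glue a simple transformation needs with the standard JAXP API. The stylesheet and the element names (`price`, `amount`) are invented for illustration; it is just an identity copy plus one rename rule.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class SimpleTransform {

    // Hypothetical stylesheet: an identity copy that renames <price> to <amount>.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='price'>"
      + "<amount><xsl:apply-templates/></amount>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty("omit-xml-declaration", "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints: <item><amount>42</amount></item>
        System.out.println(transform("<item><price>42</price></item>"));
    }
}
```

The equivalent hand-written DOM or SAX code for even this trivial rename would be noticeably longer.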

Of course, it's impossible to give any more than arm-waving advice about performance until you start talking concrete numbers.


I agree with the above responses. XSLT is faster and more concise to develop than performing transformations in Java. You can change an XSLT stylesheet without having to recompile the entire application (just re-create the EAR and redeploy). Hand-written transformations should always be faster, but the code may be much larger than the XSLT, since XPath and related technologies allow very condensed and powerful expressions. Try several XSLT engines (the one provided with Java, Saxon, Xalan...) and debug and profile the XSLT with tools such as the standalone Altova XMLSpy IDE to detect bottlenecks. Load the XSLT transformation once and reuse it when processing several XML documents that require the same transformation. Another option is to compile the XSLT to Java classes, allowing faster execution (Saxon seems to support this), but changes are not as easy, since you need to re-compile the XSLT and the generated classes.
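The "load once, reuse" advice maps directly onto JAXP's `Templates` interface. A minimal sketch (the class and stylesheet below are made up for illustration): the compiled `Templates` object is immutable and thread-safe, so it can be shared across requests in an application server, while the per-call `Transformer` instances are not.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class TemplatesCache {

    // Compile the stylesheet once; Templates is thread-safe and reusable.
    private final Templates templates;

    public TemplatesCache(String stylesheet) throws Exception {
        this.templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    public String transform(String xml) throws Exception {
        // Transformer instances are cheap to create from Templates,
        // but are NOT thread-safe, so create one per call (or pool them).
        Transformer t = templates.newTransformer();
        t.setOutputProperty("omit-xml-declaration", "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:template match='/doc'><out><xsl:value-of select='.'/></out>"
                   + "</xsl:template></xsl:stylesheet>";
        TemplatesCache cache = new TemplatesCache(xsl);
        for (int i = 0; i < 3; i++) {
            System.out.println(cache.transform("<doc>msg" + i + "</doc>"));
        }
    }
}
```

Skipping this and calling `newTransformer(stylesheetSource)` on every request re-parses and re-compiles the stylesheet each time, which is usually the avoidable part of the cost.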

We use XSLT and XSL-FO to generate invoices for billing software. We extract the data from the database, create an XML file, transform it with XSLT into XSL-FO, and process the resulting XML (FO instructions) with Apache FOP to generate a PDF. When generating invoices of several pages, the job is done in less than a second in a multi-user environment on a per-request basis (online processing). We also do batch processing (billing cycles), and that job runs faster because the XSLT transformation is reused. Only for very large PDF documents (>100 pages) do we have some trouble (minutes), but the most expensive task is always processing the FO XML into PDF, not the XML-to-XML step with XSLT.

As is often said, if you need more processing power, you can just add more processors and do the jobs in parallel easily. I think the time saved by using XSLT, if you have some experience with it, can be spent buying more hardware. It's the usual dichotomy: use powerful development tools to save development time and buy more hardware, or do things "manually" to get maximum performance.

Integration tools like ESBs are heavily based on XSLT transformations to adapt XML data from one system (the sender) to another (the receiver), and can usually perform hundreds of "transactions" (data processing and integration) per second.


If you use a modern XSLT processor, such as Saxon (available in a free version), you will find the performance to be quite good. Also, in the long term XSL transforms will be much more maintainable than hardcoded Java classes.
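If you want to try Saxon from Java, one way is to ask JAXP for Saxon's factory class by name. This is only a sketch: it assumes the Saxon jar is on the classpath, and falls back to the JDK's built-in XSLT 1.0 processor when it isn't.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

public class PickProcessor {

    public static TransformerFactory choose() {
        try {
            // Request Saxon explicitly (factory class name used by Saxon 9+);
            // this requires the Saxon jar on the classpath.
            return TransformerFactory.newInstance(
                    "net.sf.saxon.TransformerFactoryImpl", null);
        } catch (TransformerFactoryConfigurationError e) {
            // Saxon not available: fall back to the JDK's built-in processor.
            return TransformerFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        // Prints the concrete factory class that JAXP resolved.
        System.out.println(choose().getClass().getName());
    }
}
```

Pinning the factory class this way also protects you from surprises when several XSLT engines end up on the application server's classpath and the default JAXP lookup picks one you didn't expect.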

(I have no connection with the authors of Saxon)


Here is my observation, based on empirical data. I use XSLT extensively, in many cases as an alternative to data processors implemented in Java. Some of the data processors we built are fairly involved. We primarily use Saxon-EE, through the Oxygen XML editor. Here is what we have noticed in terms of transformation performance.

For less complex XSL stylesheets the performance is quite good (2s to read a 30MB XML file and generate over 20 HTML content pages with a lot of div structures), and performance seems to scale linearly, or better, with the size of the file.

However, when the complexity of the XSL stylesheet changes, the performance change can be exponential. (The same file, with a function call introduced into a frequently called template, the function implementing a simple XPath resolution, can go from 2s to 24s of processing time.) The introduction of functions and function calls seems to be a major culprit.

That said, we have not done a detailed performance review and code optimization (we are still in alpha, and the performance is still within our limits, i.e. a batch job). I must admit we may have "abused" xsl:function, since in a lot of places we used the idea of abstracting code into functions (in addition to using templates). My suspicion is that, due to the way XSLT templates are called, there may be a lot of eventual recursion in the processor's implementation, and function calls can become expensive if they are not optimized. We think a change of "strategy" in the way we write our XSL scripts (to be more XSLT/XPath-centric) may help the processor's performance, for instance the use of xsl:key. So yes, we may be just as guilty as the processor charged :)
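For illustration, here is roughly what the xsl:key idea looks like, driven from Java with the JDK's built-in XSLT 1.0 processor. The stylesheet, element names, and key name are invented for this sketch; the point is that `key()` does an indexed lookup instead of re-scanning the document with an expression like `//item[@id = $id]`.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class KeyDemo {

    // Hypothetical stylesheet: index <item> elements by @id with xsl:key,
    // then look one up via key() using a stylesheet parameter.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='id'/>"
      + "<xsl:key name='item-by-id' match='item' use='@id'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"key('item-by-id', $id)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String lookup(String xml, String id) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setParameter("id", id);  // bind the xsl:param from Java
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item id='a'>first</item><item id='b'>second</item></items>";
        // prints: second
        System.out.println(lookup(xml, "b"));
    }
}
```

With one lookup the difference is invisible; it matters when a key lookup replaces a linear scan inside a template that is instantiated thousands of times.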

Another performance issue is memory utilization. While RAM is not technically a problem, a simple processor ramping from 1GB (!!!) to 6GB for a single invocation/transformation is not exactly kosher. There may be scalability and capacity concerns (depending on the application and its usage). This may have less to do with the underlying XSLT processor and more to do with the editor tool. It seems to have a huge impact on debugging the stylesheets in real time (i.e. stepping through the XSLT).

A few other observations:
- Command-line or "production" invocation of the processor performs better.
- For consecutive runs (invoking the XSLT processor repeatedly), the first run takes the longest (say 10s) and subsequent runs take a lot less (say 4s). Again, this may have something to do with the editor environment.

That said, while the performance of the processors may be a pain at times, depending on the application requirements, it is my opinion that if you consider the other factors already mentioned here, such as code maintenance, ease of implementation, rapid changes, and size of the code base, the performance issues can be mitigated, or "accepted" (if the end application can still live with the performance numbers), when comparing an implementation in XSLT with one in Java (or anything else).

...adieu!

