
XQuery performance - is unordered the answer?

Developer (https://www.devze.com), 2023-04-05 17:45, source: web

I have an XQuery performance question I hope someone can assist with.

The code below is working fine but I would like to improve performance if possible. What it is doing is:

- getting all the distinct values of the prodname attribute found in the hits
- working out how many times each distinct value occurs in the hits
- returning those distinct values in order, along with the total for each

I sometimes have up to 12000 items in $hits, so the whole process can take a while (longer than I would like it to, anyway).

I have read that using unordered expressions/functions can significantly improve performance. So my question is: is there a way of improving the performance of the code below, using unordered or any other way, and what coding changes would need to be made? I would still need to keep the "order by $d" line so as to keep the distinct values in alphabetical order for the return.

let $tempResult :=
    for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]  (: all the hits where the prodname attribute has the value $d :)
    order by $d
    return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>


XQuery optimizers vary enormously from one product to another, and techniques to improve performance on one product can be quite different from those on another. So one can't answer this question without (a) knowing what product you are using, and (b) having fairly detailed knowledge of that product's optimizer.

I see no particular reason why "unordered" should help the performance of this query, but if you want to find out, try it and see.
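For anyone who wants to try it, here is a minimal sketch of wrapping the query in an unordered expression. The sample $hits sequence is made up for illustration; in the real query $hits is already bound. The "order by $d" clause still determines the order of the returned items, so whether anything gets faster depends entirely on the processor.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for the real $hits. :)
let $hits := (
  <DOCUMENT prodname="beta"/>,
  <DOCUMENT prodname="alpha"/>,
  <DOCUMENT prodname="beta"/>
)
return
  (: unordered { } tells the processor that the document order of
     intermediate results inside the braces is not significant.
     The explicit order by below is still honoured. :)
  unordered {
    for $d in distinct-values($hits/ancestor-or-self::DOCUMENT/@prodname)
    let $q := $hits/ancestor-or-self::DOCUMENT[@prodname = $d]
    order by $d
    return <item zprodname="{$d}" zprodnamenum="{count($q)}"/>
  }
```

Timing this against the original on real data is the only way to know whether a given product takes advantage of the hint.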

The first thing I would do to try and improve this query would be to put the value of $hits/ancestor-or-self::DOCUMENT (or perhaps $hits/ancestor-or-self::DOCUMENT/@prodname) into a variable. It might make a difference on some products, or it might not.
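A sketch of that change, assuming the variable holds the prodname attributes rather than the DOCUMENT elements. The name $prodnames and the sample $hits data are invented for illustration. Because an element can carry at most one prodname attribute, counting matching attributes gives the same totals as counting matching DOCUMENT elements.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for the real $hits. :)
let $hits := (
  <DOCUMENT prodname="beta"/>,
  <DOCUMENT prodname="alpha"/>,
  <DOCUMENT prodname="beta"/>
)
(: Evaluate the ancestor-or-self path once instead of once per distinct value. :)
let $prodnames := $hits/ancestor-or-self::DOCUMENT/@prodname
for $d in distinct-values($prodnames)
order by $d
return <item zprodname="{$d}" zprodnamenum="{count($prodnames[. = $d])}"/>
```

This is still a nested loop (each distinct value scans $prodnames again), so it only removes the repeated path evaluation, not the quadratic behaviour.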

Unfortunately XQuery 1.0 gives you no other way to write grouping queries than this "nested loop" style. If you can't get it to perform, consider using an XSLT 2.0 xsl:for-each-group instruction, which is far more likely to be efficient because you are saying exactly what you want and only asking for one pass over the data.
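Later versions of the language relax this limitation: XQuery 3.0 added a "group by" clause to FLWOR expressions. The sketch below assumes a processor that supports XQuery 3.0, which may not be true of the product in question; the sample $hits data and the variable names are made up for illustration.

```xquery
xquery version "3.0";

(: Hypothetical sample data standing in for the real $hits. :)
let $hits := (
  <DOCUMENT prodname="beta"/>,
  <DOCUMENT prodname="alpha"/>,
  <DOCUMENT prodname="beta"/>
)
(: Only DOCUMENT elements that actually have a prodname attribute are
   considered, matching the behaviour of the original query. :)
for $doc in $hits/ancestor-or-self::DOCUMENT[@prodname]
let $name := string($doc/@prodname)
group by $name
order by $name
(: After group by, $doc is bound to all DOCUMENT elements in the group. :)
return <item zprodname="{$name}" zprodnamenum="{count($doc)}"/>
```

As with xsl:for-each-group, this states the grouping directly, which gives the optimizer a chance to do it in a single pass, though nothing guarantees that it will.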


To Michael's point, in MarkLogic the approach is to resolve this out of indexes because you might be getting counts of millions of items, and the cardinality could be very low. Here's what it looks like with MarkLogic extensions:

for $d in cts:element-attribute-values(xs:QName("your-element"),xs:QName("prodname"),(),"frequency-order")
return <item zprodname="{$d}" zprodnamenum="{cts:frequency($d)}"/>

Where "frequency-order" returns the items in the order of their frequency, but you could omit that argument and get them back in scalar order.

This is a common coding pattern for search applications where there's a desire for faceted navigation (see www.markmail.org for an XQuery-based example, where the date histogram and facets all use this approach). We have packaged up a number of coding best practices in a SearchAPI that ships with MarkLogic to make building this sort of interface declarative: you simply specify arguments with an XML document, XQuery writes up the appropriate code (similar to the example above), and you get back an XML payload.
