开发者

from relational tables to nested XML output file

开发者 https://www.devze.com 2022-12-20 19:51 出处:网络
Using IBM DB2 database, I have a three relational tables: Project: id, title, description Topic: projectId, value
Using IBM DB2 database, I have a three relational tables:

    Project: id, title, description
    Topic: projectId, value
    Tag: projectId, value

I need to produce the following XML file fro开发者_如何学编程m the previous table: 

    <projects>
      <project id="project1">
        <title>title1</title>
        <description>desc1</description>
        <topic>topic1</topic>
        <topic>topic2</topic>
        <tag>tag1</tag>
        <tag>tag2</tag>
        <tag>tag3</tag>  
      </project>
      ...
    </projects>


I've tried the following query, and it works:

    XQUERY 
    let $projects := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "project", XMLATTRIBUTES(id, title, description)) AS project FROM mydb.Project')  
    let $TopicSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Topic')
    let $TagSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Tag')

    for $project in $projects return
    <project>
    {$project/@ID}
    <title>{$project/fn:string(@TITLE)}</title>
    <description>{$project/fn:string(@DESCRIPTION)}</description>
    {for $row in $TopicSet[@PROJECTID=$project/@ID] return <Topic>{$row/fn:string(@VALUE)}</Topic>}
    {for $row in $TagSet[@PROJECTID=$project/@ID] return <Tag>{$row/fn:string(@VALUE)}</Tag>}
    </project>
    ;

However, it took 9 hours to complete (there 200k projects in the table)

How can I improve that?
Do I really need to create the three intermediate db2-fn:sqlquery to achieve this? is there another way?
Would it be faster if I create these 3 three intermediate db2-fn:sqlquery and put them in a table (with only one row and one attribute), and then index this before querying the "for $project in $projects return" part?


Or, how would you proceed to achieve my goal?


Best regards,
David

---
As proposed by Peter Schuetze, I tried the XMLAGG as follows:
SELECT 
XMLSERIALIZE(
  XMLDOCUMENT(
    XMLELEMENT(
      NAME "Project",
      XMLATTRIBUTES(P.project), 
      XMLAGG(XMLELEMENT(NAME "Topic", Topic.value)),
      XMLAGG(XMLELEMENT(NAME "Tag", Tag.value)),
    ) 
  ) AS CLOB(1M)
) 
FROM mydb.project P 
LEFT JOIN mydb.Topic Topic ON (P.project = Topic.project) 
LEFT JOIN mydb.Tag Tag ON (P.project = Tag.project) 
GROUP BY P.project;

This works indeed much much faster!
However, if a project has not any topic, it will still display topic element, with a blank text, such as:
    <projects>
      <project id="project1">
        <title>title1</title>
        <description>desc1</description>
        <topic></topic>
        <tag>tag1</tag>
        <tag>tag2</tag>
        <tag>tag3</tag>  
      </project>
      ...
    </projects>
How to remove this "<topic></topic>"?


Use XMLFOREST instead of XMLELEMENT if there is the possibility that the column could be NULL and you don't want an empty element tag when that happens. So, for topics, you'd replace its XMLELEMENT function with

XMLFOREST( Topic.value AS "topic" )

There is a problem with the way you've included two XMLAGG functions in the same SELECT statement. If you have just one XMLAGG in your statement, there's no problem, since the GROUP BY on the parent key will neatly collapse the child entries that are specified inside XMLAGG. However, when you specify more than one XMLAGG function inside the same SELECT, the query produces a Cartesian product internally, so in this case, you'll see repeating items inside each group returned by XMLAGG. The example you gave with there being only zero or one topics for a project does not demonstrate this problem, but if a project had two topics and three tags, you'd see each topic repeated three times, and each tag repeated twice. To prevent this, you'll need to relocate each XMLAGG to a subquery or a common table expression that produces a single XML fragment so you can safely reference it from the main query.

Below is an example that pushes the XMLAGG down into common table expressions. It also gets rid of the need for XMLFOREST, since XMLAGG won't produce any results for an empty input set.

WITH 
topicxml( projectid, xmlfragment ) AS (
SELECT topic.projectid, 
XMLAGG( XMLELEMENT( NAME "topic", topic.value ) ORDER BY topic.value)
FROM mydb.topic topic 
GROUP BY topic.projectid
),
tagxml ( projectid, xmlfragment ) AS (
SELECT projectid, 
XMLAGG( XMLELEMENT( NAME "tag", tag.value ) ORDER BY tag.value)
FROM mydb.tag tag
GROUP BY tag.projectid
)
SELECT XMLSERIALIZE ( CONTENT XMLELEMENT( NAME "project",
XMLATTRIBUTES( p.id AS "id" ),
XMLELEMENT( NAME "title", p.title ),
XMLELEMENT( NAME "description", p.description ),
XMLCONCAT( topicxml.xmlfragment, tagxml.xmlfragment )
) AS VARCHAR(2000) )
FROM mydb.project p
LEFT OUTER JOIN topicxml ON topicxml.projectid = p.id
LEFT OUTER JOIN tagxml ON tagxml.projectid = p.id
;


Look at the XMLAGG function. This should be perfect for your need. I haven't tried it yet but the example on the linked page is almost exactly what you want to do.

0

精彩评论

暂无评论...
验证码 换一张
取 消