XML database evaluation

Developer https://www.devze.com 2023-02-03 10:59 Source: web
My application consumes XML data from different vendors. Each vendor uses a different XML format/schema, so custom queries are required to retrieve the data from each vendor's XML.

I initially started out with an RDBMS approach: after retrieving a specific XML file from a vendor, I would parse and query it (using the Woodstox StAX parser) and write the data to tables. However, because an RDBMS has a fixed schema, I cannot support all the XML formats from the different vendors, and even where I can, I have to "normalize" the hierarchical XML into a fixed relational schema.

The XML data from the vendors is updated frequently every day, and file sizes range from a few KB up to 50 MB.
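For files at the upper end of that size range, a streaming parse keeps memory use down regardless of where the data ends up. The following is only a minimal sketch in Python (the question itself uses a Java StAX parser); the function name `stream_records`, the `feed`/`product` element names, and the sample data are hypothetical and not taken from any vendor format.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one dict per <record_tag> element, reading the file incrementally."""
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Flatten the record's direct children into tag -> text.
            yield {child.tag: child.text for child in elem}
            # Release the record's children and text once it has been handled.
            elem.clear()


# Hypothetical sample feed; a real 50 MB file would be passed as a path or file object.
SAMPLE_FEED = io.BytesIO(
    b"<feed>"
    b"<product><sku>A1</sku><price>9.50</price></product>"
    b"<product><sku>B2</sku><price>3</price></product>"
    b"</feed>"
)

records = list(stream_records(SAMPLE_FEED, "product"))
```

Each record is handed to the caller as soon as its closing tag is seen, so the same loop could feed either relational tables or an XML database.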

I am evaluating several native XML databases (NXDs), namely eXist-db, Sedna, BaseX and MonetDB, as a next step to see whether one of them would suit my purposes.

Can someone please offer practical advice on how to approach this, or share experience from building a similar system that handles a lot of XML data in different formats/schemas?

Here are the core XML requirements I am trying to meet:

  • Handle multiple XML data files from multiple sources. The XML differs from vendor to vendor.
  • Update XML, both whole documents and individual fields in documents already in the DB.
  • Identify which vendor a document comes from and fire the queries accordingly.
  • Query the XML using XPath/XQuery to read the data and present it to users in a common view.
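The last two requirements can be sketched as a per-vendor table of path expressions that all map onto one common record shape. This is an illustration only, using Python's standard library and its limited XPath subset rather than any of the databases named above; the vendor root names (`acmeFeed`, `globexCatalog`), the field paths, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendors, identified by root element name. Each entry maps the
# common fields to that vendor's own paths ("@x" means attribute x).
VENDOR_QUERIES = {
    "acmeFeed": {"item": "./products/product", "id": "sku",
                 "name": "title", "price": "price"},
    "globexCatalog": {"item": "./entry", "id": "@code",
                      "name": "label", "price": "cost/amount"},
}


def read_field(elem, path):
    """Return an attribute value for '@name', else the text at a child path."""
    if path.startswith("@"):
        return elem.get(path[1:])
    return elem.findtext(path)


def to_common_view(xml_text):
    """Detect the vendor from the root tag and run that vendor's queries."""
    root = ET.fromstring(xml_text)
    mapping = VENDOR_QUERIES.get(root.tag)
    if mapping is None:
        raise ValueError(f"unknown vendor format: {root.tag}")
    return [
        {
            "vendor": root.tag,
            "id": read_field(item, mapping["id"]),
            "name": read_field(item, mapping["name"]),
            "price": float(read_field(item, mapping["price"])),
        }
        for item in root.findall(mapping["item"])
    ]


ACME_XML = ("<acmeFeed><products><product><sku>A1</sku><title>Widget</title>"
            "<price>9.50</price></product></products></acmeFeed>")
GLOBEX_XML = ('<globexCatalog><entry code="G7"><label>Gadget</label>'
              "<cost><amount>12</amount></cost></entry></globexCatalog>")

common_rows = to_common_view(ACME_XML) + to_common_view(GLOBEX_XML)
```

Supporting a new vendor then means adding one entry to the mapping rather than changing a schema. In a native XML database the same idea would presumably be expressed as one XQuery module per vendor.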

Please advise.

Thanks, Subhro.


I think you're right to go for a native XML database rather than a hybrid database such as SQL Server. The hybrid databases, because they do so much, often take longer to provide core XML functionality such as conformant implementations of XQuery and XQuery Update; and the fact that they offer so many different ways of storing data can make it difficult to keep your developers under control.

As to which product, there's a real problem here. Evaluating database products is an expensive process, and doing it thoroughly enough to get a meaningful answer will probably cost you more than making the wrong choice. When a database project fails, it's usually not because the database software was deficient, but because the project lacked the skills and experience to use the chosen database software to its best advantage. So I would say, choose a product based on the skills available to you.


I would use SQL Server as the database, in conjunction with SQL Server Integration Services (SSIS) as the integration tool.

SQL Server supports an XML column type natively. It can require the XML to follow a schema or set of schemas, and can process the XML using XQuery. At the same time, you can, of course, break down the parts of the XML that are the same from vendor to vendor into tables.

The XML can also be indexed by XML indexes, which will make it much faster to query.

SSIS can be used to gather the different XML files, whether from different disk drops or via FTP or web services, and load them into the database, with or without additional preprocessing. For instance, you might find that you can rationalize the different XML formats to a degree, keeping most of the data in normal relational tables, while still storing the remainder (or the entire document) in the same database for subsequent queries.
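The hybrid layout described here, shared fields in ordinary columns plus the whole document kept alongside, can be sketched as follows. Note the assumptions: SQLite and Python stand in for SQL Server and SSIS purely so the example is self-contained, SQLite has no XML column type or XML indexes, and the table name `vendor_docs`, the column names, and the sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_document(conn, vendor, xml_text, id_path, name_path):
    """Extract the fields common to all vendors and keep the raw XML as well."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO vendor_docs (vendor, item_id, item_name, raw_xml) "
        "VALUES (?, ?, ?, ?)",
        (vendor, root.findtext(id_path), root.findtext(name_path), xml_text),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_docs "
    "(vendor TEXT, item_id TEXT, item_name TEXT, raw_xml TEXT)"
)

# Hypothetical vendor document with one vendor-specific field (<color>).
DOC = ("<product><sku>A1</sku><title>Widget</title>"
       "<extra><color>red</color></extra></product>")
store_document(conn, "acme", DOC, id_path="sku", name_path="title")

# Common fields are queried relationally; vendor-specific ones from the stored XML.
row = conn.execute(
    "SELECT item_id, item_name, raw_xml FROM vendor_docs WHERE vendor = ?",
    ("acme",),
).fetchone()
color = ET.fromstring(row[2]).findtext("extra/color")
```

In SQL Server itself, the `raw_xml` column would be an XML-typed column queried with XQuery instead of being re-parsed in application code.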

Oh, and by the way, SSIS comes with SQL Server at no extra cost.
