开发者

Import huge XML data (> 1Gb) into SQL Server 2008 daily

开发者 https://www.devze.com 2022-12-12 06:05 出处:网络
I have encountered a problem that I need to import a huge XML (> 1Gb) into SQL Server 2008 daily. What I have now is a sample XML file a开发者_高级运维nd the XML schema of it. The XML schema is pretty

I have encountered a problem that I need to import a huge XML (> 1Gb) into SQL Server 2008 daily. What I have now is a sample XML file a开发者_高级运维nd the XML schema of it. The XML schema is pretty complex which contains many custom defined simple type, and element with complex type such as:

<xs:element name="xxxx_url">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="xs:anyURI">
                    <xs:attribute ref="target" use="optional"/>
                    <xs:attribute ref="abc" use="optional"/>
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
</xs:element>

After import, a WCF service will be implemented to retrieve the data stored in SQL Sever, something like search, retrieve etc (read-only operations).

The implementation steps I can think of are like:

  1. Define an object model according to the provided XSD (manually), the object model will be used for WCF service to return values.
  2. Define a database schema from the provided XSD (manually), the schema is estimated to have about 20 - 30 tables.
  3. Create a SSIS package to load XML into database daily.
  4. Create a WCF service which reads from database, populates data into the object model defined in step 1 and returns the object to service client.

The problem is that these steps involve lots of manual work. I have to research the XSD row by row, and transform it to object model and database schema mannualy.

I did some research that there're some automation tools to transform XSD into classes, and also transform XSD into database schema. But the classes transformed from XSD with the tool are pretty messed, and the transformation to schema is failed because it does not conform to MS dataset format.

I am wondering is there any good solution to this problem, to save a lot of manual work?

Any suggestion is appreciated !


At some point you have to do the transformation. Whether you do so reading the XML into objects or into data in tables. The work should be done once and then you just have to run the resulting process. The issues I see are:

  • The XML is very large.

  • You do not yet have a mapping of the XSD to your desired schema.

The mapping is work that you are going to have to do. I would think it would perform best if you can import the XSD into table and then import from those temporary table to the schema you wish to use. Working with the XML file will give you issues due to its size.

So my suggestion is to force/fudge the import of the XML into what ever table structure will work. Then write a stored procedure to "import" the data from those tables into your "real" schema.

Pat O


try spliting the xml it more then one file because future problems where stuff like ýÿơƝƈï might appear in the database due to upload errors


In short, our solution will require some work - there's no quick fix.

For scalability, I'd recommend a technology that allows you to stream through the XML (a-la SAX) as opposed to attempting to load and transofrm it all in RAM. For SSIS purposes, there isn't a whole lot of value in converting the XML into an object-graph, so consider any of the following opportunities:

  1. Stream and shread the XML document using a custom script with multiple outputs, then use other SSIS components to transform the resulting data.
  2. Bulk load (stream) the XML into a staging instance of SQL server, then query the XML from there (not a great solution, but easier than 1.), or
  3. Semi-shread the document into smaller chunks, bulk load those chunks into a staging area, then work on them seperately using XSD transforms, etc. This opens the doors to better parallelism.


Do you have example data you can post with at least one complete record worth of data?

Also, do you have access to the source database used to create this XML data? XML isn't really designed for this size of data transfer - your task would be much easier with the data in flat file formats for each table.


SQL Server has built in XML types - it can create tables from your schema.

would they help you here?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号