I am writing code that pulls data from database tables and writes them to an XML file (to be picked up by another process). There are bout 60,000 records, and each record can have multiple entites, i.e. a member can have several contacts, each contact can have several phone numbers, etc. The entire process takes several hours, and I've narrowed down the slowness to the actual pull of the data (not writing the XML, or any data processing after it's pulled, etc.). I've tried the code two different ways, with similar results. First I leveraged LINQ queries and created all the elements in one statement:
Dim output =
From m In dc.members
Select New XElement("member", _
New XElement("id", m.member_id), _
New XElement("address", m.Address), _
New XElement("city", m.City), _
New XElement("state", m.State), _
New XElement("contacts", _
From c in m.contacts
Select New XElement("contact", _
New XElement("contact_name", c.name), _
New XElemdnt("contact_address", c.address), _
...
I thought it might be the creation of all the XElements that was slowing it down, so I tried writing the elements di开发者_如何学运维rectly to the XML file using For loops:
Dim output As New Xml.XmlTextWriter("my.xml", Nothing)
For Each m in dc.members
output.WriteStartElement("member")
output.WriteElementString("id", m.member_id)
output.WriteElementString("address", m.Address)
output.WriteElementString("city", m.City)
output.WriteElementString("state", m.State)
output.WriteStartElement("contacts")
For Each c in m.contacts
output.WriteStartElement("contact")
output.WriteElementString("contract_name", m.name)
output.WriteElementString("contract_address", m.address)
....
That produced almost no change in amount of time the process took. I then tried stripping out all the elements and reduced the code down to just the database pulls, and it was similarly slow.
Is there a faster/better way to pull all this normalized data from the database so I can get it to the XML file as quickly as possible?
Each pass of the inner loop is hitting the database. Use a LINQ expression to fetch just the data you want in a single hit. In C# (sorry, I don't know VB.Net) it would look something like:
var members = from m in dc.members
select new {
m.member_id,
m.Address,
...
contacts = from c in m.contacts
select new {
c.name,
c.address
}
};
var output = from m in members
select new XElement...
As a sidenote, you can use XML syntax in VB.Net...
Dim output = _
From m In members _
Select <member>
<id><%= m.member_id %></id>
<address><%= m.Address %></address>
...
<contacts>
<%= From c in m.contacts _
Select <contact>
<contact_name><%= c.name %></contact_name>
<contact_address><%= c.address %></contact_address>
...
</contact>
%>
</contacts>
</member>
I think your problem is lazy loading of the Linq.
First thing I will do is to run the SQL Profiler and find out if that is the case and thousands of connections are opened and closed accessing each level of data.
If that is the case, I will scrap the Linq and use SqlDataReader
which will probably take seconds to do this job.
If it comes a one big query in profiler and SQL side is slow, I will look at the indexes and optimise teh database indexes.
This should be significantly faster for the above mentioned reasons
Dim output =
From m In dc.members.**Include("contacts")**
Select New XElement("member", _
New XElement("id", m.member_id), _
New XElement("address", m.Address), _
New XElement("city", m.City), _
New XElement("state", m.State), _
New XElement("contacts", _
From c in m.contacts
Select New XElement("contact", _
New XElement("contact_name", c.name), _
New XElemdnt("contact_address", c.address)
.....)**.tolist()**
It will select everything in one db access call instead of N+1
also you could try parallel linq .. something like this ... I don't have an editor available.
dim output = from m in dc.members.asparallel().select(function (m) new XElement("member", New XElement("id", m.member_id), etc..)
精彩评论