Receiving XML files every day (12 types) and needing to search them

Developer (devze.com) https://www.devze.com 2023-02-19 07:33 Source: the web

ASP.NET / C#.NET

I need some advice on the following design problem:

I receive XML files every day, and the number varies: yesterday I received 10 XML files, today 56, and tomorrow it may be 161.

There are 12 types (12 XSDs). At the top of each file there is an attribute called FormType, e.g. FormType="1", FormType="2", and so on up to FormType="12".

All of them share common fields such as Name, Address and Phone, but each type covers a different sector: FormType=1 is for Construction, FormType=2 for IT, FormType=3 for Hospital, FormType=4 for Advertisement, and so on.
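For illustration only, a minimal sketch of reading the FormType attribute and mapping it to the sectors listed above. It assumes FormType is an attribute of each file's root element; FormInfo, FormType and Sector are hypothetical names, and only the four types named in the question are mapped.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper. Only the root-level FormType attribute comes from the
// question; everything else about the document layout is an assumption.
public static class FormInfo
{
    // Only the four sectors named in the question; types 5..12 are unknown here.
    static readonly Dictionary<int, string> Sectors = new Dictionary<int, string>
    {
        [1] = "Construction",
        [2] = "IT",
        [3] = "Hospital",
        [4] = "Advertisement",
    };

    // Reads FormType from the root element; null if missing or not a number.
    public static int? FormType(XDocument doc)
    {
        string raw = (string)doc.Root?.Attribute("FormType");
        return int.TryParse(raw, out int value) ? value : (int?)null;
    }

    // Maps a document to its sector name, or "Unknown" for unmapped or missing types.
    public static string Sector(XDocument doc)
    {
        int? type = FormType(doc);
        return type.HasValue && Sectors.TryGetValue(type.Value, out string name) ? name : "Unknown";
    }
}
```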

As I said, all of them have these attributes in common.

Requirement: I need a search screen where the user can search the contents of these XML files, but I have no idea how to approach this. For example: search for some text in certain attributes of the XML files received between Date_From and Date_To.
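As a sketch of that requirement, assuming the received date is tracked outside the XML (for example from the file's timestamp) and that the searchable fields are child elements of the root element: a document matches if it was received between from and to (inclusive) and any of the named fields contains the search text, ignoring case. ReceivedXml and XmlSearch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical record pairing a parsed file with the date it arrived
// (assumed to come from outside the XML, e.g. the file timestamp).
public sealed class ReceivedXml
{
    public DateTime Received { get; }
    public XDocument Document { get; }

    public ReceivedXml(DateTime received, XDocument document)
    {
        Received = received;
        Document = document;
    }
}

public static class XmlSearch
{
    // Documents received in [from, to] where any of the named child elements
    // of the root contains `text` (case-insensitive).
    public static IEnumerable<ReceivedXml> Search(
        IEnumerable<ReceivedXml> docs, DateTime from, DateTime to,
        string text, params string[] fields)
    {
        return docs.Where(d =>
            d.Received >= from && d.Received <= to &&
            fields.Any(f => d.Document.Root != null && d.Document.Root.Elements(f)
                .Any(e => e.Value.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)));
    }
}
```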

Problem: I've heard about storing the XML in a binary field and running XPath queries against it, but I don't know the right terms to search for on Google.
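Whatever column type is used, XML stored as raw bytes still has to be parsed before an XPath query can run against it. A minimal sketch of that step, assuming the byte array has already been read from such a binary column (the data-access code is omitted, and BinaryXml and Matches are hypothetical names):

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class BinaryXml
{
    // Parses XML stored as raw bytes (e.g. read from a binary column) and
    // reports whether the XPath expression selects at least one element.
    public static bool Matches(byte[] stored, string xpath)
    {
        using (var stream = new MemoryStream(stored))
        {
            XDocument doc = XDocument.Load(stream);
            return doc.XPathSelectElements(xpath).Any();
        }
    }
}
```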

I was thinking of creating one big database table, reading all the XML files and inserting their contents into it. The issue is that some XML attributes are huge (2-3 pages of text), while the same attributes in other XML files are empty. So I would end up with an NVARCHAR(MAX) column for every XML attribute, and after some time my database would become a big, big monster...

Can someone advise on the best approach to this problem?


I'm not 100% sure I understand your problem. I'm guessing that the query is supposed to return the individual XML documents that meet some kind of user-specified criteria.

In that case, my starting point would probably be to implement a method that queries a single XML document, i.e. one that returns true if the document is a hit and false otherwise. In all likelihood I'd make the query parameter an XPath query, but who knows? Here's a simple example:

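public bool TestXml(XDocument d, string query)
{
   // Needs: using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
   // Returns true if the XPath query selects at least one element in d.
   return d.XPathSelectElements(query).Any();
}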
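A self-contained sketch of how a search-screen criterion might be turned into an XPath query for TestXml. The BuildQuery helper and the sample Form document are hypothetical, since the real XSDs are not shown. XPath 1.0 contains() is case-sensitive, and user input should not be pasted into an XPath string unchecked.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XmlQuery
{
    // Same logic as TestXml: a document is a hit if the XPath selects any element.
    public static bool TestXml(XDocument d, string query)
    {
        return d.XPathSelectElements(query).Any();
    }

    // Hypothetical helper: root with the given FormType whose child element
    // `field` contains `text`. `field` is assumed to come from a fixed list of
    // known element names, not from free user input. Text with an apostrophe
    // is rejected because it would end the XPath string literal early.
    public static string BuildQuery(int formType, string field, string text)
    {
        if (text.Contains("'"))
            throw new ArgumentException("Apostrophes are not supported in this sketch.", nameof(text));
        return $"/*[@FormType='{formType}']/{field}[contains(., '{text}')]";
    }
}
```

For example, XmlQuery.BuildQuery(3, "Name", "Hospital") produces /*[@FormType='3']/Name[contains(., 'Hospital')], which is a hit for a FormType 3 document whose Name contains "Hospital".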

Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database or in the file system. They could be cached in memory. I'd start by keeping it simple, something like:

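public IEnumerable<XDocument> XmlDocuments()
{
   // Needs: using System.IO; using System.Collections.Generic; using System.Xml.Linq;
   // XmlDirectoryPath is assumed to be a string member holding the folder the files are saved to.
   DirectoryInfo di = new DirectoryInfo(XmlDirectoryPath);
   foreach (FileInfo fi in di.GetFiles("*.xml"))
   {
      // Each file is parsed lazily, as the sequence is enumerated.
      yield return XDocument.Load(fi.FullName);
   }
}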

Now I can get all of the documents that fulfill a request like this:

public IEnumerable<XDocument> GetDocuments(string query)
{
   // Parse every stored document and keep the ones that match the XPath query.
   return XmlDocuments().Where(x => TestXml(x, query));
}
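Assembled into one self-contained class, the three methods might look like this. The directory is passed to the constructor here instead of relying on an XmlDirectoryPath member, only *.xml files are read, and the class name XmlStore is hypothetical. Because XmlDocuments() uses yield return, nothing is read until the result is enumerated, and each enumeration re-reads and re-parses every file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public class XmlStore
{
    private readonly string xmlDirectoryPath;

    public XmlStore(string xmlDirectoryPath)
    {
        this.xmlDirectoryPath = xmlDirectoryPath;
    }

    // True if the XPath query selects at least one element in d.
    public static bool TestXml(XDocument d, string query)
    {
        return d.XPathSelectElements(query).Any();
    }

    // Lazily loads every *.xml file in the directory.
    public IEnumerable<XDocument> XmlDocuments()
    {
        foreach (FileInfo fi in new DirectoryInfo(xmlDirectoryPath).GetFiles("*.xml"))
        {
            yield return XDocument.Load(fi.FullName);
        }
    }

    // Every enumeration re-reads and re-parses all files before filtering.
    public IEnumerable<XDocument> GetDocuments(string query)
    {
        return XmlDocuments().Where(x => TestXml(x, query));
    }
}
```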

The thing that jumps out at me when I look at this problem: I have to parse my documents into XDocument objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.)

That's a lot of I/O and CPU time spent doing exactly the same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a List<XDocument> the first time GetDocuments() is called, and coming up with a scheme for keeping that list in memory until new XML documents are received (or possibly updating it when new XML documents are received).
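One possible scheme, sketched under assumptions: the parsed documents are kept in a List<XDocument> and rebuilt only when the folder's contents change, approximated by the number of *.xml files plus the newest write time. That heuristic assumes files are only ever added, not edited in place or replaced with older timestamps. CachedXmlStore and Reloads are hypothetical names. The lock keeps concurrent requests (e.g. several ASP.NET users) from rebuilding the cache at the same time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical cache: parses the files once and reuses the parsed documents
// until the directory looks different (file count or newest write time).
public class CachedXmlStore
{
    private readonly string directory;
    private readonly object gate = new object();
    private List<XDocument> cache;
    private (int Count, DateTime Newest) signature;

    public CachedXmlStore(string directory)
    {
        this.directory = directory;
    }

    // How many times the files have been (re)parsed.
    public int Reloads { get; private set; }

    private List<XDocument> Documents()
    {
        lock (gate)
        {
            FileInfo[] files = new DirectoryInfo(directory).GetFiles("*.xml");
            var current = (files.Length,
                files.Length == 0 ? DateTime.MinValue : files.Max(f => f.LastWriteTimeUtc));
            // Re-parse everything only if the directory has changed since the last load.
            if (cache == null || current != signature)
            {
                cache = files.Select(f => XDocument.Load(f.FullName)).ToList();
                signature = current;
                Reloads++;
            }
            return cache;
        }
    }

    // Filters the cached documents; returns a new list so callers can't alter the cache.
    public List<XDocument> GetDocuments(string query)
    {
        return Documents().Where(d => d.XPathSelectElements(query).Any()).ToList();
    }
}
```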
