We have a situation where Excel's XML 2003 format is widely used for file uploads to a third party system. There is a requirement that some information needs to be extracted from these files for other processing. Are there any libraries available that process Excel's 2003 XML format?
Further detail: the format of these XML files have been defined by the third party and we cannot开发者_Go百科 change it. In addition, the data itself is not tabular or in any consistent format.
I have previously used System.Xml.Linq
to extract data (basically a row, column index approach) from very simple versions of these XML spreadsheets. The data required now is from more complex versions which include merged cells, named ranges etc.
This processing is done on a server, thus Interop is not an option based on this infamous MS KB about Excel on a server
Can someone also suggest an approach to processing these files?
The eventual solution required me to create an XSLT to extract the required data from the file and transform to a very simple XML representation of the data. I also created classes that represented the object model for the deserialzation of the generated XML using XmlSerializer.Deserialize()
.
However, for this to work effectively, an update was required to the source Excel XML files to include Named Ranges for the cells that needed to be extracted.Having the Name Ranges allowed for a more simpler XSLT, however the biggest code-smell is the dependency on the existence of the Named Ranges in a file I don't control.
A high level overview of the code
XPathDocument doc = new XPathDocument("path to Excel xml file");
XslCompiledTransform xslt = new XslCompiledTransform();
StringReader sr = new StringReader(Resources.XSLT); // embedded resource, the xslt is read in as a string
XmlTextReader xs = new XmlTextReader(sr);
xslt.Load(xs);
XmlWriterSettings settings = new XmlWriterSettings()
{
Indent = true,
Encoding = Encoding.UTF8,
OmitXmlDeclaration = false
};
MemoryStream memStream = new MemoryStream();
using (XmlWriter writer = XmlWriter.Create(memStream, settings))
{
xslt.Transform(doc, writer); // the simple xml..almost there
}
MyCustomClass curve;
{
XmlSerializer deSerializer = new XmlSerializer(typeof(MyCustomClass));
// reset needed to beginning of mem stream since current position is the last write position
memStream.Position = 0;
curve = (MyCustomClass)deSerializer.Deserialize(memStream);
}
Have you considered using the xsd.exe tool with your studio to generate classes for easily reading the xml files?
It won't include any smart logic for combining fields of course-- but the benefit is that you won't need to build your own reading logic or install interop libraries.
Check if Excel Data Reader on CodePlex does what you require. I'm using it to import simple lists of data from Excel into one of our applications.
精彩评论