I'm trying to read data from the .xlsx files using SharpZipLib to unpack it (in memory) and reading the inner xml files. Everything is fine but recognizing the dates - they're stored in julean format and I need to somehow recognize if a number is a date or only a number. In another topic (unfortunately it died and I need quick answer) I got to know some things from Mark Baker, but it's still not enough...
"Excel stores dates as a float value... the integer part being the number of days since 1/1/1900 (or 1/1/1904 depending on which calendar is being used), the fractional part being the proportion of a day (ie the time part)... made slightly more awkward by the fact that 1900 is considered a leap year.
The only thing that differentiates a data from a number is the number format mask. If you can read the format mask, you can use that to identify the value as a date rather than a number... then calculate the date value/formatting from the base date."
"But doesn't the attribute "s" for dates has always the value of "1"? I know it defines style, but maybe? ;)"
The s attribute references a style xf entry in styles.xml, and it won't always be entry 1 for dates... it all depends how many different styles are being used in the workbook. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask)
"One more question - I'm now looking at the style.xml's content and in the section I see elements like: "<xf numFmtId="14" ... applyNumberFormat="1" />", "<xf numFmtId="1" ... applyNumberFormat="1" />", etc. but there is no <numFmts> section... Are there any "开发者_JAVA技巧;standard" formats? Or am I just missing something?"
Can someone please help me out? Thanks in advance.
You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<numFmts count="3">
<numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" />
<numFmt numFmtId="165" formatCode="0.000" />
<numFmt numFmtId="166" formatCode="#,##0.000" />
</numFmts>
EDIT
I've been double-checking my xlsx reader code (it's been a long while since I delved into that part of the library); and there are built-in formats. Number format codes (numFmtId) less than 164 are "built-in".
The list that I have is incomplete:
0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';
5 = '$#,##0;\-$#,##0';
6 = '$#,##0;[Red]\-$#,##0';
7 = '$#,##0.00;\-$#,##0.00';
8 = '$#,##0.00;[Red]\-$#,##0.00';
9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';
37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';
44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';
27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';
59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';
Cells may have styles. These are uints that index cellXfs in the styleSheet. Each cellXfs item contains a set of attributes. The most important is NumberFormatID. If its value falls in the range 14-22 it is a "standard" date. If it falls in the range 165 - 180, it is a "formatted" date and will have a corresponding NumberingFormat attribute.
Standard Date
[x:c r="A2" s="2"][x:v]38046[/x:v][/x:c]
[x:xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /] (ordinal position 2)
Formatted Date
[x:c r="A4" s="4"][x:v]38048[/x:v][/x:c]
[x:xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /](ordinal position 4)
[x:numFmt numFmtId="166" formatCode="m/d;@" /]
This code extracts a list of style IDs that correspond to these date formats.
private void GetDateStyles()
{
//
// The only way to tell dates from numbers is by looking at the style index.
// This indexes cellXfs, which contains NumberFormatIds, which index NumberingFormats.
// This method creates a list of the style indexes that pertain to dates.
WorkbookStylesPart workbookStylesPart = (WorkbookStylesPart) UriPartDictionary["/xl/styles.xml"];
Stylesheet styleSheet = workbookStylesPart.Stylesheet;
CellFormats cellFormats = styleSheet.CellFormats;
int i = 0;
foreach (CellFormat cellFormat in cellFormats)
{
uint numberFormatId = cellFormat.NumberFormatId;
if ((numberFormatId >= 14 && numberFormatId <= 22)
|| (numberFormatId >= 165u && numberFormatId <= 180u))
{
_DateStyles.Add(i.ToString());
}
i++;
}
I would suggest that numFmtId="14" should be considered to be "Windows Short Date format" as in Australia this format will display a date as, "dd/mm/yy", and not "mm/dd/yy".
There are two ways to get the date format for a cell.
You start by grabbing the "s" or StyleIndex. Note the date in numeric raw format below (40667):
<row r="1">
<c r="A1" s="1">
<v>40667</v>
</c>
</row>
The "s" attribute in the cells nodes points to a zero-based array of styles.xml nodes starting at 0. This is the key to locating the date format, if any, that maps to the raw numeric date data. You see s=1, that points to the 2nd xf node in the following cell formatting styles.xml section of your Excel workbook:
<cellXfs count="2">
<xf numFmtId="0" ... />
<xf numFmtId="14" ... />
</cellXfs>
In the second node you see the numFmtId="14" value. That is the numberFormatID. It tells you that that is the id needed to determine what your date number should be presented in. But that number points to two possible places for the date format. If its number is in the range 14-22 its a built in style for date. If its outside that range its (possibly) a custom date format added by the excel file owner. You wont know until you check both places.
In the first case, if its a value 14-22, you will need to map it to one of the pre-built date formats every excel file has (mm-dd-yy, etc.). You can locate that table in the OpenXML SDK. Here is a sample of those with the numFmtId mapped to the built-in date formats....
14 mm-dd-yy
15 d-mmm-yy
16 d-mmm
17 mmm-yy
18 h:mm AM/PM
At this point you know its a date and what format its to be presented in. If its not one of those values, its likely a custom number. And you now have to search the styles.xml file again for a style node with a matching numFmtId value. Those nodes will contain the custom date format as follows:
<numFmts count="2">
<numFmt numFmtId="164" formatCode="mm/yyyy;@" />
<numFmt numFmtId="165" formatCode="0.000" />
<numFmt numFmtId="166" formatCode="#,##0.000" />
</numFmts>
Note that if your numFmtId was 164, you found its custom date format. So to catch all these crazy date formats, custom and built in, your best bet is to maintain a range of acceptable "formats" as strings, locate your formatCode, then see if it matches one of the acceptable ones in your code.
Good Luck!
精彩评论