
Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

Developer (https://www.devze.com), 2023-04-05 16:04. Source: the web.
I am using the JExcel library to read Excel spreadsheets. Each cell on the spreadsheet may contain localization strings in any of roughly 44 languages (English, Portuguese, French, Chinese, etc). Today I don't tell the API anything about the encoding it's supposed to use. It handles the Chinese OK, but it always garbles Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 on production) fails to properly interpret the strings it pulls out of the Excel workbook. There has to be something wrong with how JExcel is interpreting the character encoding of the file.

That being said...

Are all the strings in an excel workbook encoded with the same character set?

Is there workbook meta-data I can ask what this character set is (I haven't found it yet)?

If I run all the cells through something like jchardet (http://jchardet.sourceforge.net/), is it likely to be able to divine the character encoding for the whole workbook (this is pretty much predicated on the answer to the first question being "yes, all strings in a given workbook are encoded with the same character set")?

So many questions, so little time.


Well I didn't get an answer directly, but Matt's discovery of a spec points the way towards an actual answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away once I set the encoding to always be "Cp1252". I'm not sure exactly why, but I'm not looking a gift horse in the mouth, so to speak, and am moving on.

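    // Use Cp1252 instead of the platform default charset when reading the workbook
    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding( "Cp1252" );
    Workbook workbook = Workbook.getWorkbook( theFile, workbookSettings );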
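A possible explanation, offered as an assumption rather than something confirmed in this thread: the file format spec linked above describes BIFF8 strings as stored with either 8-bit or 16-bit characters. If the 8-bit form is decoded with whatever encoding is configured (the platform default when none is set), accented Latin text would break under UTF-8 or MacRoman, while Chinese, which needs the 16-bit form, would be unaffected. The sketch below uses only the JDK and hand-written bytes to show that effect; `CellDecodingDemo` and `decode` are hypothetical names, not part of JExcel.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CellDecodingDemo {

    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Single-byte text: c-cedilla (0xE7), a-tilde (0xE3), 'o' (0x6F).
        byte[] singleByte = {(byte) 0xE7, (byte) 0xE3, 0x6F};
        // Two Chinese characters, U+4E2D and U+6587, as UTF-16LE.
        byte[] twoByte = {0x2D, 0x4E, (byte) 0x87, 0x65};

        // Decoded as Cp1252, the accented letters come out as intended.
        System.out.println(decode(singleByte, "Cp1252"));
        // Decoded as UTF-8, the same bytes are invalid and get replaced.
        System.out.println(decode(singleByte, "UTF-8"));
        // 16-bit text does not depend on the single-byte charset at all.
        System.out.println(new String(twoByte, StandardCharsets.UTF_16LE));
    }
}
```

If this assumption holds, pinning the encoding as in the snippet above makes the result independent of the machine the code runs on.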

I'll call this one answered.


I have a problem where, when reading cell values from the Excel file, some values appear with "?" in place of accented letters. Would that code resolve this issue? I am developing on Windows, so I cannot test as quickly as I could on Linux (the OS of the server I'm deploying to).
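Before relying on the encoding setting, it may help to find out where the "?" comes from, since it can be introduced at two different stages. The JDK-only sketch below is an illustration, not a diagnosis of this particular case: a failed decode leaves the replacement character U+FFFD in the string, whereas a correctly decoded string gets a literal "?" only when it is written to an output whose charset cannot represent the accent. `QuestionMarkDemo` and its methods are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {

    // True if the string already contains U+FFFD, i.e. decoding went wrong.
    static boolean hasDecodingDamage(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    // Simulates writing text to an output that only supports the given charset.
    static String roundTrip(String s, Charset out) {
        return new String(s.getBytes(out), out);
    }

    public static void main(String[] args) {
        // "Jose" with e-acute, correctly decoded.
        String good = "Jos\u00E9";
        // The same name as single-byte data, wrongly decoded as UTF-8.
        String bad = new String(new byte[] {'J', 'o', 's', (byte) 0xE9},
                                StandardCharsets.UTF_8);

        System.out.println(hasDecodingDamage(good));
        System.out.println(hasDecodingDamage(bad));
        // The accent is lost on output, not on input.
        System.out.println(roundTrip(good, StandardCharsets.US_ASCII));
    }
}
```

If the cell strings contain U+FFFD, the read side is at fault and an explicit workbook encoding is worth trying; if they are intact and the "?" only shows up in logs or on the console, the output encoding is the more likely culprit.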
