string encoding in C# - strange characters

Developer https://www.devze.com 2023-04-13 09:37 Source: Internet
I have a file that I need to import. The problem is that a lot of characters in that file come out wrong.

For example these names are wrong:

Björn (in file) - Should be Björn

Ã…ke (in file) - Should be Åke

Unfortunately I can't recreate the file with the correct encoding. Also, a lot of other characters are wrong (these were just examples), so I can't do a search and replace on all of them unless there is a dictionary with all the conversions.

Can I decode the strings in some way?

Thanks, Patrik

Edit: Just some more info that I should have added before (I blame my tiredness). The file is an .xlsx file.


I debugged this with Notepad++. I copied the correct strings into Notepad++ and used Encoding | Convert to UTF-8. Then I selected Encoding | Encode as ANSI, which has the effect of interpreting the UTF-8 bytes as if they were ANSI. When I did this I ended up with the same erroneous values as you. So clearly, when you read the file, you are interpreting it as ANSI rather than UTF-8.
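The same experiment can be reproduced in C#. The following is only a minimal sketch, assuming that "ANSI" here means Windows-1252 (which fits the "…" in "Ã…ke") and that you are on .NET 5 or later, where code page 1252 has to be registered first. It garbles a correct string the way described above and then reverses the mistake. The variable names are just for illustration.

```csharp
using System;
using System.Text;

// Assumption: on .NET Core / .NET 5+, code page 1252 is only available
// after registering the code pages provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding ansi = Encoding.GetEncoding(1252);

// Reproduce the problem: UTF-8 bytes decoded as if they were Windows-1252.
string garbled = ansi.GetString(Encoding.UTF8.GetBytes("Björn"));
// garbled now holds "BjÃ¶rn"

// Reverse it: recover the original bytes, then decode them as UTF-8.
string repaired = Encoding.UTF8.GetString(ansi.GetBytes(garbled));
// repaired now holds "Björn"

Console.WriteLine(garbled + " -> " + repaired);
```

Reversing the mistake like this is only a workaround for strings that have already been decoded wrongly. If you control the code that reads the file, reading it as UTF-8 in the first place is the cleaner fix.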

The explanation, then, is that your file has been encoded as UTF-8. The solution is to make sure that the file is interpreted as UTF-8 when you read it. I can't tell you exactly how to do that since you didn't show how you were reading the file in the first place.

It's possible that your file does not contain a byte-order-mark (BOM). If so then specify the encoding when you read the file by passing Encoding.UTF8.
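For a plain text file, passing the encoding explicitly could look something like the sketch below. The file name and its contents are made up so that the example is self-contained. Since you haven't shown your reading code, this may not match how your import works, and an .xlsx file is not a plain text file, so treat it purely as an illustration of where Encoding.UTF8 goes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample file: UTF-8 text written without a BOM.
string path = Path.Combine(Path.GetTempPath(), "names-utf8.txt");
File.WriteAllBytes(path, new UTF8Encoding(false).GetBytes("Björn\nÅke\n"));

// Read the whole file, stating the encoding explicitly.
string text = File.ReadAllText(path, Encoding.UTF8);

// The same idea with a StreamReader, reading line by line.
using var reader = new StreamReader(path, Encoding.UTF8);
var firstLine = reader.ReadLine();

Console.WriteLine(firstLine);
```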


I've just tried your first example, and it definitely looks like that's UTF-8.

It's unclear what you're using to look at the file in the first place, but if you load it with a text editor which understands UTF-8 and tell it that it's a UTF-8 file, it should be fine.

When you load it with .NET, you should just be able to use File.OpenText, File.ReadAllText etc - most IO dealing with encodings in .NET defaults to UTF-8 anyway.
