I'm fetching data from an XML source and parsing it with tbxml. Everything works fine until I reach an accented Latin letter such as "é", which is displayed as:
&#233;
I don't see a proper NSString method to do the conversion. Any ideas?
You can use a regex. A regex is a solution to, and cause of, all problems! :)
The example below uses, at least as of this writing, the unreleased RegexKitLite 4.0. You can get the 4.0 development snapshot via svn:
shell% svn co http://regexkit.svn.sourceforge.net/svnroot/regexkit regexkit
The examples below take advantage of the new 4.0 Blocks feature to do a search and replace of numeric character entities such as &#233;.
This first example is the "simpler" of the two. It only handles decimal character entities like &#233; and not hexadecimal character entities like &#xE9;. If you can guarantee that you'll never have hexadecimal character entities, this should be fine:
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

int main(int argc, char *argv[]) {
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

  NSString *string = @"A test: &#233; and &#xe9; ? YAY! Even >0xffff are handled: &#119808; or &#x1D400;, see? (0x1d400 == MATHEMATICAL BOLD CAPITAL A)";
  NSString *regex = @"&#([0-9]+);"; // Decimal entities only; capture 1 holds the digits.

  NSString *replacedString = [string stringByReplacingOccurrencesOfRegex:regex usingBlock:^NSString *(NSInteger captureCount, NSString * const capturedStrings[captureCount], const NSRange capturedRanges[captureCount], volatile BOOL * const stop) {
    NSUInteger u16Length = 0UL, u32_ch = [capturedStrings[1] integerValue];
    UniChar u16Buffer[3];

    // BMP code point: emit as-is, but map a lone surrogate to U+FFFD.
    if (u32_ch <= 0xFFFFU) { u16Buffer[u16Length++] = ((u32_ch >= 0xD800U) && (u32_ch <= 0xDFFFU)) ? 0xFFFDU : u32_ch; }
    // Beyond the Unicode range: emit U+FFFD.
    else if (u32_ch > 0x10FFFFU) { u16Buffer[u16Length++] = 0xFFFDU; }
    // Supplementary plane: encode as a UTF-16 surrogate pair.
    else { u32_ch -= 0x0010000UL; u16Buffer[u16Length++] = ((u32_ch >> 10) + 0xD800U); u16Buffer[u16Length++] = ((u32_ch & 0x3FFUL) + 0xDC00U); }

    return([NSString stringWithCharacters:u16Buffer length:u16Length]);
  }];

  NSLog(@"replaced: '%@'", replacedString);

  [pool release];
  return(0);
}
Compile and run with:
shell% gcc -arch i386 -g -o charReplace charReplace.m RegexKitLite.m -framework Foundation -licucore
shell% ./charReplace
2010-02-13 22:51:48.909 charReplace[35527:903] replaced: 'A test: é and &#xe9; ? YAY! Even >0xffff are handled:
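The replacement block turns each captured number into UTF-16 before handing it to stringWithCharacters:length:. As a rough illustration of that step on its own, here is a minimal sketch in plain C (which also compiles as Objective-C). The function name encode_utf16 is a hypothetical helper made up for this example; it is not part of RegexKitLite or Foundation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from RegexKitLite or Foundation).
 * Encodes one Unicode code point as UTF-16 into out, which must have
 * room for two code units. Returns the number of units written. */
size_t encode_utf16(uint32_t cp, uint16_t out[2]) {
    assert(out != NULL);

    /* Out-of-range values and lone surrogates become U+FFFD. */
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu)) {
        out[0] = 0xFFFDu;
        return 1;
    }

    /* Basic Multilingual Plane: a single code unit. */
    if (cp <= 0xFFFFu) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: subtract 0x10000, then split the remaining
     * 20 bits into a high (top 10 bits) and low (bottom 10 bits) surrogate. */
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u + (cp >> 10));
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu));
    return 2;
}
```

For example, 233 (é) comes back as the single unit 0x00E9, while 119808 (0x1D400) comes back as the pair 0xD835 0xDC00, which is the kind of buffer the block above passes to stringWithCharacters:length:.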