"NSString stringWithUTF8String:" is overly touchy

Developer, https://www.devze.com, 2023-03-10 18:24. Source: the web
I'm in the middle of doing some string manipulation using high-level Cocoa features like NSString and NSData as opposed to digging down to C-level things like working on arrays of chars.

For the love of it, +[NSString stringWithUTF8String:] sometimes returns nil on a perfectly good string that was created with -[NSString UTF8String] in the first place. One would assume that this happens when the input is malformed. Here is an example of the input that fails, in hex:

55 6B 66 51 35 59 4A 5C 6A 60 40 33 5F 45 58 60 9D 47 3F 6E 5E 
60 59 34 58 68 41 4B 61 4E 3F 41 46 00

and ASCII:

UkfQ5YJ\j`@3_EX`G?n^`Y4XhAKaN?AF

This is a randomly generated string, to test my subroutine.

const char *buffer = [randomNSString UTF8String]; // UTF8String returns a const char *
// .... doing things .... in the end, buffer is the same as before
NSString * result = [NSString stringWithUTF8String:buffer];
// yields nil

Edit: Just in case somebody didn't grasp the implicit question, here it is in -v mode:

Why does [NSString stringWithUTF8String:] sometimes return nil on a perfectly formed UTF8-String?


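walkytalky is right. The byte 0x9D is not legal UTF-8 at this position. Bytes whose top two bits are 10 are reserved as continuation bytes; they never appear without a preceding lead byte that has two or more leading 1 bits. In your input, 0x9D directly follows 0x60, a plain ASCII byte, so the sequence is malformed.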
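
To make the rule above concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) that scans a NUL-terminated buffer and reports the offset of the first byte that breaks the lead/continuation structure. The function name first_invalid_utf8_offset and the array name sample are made up for this illustration; they are not Cocoa API. The check is deliberately simplified: it looks only at lead and continuation bit patterns and does not reject overlong encodings or surrogates, so it is a diagnostic aid rather than a full validator.

```c
#include <assert.h>
#include <stddef.h>

/* The failing input from the question, byte for byte. */
static const unsigned char sample[] = {
    0x55, 0x6B, 0x66, 0x51, 0x35, 0x59, 0x4A, 0x5C, 0x6A, 0x60, 0x40,
    0x33, 0x5F, 0x45, 0x58, 0x60, 0x9D, 0x47, 0x3F, 0x6E, 0x5E,
    0x60, 0x59, 0x34, 0x58, 0x68, 0x41, 0x4B, 0x61, 0x4E, 0x3F, 0x41,
    0x46, 0x00
};

/* Hypothetical helper: returns the offset of the first byte that violates
   UTF-8 lead/continuation structure, or -1 if none is found.
   Simplified: overlong forms and surrogates are not checked. */
long first_invalid_utf8_offset(const unsigned char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != 0) {
        unsigned char b = s[i];
        size_t extra; /* number of continuation bytes this lead byte expects */

        if (b < 0x80)                extra = 0; /* 0xxxxxxx: ASCII */
        else if ((b & 0xE0) == 0xC0) extra = 1; /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) extra = 2; /* 1110xxxx */
        else if ((b & 0xF8) == 0xF0) extra = 3; /* 11110xxx */
        else return (long)i; /* stray 10xxxxxx, or an invalid 11111xxx */

        /* Every expected continuation byte must look like 10xxxxxx.
           A NUL terminator fails this test, so we never read past it. */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)(i + k);
        }
        i += extra + 1;
    }
    return -1;
}
```

Calling first_invalid_utf8_offset(sample) returns 16, the position of the 0x9D byte. Running the same check on buffer just before the call to stringWithUTF8String: would show whether the bytes are still well formed at that point.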


This is a bit of a stab in the dark because we don't have enough information to properly diagnose the problem.

If randomNSString no longer exists at the point where you allocate the memory for result, for instance, if it has been released in a reference counted environment or collected in a GC environment, it is possible that buffer points to memory that has been freed but not yet reused (which would explain why it is still the same).

However, creating a new NSString requires allocating memory, and the allocator might reuse the block that buffer points to. In that case your UTF-8 string would be overwritten by the internals of the new NSString. You can test this theory by logging the contents of buffer after failing to create result. Don't use the %s specifier, though; print the hex bytes.
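
One way to print the hex bytes, as suggested above, is a small C helper like the following sketch. The name hex_dump_cstring and its signature are invented for this example. It formats every byte of a NUL-terminated buffer, terminator included, into a caller-supplied output buffer, so invalid bytes never pass through a string formatter that might interpret them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the bytes of the NUL-terminated string s,
   including the terminator, into out as "55 6B ... 00".
   Returns the number of characters written (excluding the final NUL),
   or -1 if out is too small. */
int hex_dump_cstring(const char *s, char *out, size_t out_size)
{
    assert(s != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t used = 0;

    for (;;) {
        /* Each byte needs two hex digits plus a separator or the final NUL. */
        if (used + 3 > out_size) return -1;
        snprintf(out + used, out_size - used, "%02X", *p);
        used += 2;
        if (*p == 0) break;   /* the terminator has been dumped, stop */
        out[used++] = ' ';
        p++;
    }
    out[used] = '\0';
    return (int)used;
}
```

The output buffer needs three characters per input byte, counting the terminator. Because the result is pure ASCII, it can be logged safely, for example with NSLog(@"%s", out), and compared against a dump taken right after the call to UTF8String.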
