Best way to split a string into tokens skipping escaped delimiters?

Developer https://www.devze.com 2022-12-20 06:08 Source: Internet

I'm receiving an NSString which uses commas as delimiters, and a backslash as an escape character. I was looking into splitting the string using componentsSeparatedByString, but I found no way to specify the escape character. Is there a built-in way to do this? NSScanner? CFStringTokenizer?

If not, which would be better? One option is to split the string at the commas and then rejoin tokens that were falsely split, after checking whether each one ends with a non-escaped escape character. The other is to loop through the string character by character looking for commas, look back one character to see whether the comma is escaped, and then look back one more character to see whether that escape character is itself escaped.

Now that I think about it, I would need to check that the number of consecutive escape characters immediately before a delimiter is even, because only then is the delimiter itself not escaped.
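The even/odd rule can be written as a small look-back check. This is a minimal sketch in plain C (which also compiles as Objective-C), assuming a NUL-terminated byte string rather than an NSString. `is_escaped_at` is a hypothetical name, not an existing API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the character at index i of s is
 * escaped, i.e. it is preceded by an odd number of consecutive `esc`
 * characters. With an even run (including zero), each escape escapes the
 * next escape, so the character at i is a real delimiter. */
static bool is_escaped_at(const char *s, size_t i, char esc)
{
    size_t run = 0;
    /* Walk backwards over the run of escape characters just before i. */
    while (i > run && s[i - run - 1] == esc)
        run++;
    return run % 2 == 1;
}
```

With this check, the split-and-rejoin idea comes down to one rule. After splitting at every comma, glue a fragment back onto the previous one whenever the comma between them was escaped.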

If someone has a method that does this, I'd appreciate it if I could take a look at it.


I think the most straightforward method is to go through the string character by character, as you suggest, appending characters to a new string object for each section. You can follow two simple rules:

  1. If you find a backslash, skip it, but copy the next character (if there is one) unconditionally.
  2. If you find a comma, end the current section.
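Those two rules amount to a small state machine. The sketch below is plain C (also valid Objective-C) over a NUL-terminated byte string, not NSString. `split_escaped` and `free_tokens` are hypothetical names. Dropping a dangling escape at the very end is an assumption about how that case should be handled:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array returned by split_escaped. */
void free_tokens(char **tokens, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(tokens[i]);
    free(tokens);
}

/* Hypothetical helper: splits src at every unescaped `delim`:
 *   1. on `esc`, skip it and copy the next character (if any) unconditionally;
 *   2. on `delim`, end the current token.
 * A dangling `esc` at the very end is dropped. Returns a malloc'ed array of
 * malloc'ed strings and stores its length in *count, or NULL if out of memory. */
char **split_escaped(const char *src, char delim, char esc, size_t *count)
{
    size_t len = strlen(src);
    size_t max_tokens = 1;                 /* at most one token per delimiter, plus one */
    for (size_t i = 0; i < len; i++)
        if (src[i] == delim)
            max_tokens++;

    char **tokens = malloc(max_tokens * sizeof *tokens);
    char *buf = malloc(len + 1);           /* unescaping never makes a token longer */
    if (tokens == NULL || buf == NULL) {
        free(tokens);
        free(buf);
        return NULL;
    }

    size_t n = 0, used = 0;
    for (size_t i = 0; ; i++) {
        char c = src[i];
        if (c == esc) {
            if (src[i + 1] != '\0')
                buf[used++] = src[++i];    /* rule 1: copy the escaped char verbatim */
            /* else: dangling escape at the end, drop it */
        } else if (c == delim || c == '\0') {
            tokens[n] = malloc(used + 1);  /* rule 2 (or end of input): close the token */
            if (tokens[n] == NULL) {
                free(buf);
                free_tokens(tokens, n);
                return NULL;
            }
            memcpy(tokens[n], buf, used);
            tokens[n][used] = '\0';
            n++;
            used = 0;
            if (c == '\0')
                break;
        } else {
            buf[used++] = c;
        }
    }

    free(buf);
    *count = n;
    return tokens;
}
```

Each token is allocated separately, so the caller releases the whole result with `free_tokens` when done.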

You could do this manually, or use some of NSScanner's functionality to help you (for example, scanUpToCharactersFromSet:intoString:).
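The benefit of `scanUpToCharactersFromSet:intoString:` is that it consumes a whole run of ordinary characters in one call. The C standard library's `strcspn` plays the same role for a byte string, so this sketch swaps it in for NSScanner. It is an analogy, not NSScanner itself: it stops only at the delimiter or the escape character and copies everything in between with one `memcpy`. `split_with_scan` and the callback type `token_fn` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives each unescaped, NUL-terminated token. */
typedef void (*token_fn)(const char *token, size_t len, void *ctx);

/* Hypothetical helper: splits src at every unescaped `delim`. `scratch` must
 * hold at least strlen(src) + 1 bytes; it is reused for every token, so emit()
 * must copy the token if it needs to keep it. */
void split_with_scan(const char *src, char delim, char esc,
                     char *scratch, token_fn emit, void *ctx)
{
    const char stops[3] = { delim, esc, '\0' };
    size_t used = 0;

    for (;;) {
        /* Like scanning "up to" a character set: skip ordinary characters. */
        size_t run = strcspn(src, stops);
        memcpy(scratch + used, src, run);
        used += run;
        src += run;

        if (*src == esc) {
            if (src[1] == '\0') {
                src++;                         /* dangling escape at the end: drop it */
            } else {
                scratch[used++] = src[1];      /* copy the escaped character verbatim */
                src += 2;
            }
        } else {                               /* *src is delim or the terminating NUL */
            scratch[used] = '\0';
            emit(scratch, used, ctx);
            used = 0;
            if (*src == '\0')
                break;
            src++;                             /* step over the delimiter */
        }
    }
}
```

Escapes only ever shorten the text, so a scratch buffer of `strlen(src) + 1` bytes is always large enough.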


I would prefer to use a regular-expression-based parser to weed out the escape characters, and then possibly do a split operation (of some type) on the string.
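One caveat on the order of operations: if the escapes are stripped before splitting, an escaped comma looks exactly like a delimiter. One way to keep a regex-based approach sound is to let the pattern match a whole token and unescape each token afterwards. A token here is any run of characters that are neither `,` nor `\`, or a `\` followed by any character. This is a sketch in plain C using the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`). The helper names are hypothetical, and the comma and backslash are hard-coded for brevity:

```c
#include <regex.h>
#include <stddef.h>

/* Copies the token s[0..len) into out with escapes removed ("\x" -> "x"). */
static void unescape_token(const char *s, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\\' && i + 1 < len)
            i++;                               /* drop the backslash, keep what follows */
        out[o++] = s[i];
    }
    out[o] = '\0';
}

/* Hypothetical helper: splits src at unescaped commas. `scratch` must hold
 * strlen(src) + 1 bytes. Returns 0 on success, -1 if the regex fails to compile. */
int split_with_regex(const char *src, char *scratch,
                     void (*emit)(const char *token, void *ctx), void *ctx)
{
    /* POSIX ERE for one token: ^([^,\\]|\\.)*
     * Inside [...] a backslash is literal, so [^,\\] excludes ',' and the backslash;
     * \\. is a literal backslash followed by any character. */
    regex_t re;
    if (regcomp(&re, "^([^,\\\\]|\\\\.)*", REG_EXTENDED) != 0)
        return -1;

    const char *p = src;
    for (;;) {
        regmatch_t m[1];
        /* ^ anchors at p because REG_NOTBOL is not passed; the pattern can
         * match the empty string, so a failure here is unexpected. */
        if (regexec(&re, p, 1, m, 0) != 0)
            break;
        size_t len = (size_t)m[0].rm_eo;
        unescape_token(p, len, scratch);
        emit(scratch, ctx);
        p += len;
        if (*p != ',')                         /* end of input (or a dangling backslash) */
            break;
        p++;                                   /* step over the delimiter */
    }

    regfree(&re);
    return 0;
}
```

Inside a matched token, every backslash is followed by another character. A trailing lone backslash simply ends the match and is dropped.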


Okay, (I hope) this is what wipolar suggested. It's the first implementation that works. I've only just started working with a non-garbage-collected language, so please post a comment if you think this code can be improved, especially in the memory-management department.

- (NSArray *) splitUnescapedCharsFrom: (NSString *) str atChar: (unichar) delim withEscape: (unichar) esc
{
    // Autoreleased containers: nothing to release by hand, so nothing leaks.
    NSMutableArray * result = [NSMutableArray array];
    NSMutableString * currWord = [NSMutableString string];
    NSUInteger length = [str length];

    for (NSUInteger i = 0; i < length; i++)
    {
        unichar c = [str characterAtIndex:i];
        if (c == esc)
        {
            // Copy the next character literally. A trailing escape has no
            // next character, so it is dropped instead of reading past the end.
            if (i + 1 < length)
            {
                [currWord appendFormat:@"%C", [str characterAtIndex:++i]];
            }
        }
        else if (c == delim)
        {
            // End of a token: store an immutable copy, then reuse the buffer.
            [result addObject:[NSString stringWithString:currWord]];
            [currWord setString:@""];
        }
        else
        {
            // %C formats a unichar; %c would mangle non-ASCII characters.
            [currWord appendFormat:@"%C", c];
        }
    }

    [result addObject:[NSString stringWithString:currWord]];
    return [NSArray arrayWithArray:result];
}