C# Regex.Split - Would like to split a string by a char but allow the char to be used between ' and '

Developer https://www.devze.com 2023-01-28 02:21 Source: Internet
I would like to split a string by / but also allow anything between ' and ' to contain my split character, while removing the ' and ' and any blank entries from the result.

For example, /'TEST/:1'/3 should be split into:

  • TEST/:1
  • 3

I have something close:

Regex.Split(test, "/(?=(?:[^']*'[^']*')*[^']*$)")

but the result still contains the ' characters and blank entries.

string[] fields = Regex.Split(Path, "/(?=(?:[^']*'[^']*')*[^']*$)");

Sorry, I will be more clear. It is my own pathing format for parsing EDI-like files, and it works like this: /Location:LineNumber/YPosition. All I want is to accept a command line such as /TEST:4/6, meaning /Location(TEST):LineNumber(4)/YPosition(6). My problem is that the Location is worked out by my code and could be anything up to the first delimiter. It could be a date such as 12/03/03, which is common in EDI-type files.

I would like to be able to write my command as /'12/03/03':4/6, so that when it is split by my Regex.Split call, nothing between the ' and ' of the Location is split.

Any help please?


It is not entirely clear what you're really trying to achieve. If you want to parse a CSV-like file format, better use a CSV parser; if you want to manipulate file paths, then use your standard library's path manipulation functions. Both are better equipped to deal with edge cases than a simple regex.

That said, you could do

MatchCollection allMatchResults = null;
Regex regexObj = new Regex(
    @" "" [^""]* ""  # either match a (double-)quoted string
    |                # or
    [^/\s]+          # one or more characters other than / or whitespace", 
    RegexOptions.IgnorePatternWhitespace);
allMatchResults = regexObj.Matches(subjectString);

This will not handle escaped quotes within quoted strings, but if those aren't present in your input, it should work.

Note that "Test / 1"/1/2 3/45 will be split into "Test / 1", 1, 2, 3 and 45 because blanks are removed and thereby also serve as separators.
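To make the note above concrete, here is a minimal sketch that wraps the pattern in a helper and collects the matched text. The class name FieldMatcher and the method Tokens are hypothetical names introduced for illustration, and the unquoted branch is written with + on the assumption that zero-length matches are unwanted.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class FieldMatcher
{
    // Quoted branch first, so a quoted field is taken whole (quotes included).
    // The unquoted branch uses + so that it can never match an empty string.
    private static readonly Regex Pattern = new Regex(
        @" "" [^""]* ""   # a double-quoted string, quotes included
         |                # or
           [^/\s]+        # one or more characters other than / or whitespace",
        RegexOptions.IgnorePatternWhitespace);

    // Returns every token found in the input, in order of appearance.
    public static List<string> Tokens(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Calling FieldMatcher.Tokens on the input from the note should yield five tokens, the first of which still carries its double quotes. Stripping them would take an extra Trim('"') on each token, or a capture group inside the quoted branch.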


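using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string input = "\"\"/one/\"two/threefour\"/five//";

        var separator = "/";

        // An empty field: optional whitespace only.
        var empty = "(?<empty>\\s*)";

        // An unquoted word may not contain the separator;
        // a quoted word may contain anything except a double quote.
        var word = string.Format("(?<word>[^{0}]+)", separator);
        var quotedWord = "(?<word>[^\\\"]+)";

        // A token is either a quoted field or an unquoted one.
        var token = string.Format("((\"({0}|{1})\")|({0}|{2}))", empty, quotedWord, word);

        // One or more separator-terminated tokens followed by a final token.
        var pattern = string.Format("({0}{1})+({0})", token, separator);

        // Every non-empty field ends up as a capture of the "word" group.
        foreach (var capture in Regex.Match(input, pattern).Groups["word"].Captures)
            Console.WriteLine(capture);
    }
}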


Split is not really the right tool for this kind of job. If you were only splitting on slashes it would be fine, but with the addition of optional quotes, it becomes much easier to match the fields rather than the delimiters. Here's a regex that allows for single- or double-quoted strings and treats /, :, and whitespace as delimiters:

  Regex r1 = new Regex(@"
      ""(?<FIELD>[^""]*)""
      |
      '(?<FIELD>[^']*)'
      |
      (?<FIELD>[^""'/:\s]+)
  ", RegexOptions.IgnorePatternWhitespace);

  string[] source = { @"/TEST:4/6", @"/'12/03/03':4/6" };
  foreach (string s in source)
  {
    foreach (Match m in r1.Matches(s))
    {
      Console.WriteLine(m.Groups["FIELD"].Value);
    }
    Console.WriteLine();
  }

output:

TEST
4
6

12/03/03
4
6

Reusing the FIELD group name makes it easy to scrape off the quotes if the field has them. Here's a more "semantic" approach that matches the whole string in one pass and uses named groups to pigeonhole the field values. I just plug them into the replacement string, but you can access them through the Groups property as I did in the first example.

  Regex r2 = new Regex(@"
      / \s*
        (?:
           (?<Q>[""'])(?<LOC>(?:(?!\k<Q>).)*)\k<Q>
         |
           (?<LOC>[^""'/:]+)
        ) \s*
      : \s* (?<LINE>\d+) \s*
      / \s* (?<YPOS>\d+)
    ", RegexOptions.IgnorePatternWhitespace);

  foreach (string s in source)
  {
    Console.WriteLine(r2.Replace(s,
        @"Location(${LOC}):LineNumber(${LINE})/YPosition(${YPOS})")
    );
    Console.WriteLine();
  }

output:

Location(TEST):LineNumber(4)/YPosition(6)

Location(12/03/03):LineNumber(4)/YPosition(6)

This regex doesn't treat whitespace as a delimiter, but it allows for whitespace padding around the field values. It also matches single- and double-quoted fields with the same construct. That's not really worth the extra complexity in this case, but it's a technique that's worth knowing about.
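As a rough sketch of reading the groups directly instead of going through Replace, the helper below applies the same pattern to a single command. PathCommand and TryParse are hypothetical names introduced here, and the back-reference is written as \k<Q>. One detail worth noting: the unquoted LOC branch also accepts spaces, so it absorbs any padding before the colon, which is why the sketch trims unquoted values.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class PathCommand
{
    private static readonly Regex Pattern = new Regex(@"
        / \s*
        (?:
            (?<Q>[""'])(?<LOC>(?:(?!\k<Q>).)*)\k<Q>   # quoted location
          |
            (?<LOC>[^""'/:]+)                         # unquoted location
        ) \s*
        : \s* (?<LINE>\d+) \s*
        / \s* (?<YPOS>\d+)
    ", RegexOptions.IgnorePatternWhitespace);

    // Returns false when the command does not have the expected shape
    // or when a number does not fit in an int.
    public static bool TryParse(string command, out string location,
                                out int line, out int yPos)
    {
        location = string.Empty;
        line = 0;
        yPos = 0;

        Match m = Pattern.Match(command);
        if (!m.Success)
        {
            return false;
        }

        // Keep a quoted location verbatim; trim padding from an unquoted one.
        string loc = m.Groups["LOC"].Value;
        location = m.Groups["Q"].Success ? loc : loc.Trim();

        return int.TryParse(m.Groups["LINE"].Value, out line)
            && int.TryParse(m.Groups["YPOS"].Value, out yPos);
    }
}
```

With this sketch, a padded and double-quoted command such as / "12/03/03" : 4 / 6 should parse to the location 12/03/03, line 4, and Y position 6, while a string like /TEST that lacks the line and position parts is rejected.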
