C# - Regular Expression to split string on spaces, unless a double quote is encountered

Developer (https://www.devze.com) 2023-02-16 00:50 Source: the web
This thread is very similar to what I want: Regular Expression to split on spaces unless in quotes

But I need a few extra rules that I cannot figure out. The expression from the above thread does split on spaces unless they're inside double quotes. However, it splits on punctuation as well. I need anything inside the double quotes to remain as one entity.

For example:

/Update setting0 value="new value" /Save should return

/Update

setting0

value=

new value (I don't care whether it trims the quotes off or not)

/Save

/Import "C:\path\file.xml" "C:\path_2\file_2.xml" /Exit should return

/Import

C:\path\file.xml (I don't care whether it trims the quotes off or not)

C:\path_2\file_2.xml

/Exit

I ended up using this expression from the thread above:

(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"

Could someone please help me tweak it? Thanks!


I haven't tried this in C#, only in VBA in Excel, but it might be helpful. I have also changed the double quotes to single quotes. Anyway, here is the regexp:

Text:

/Update setting0 value='new value' /Save

Regexp:

('{1}(\w|\s|:|\\|\.)+'{1}|\w)+

Result:

Update

setting0

value

'new value'

Save

Text:

/Import 'C:\path\file.xml' 'C:\path_2\file_2.xml' /Exit

Result:

Import

'C:\path\file.xml'

'C:\path_2\file_2.xml'

Exit


This is a problem that is awkward to solve in general using regular expressions. Instead, you can write a simple parser that reads the line character by character: when it encounters a space while not inside quotes, it takes the current substring and adds it to a list:

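// Requires: using System.Collections.Generic;
public static string[] ParseLine(string line)
{
    var insideQuotes = false;
    var parts = new List<string>();
    var j = 0; // start index of the current part

    for (var i = 0; i < line.Length; i++)
    {
        switch (line[i])
        {
            case '"':
                // Toggle quoted mode; the quote characters stay in the part.
                insideQuotes = !insideQuotes;
                break;
            case ' ':
                if (!insideQuotes)
                {
                    // A space outside quotes ends the current part.
                    // Empty parts from repeated spaces are skipped.
                    if (i > j)
                        parts.Add(line.Substring(j, i - j));
                    j = i + 1;
                }
                break;
        }
    }

    // Add whatever is left after the last separator.
    if (j < line.Length)
        parts.Add(line.Substring(j));

    return parts.ToArray();
}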

Note, however, that this won't handle things like escaped quotes inside quotes.
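
For what it's worth, here is a rough sketch of one way escaped quotes could be handled. It is not the parser above but a variant that builds each part in a StringBuilder. The class name EscapeAwareParser is made up for illustration, and the escaping rule is an assumption: \" means a literal quote, while any other backslash is kept as-is so that Windows paths still come through. Unlike the parser above, it drops the quote delimiters from the results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapeAwareParser
{
    // Assumption: \" stands for a literal quote; every other backslash is
    // copied unchanged, so "C:\path\file.xml" is not mangled.
    public static string[] ParseLine(string line)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        var insideQuotes = false;
        var hasToken = false; // true once the current part has started

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
            {
                // Escaped quote: keep the quote, skip the backslash.
                current.Append('"');
                hasToken = true;
                i++;
            }
            else if (c == '"')
            {
                // Unescaped quote: toggle quoted mode and drop the delimiter.
                insideQuotes = !insideQuotes;
                hasToken = true; // lets "" produce an empty part
            }
            else if (c == ' ' && !insideQuotes)
            {
                // A space outside quotes ends the current part, if any.
                if (hasToken)
                {
                    parts.Add(current.ToString());
                    current.Clear();
                    hasToken = false;
                }
            }
            else
            {
                current.Append(c);
                hasToken = true;
            }
        }

        // Add whatever is left, including the rest of an unclosed quote.
        if (hasToken)
            parts.Add(current.ToString());

        return parts.ToArray();
    }
}
```

Like the parser above, this keeps value="new value" together as a single part (here without the quotes), so it would still need adjusting if value= and the quoted text should come out separately. A quoted path that ends in a backslash, such as "C:\dir\", stays ambiguous under this escaping rule.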


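This one works if there is an even number of double quotes and no escaped quotes (the pattern is laid out across several lines, so it needs RegexOptions.IgnorePatternWhitespace):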

^
\s*
(?:
    (?:
        ([^\s"]+)
        |
        "([^"]*)"
    )
    \s*
)+
$
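
In case it helps, here is a sketch of how this pattern might be applied from C#. The class name QuoteAwareSplitter and the method name Split are made up for illustration. Because both groups sit inside a repeated group, the pieces are read from each group's Captures collection and sorted by position to restore the original order.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteAwareSplitter
{
    // Same pattern as above. In a verbatim string a double quote is
    // written as "", and # starts a comment in free-spacing mode.
    private static readonly Regex Pattern = new Regex(@"
        ^
        \s*
        (?:
            (?:
                ([^\s""]+)      # group 1: run without spaces or quotes
                |
                ""([^""]*)""    # group 2: quoted text, quotes excluded
            )
            \s*
        )+
        $",
        RegexOptions.IgnorePatternWhitespace);

    public static string[] Split(string line)
    {
        var match = Pattern.Match(line);
        if (!match.Success)
            return new string[0]; // e.g. an odd number of quotes

        // Merge the captures of both groups and restore input order.
        return match.Groups[1].Captures.Cast<Capture>()
            .Concat(match.Groups[2].Captures.Cast<Capture>())
            .OrderBy(c => c.Index)
            .Select(c => c.Value)
            .ToArray();
    }
}
```

Since the whole line has to match, an unbalanced quote gives no result at all rather than a partial one. The nested repetition can also backtrack heavily when a long unquoted run is followed by an unmatched quote, so this is probably best kept to short, trusted input.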


var matches = Regex.Matches("/Update setting0 value=\"new value\" /Save", "\\G(?:(\"[^\"]*\"?|[^ \"]+)|[ ]+)");

foreach (Match match in matches) {
    foreach (Capture capture in match.Groups[1].Captures) {
        Console.WriteLine(capture);
    }
}

If you don't want the quotes in the results (so "new value" becomes new value), use:

var matches = Regex.Matches("/Update setting0 value=\"new value\" /Save", "\\G(?:\"(?<1>[^\"]*)\"?|(?<1>[^ \"]+)|[ ]+)");

The ? after the second \" makes the closing quote optional, so an unclosed quote is still matched.


Just my modified version of what eulerfx posted. This one:

Should produce the results requested in the original question (so is "on topic").

Doesn't include quotes in the results

Doesn't include white-space only in results

Splits results on any white-space not inside quotes

Handles missing end-quote by just adding whatever is left-over after loop

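Trims results, except that whitespace following the first non-blank character inside quotes is kept (leading whitespace inside quotes is still dropped).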

I mostly made this for parsing the last 2 parts of each line of an IMAP list result.

    public static string[] ParseLine(string line)
    {
        var insideQuotes = false;
        var start = -1;

        var parts = new List<string>();

        for (var i = 0; i < line.Length; i++)
        {
            if (Char.IsWhiteSpace(line[i]))
            {
                if (!insideQuotes)
                {
                    if (start != -1)
                    {
                        parts.Add(line.Substring(start, i - start));
                        start = -1;
                    }
                }
            }
            else if (line[i] == '"')
            {
                if (start != -1)
                {
                    parts.Add(line.Substring(start, i - start));
                    start = -1;
                }
                insideQuotes = !insideQuotes;
            }
            else
            {
                if (start == -1)
                    start = i;
            }
        }

        if (start != -1)
            parts.Add(line.Substring(start));

        return parts.ToArray();
    }