C# Regex.Split method is adding an empty string before parentheses

Developer https://www.devze.com 2023-03-18 23:13 Source: web

I have some code that tokenizes an equation input into a string array:

string infix = "( 5 + 2 ) * 3 + 4";
string[] tokens = tokenizer(infix, @"([\+\-\*\(\)\^\\])");
foreach (string s in tokens)
{
   Console.WriteLine(s);
}

Now here is the tokenizer function:

public string[] tokenizer(string input, string splitExp)
{
    string noWSpaceInput = Regex.Replace(input, @"\s", "");
    Console.WriteLine(noWSpaceInput);
    Regex RE = new Regex(splitExp);
    return (RE.Split(noWSpaceInput));
}

When I run this, every token is split out, but an empty string is inserted before the parenthesis characters. How do I remove it?

//empty string here

(

5

+

2

//empty string here

)

*

3

+

4


I would just filter them out:

public string[] tokenizer(string input, string splitExp)
{
    string noWSpaceInput = Regex.Replace(input, @"\s", "");
    Console.WriteLine(noWSpaceInput);
    Regex RE = new Regex(splitExp);
    return (RE.Split(noWSpaceInput)).Where(x => !string.IsNullOrEmpty(x)).ToArray();
}


You're seeing this because the string starts with a separator, so there is nothing in front of it (the ( at the beginning of the string), and because two separator characters sit next to one another (the )* in the middle). This is by design.

As you may have found, String.Split accepts an optional StringSplitOptions enum value that makes it remove empty entries. Regex.Split has no such parameter. In your specific case you could simply ignore any token with a length of 0.

foreach (string s in tokens.Where(tt => tt.Length > 0))
{
   Console.WriteLine(s);
}
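
To see the behaviour described above in isolation, here is a minimal sketch that runs the question's pattern on the whitespace-free input. The class and method names (SplitDemo, SplitKeepingSeparators) are made up for illustration; only Regex.Split and the pattern come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Uses the pattern from the question. Because the character class is
    // wrapped in a capturing group, Regex.Split also returns the separators.
    public static string[] SplitKeepingSeparators(string input)
    {
        return Regex.Split(input, @"([\+\-\*\(\)\^\\])");
    }
}
```

Calling SplitDemo.SplitKeepingSeparators("(5+2)*3+4") should return 11 elements. The first is the empty string in front of the leading (, and the seventh is the empty string between ) and *.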


Well, one option would be to filter them out afterwards:

return RE.Split(noWSpaceInput).Where(x => !string.IsNullOrEmpty(x)).ToArray();


Try this (if you don't want to filter the result):

tokenizer(infix, @"(?=[-+*()^\\])|(?<=[-+*()^\\])");

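Perl demo (Perl's split never produces a leading empty field from a zero-width match, whereas .NET's Regex.Split may still return one for the ( at position 0, so check the result in C#):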

perl -E "say join ',', split /(?=[-+*()^])|(?<=[-+*()^])/, '(5+2)*3+4'"
(,5,+,2,),*,3,+,4

Although, in my opinion, it would be better to use a match instead of a split in this case.
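
A rough sketch of the match-based idea, in case it helps. The answer does not give a pattern, so the one below is an assumption: a number is taken to be a run of digits with an optional decimal part, and the operator set is copied from the question. MatchTokenizer and Tokenize are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchTokenizer
{
    // Assumed token grammar: a number (digits with an optional decimal part)
    // or a single operator/parenthesis character from the question's set.
    private static readonly Regex TokenPattern =
        new Regex(@"\d+(?:\.\d+)?|[-+*()^\\]");

    public static string[] Tokenize(string input)
    {
        // Every match is one token. Whitespace never matches, so it is
        // skipped without a separate Regex.Replace step.
        return TokenPattern.Matches(input)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToArray();
    }
}
```

MatchTokenizer.Tokenize("( 5 + 2 ) * 3 + 4") should give (, 5, +, 2, ), *, 3, +, 4 with no empty entries, because only matched text is returned. The trade-off is that any character the pattern does not recognise is silently dropped, so you may want to validate the input separately.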


I think you can use StringSplitOptions.RemoveEmptyEntries with String.Split. Note that this only works because every token in the input is separated by a space:

    static void Main(string[] args)
    {
        string infix = "( 5 + 2 ) * 3 + 4";
        string[] results = infix.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        foreach (var result in results)
            Console.WriteLine(result);

        Console.ReadLine();
    }