C# Regex.Split: Removing empty results

Developer https://www.devze.com 2023-02-08 15:49 Source: Internet

I am working on an application which imports thousands of lines where every line has a format like this:

|* 9070183020  |04.02.2011    |107222     |M/S SUNNY MEDICOS                  |GHAZIABAD                          |      32,768.00 |

I am using the following Regex to split each line into the data I need:

Regex lineSplitter = new Regex(@"(?:^\|\*|\|)\s*(.*?)\s+(?=\|)");
string[] columns = lineSplitter.Split(data);

foreach (string c in columns)
    Console.Write("[" + c + "] ");

This is giving me the following result:

[] [9070183020] [] [04.02.2011] [] [107222] [] [M/S SUNNY MEDICOS] [] [GHAZIABAD] [] [32,768.00] [|]

Now I have two questions.

1. How do I remove the empty results. I know I can use:

string[] columns = lineSplitter.Split(data).Where(s => !string.IsNullOrEmpty(s)).ToArray();

but is there any built in method to remove the empty results?

2. How can I remove the last pipe?

Thanks for any help.

Regards,

Yogesh.

EDIT:

I think my question was a little misunderstood. It was never about how I can do it in general. It was only about how I can do it by changing the Regex in the above code.

I know that I can do it in many ways. I have already done it with the code mentioned above with a Where clause, and with an alternative that is also more than twice as fast:

Regex regex = new Regex(@"(^\|\*\s*)|(\s*\|\s*)");
data = regex.Replace(data, "|");

string[] columns = data.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

Secondly, as a test case, my system can parse 92k+ such lines in less than 1.5 seconds with the original method and in less than 700 milliseconds with the second method. In real cases I will never see more than a couple of thousand lines, so I don't think I need to worry about speed here. In my opinion, thinking about speed in this case is premature optimization.

I have found the answer to my first question: it cannot be done with Split as there is no such option built in.

Still looking for answer to my second question.
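
One possible way to get rid of the trailing pipe by changing only the pattern is sketched below. This is an assumption-laden variation, not the asker's code: the `\s+` before the lookahead is relaxed to `\s*`, and the lookahead also accepts the end of the input, so the final `|` is consumed as a delimiter instead of being returned as a token. The names `LineParser` and `ParseColumns` are hypothetical. Regex.Split still produces empty strings between adjacent matches, so the Where filter stays.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Variation on the question's pattern (an assumption, not the asker's code):
    // \s+ becomes \s* and the lookahead also accepts end of input, so the last
    // pipe is matched as a delimiter rather than left over as a "|" token.
    private static readonly Regex LineSplitter =
        new Regex(@"(?:^\|\*|\|)\s*(.*?)\s*(?=\||$)");

    public static string[] ParseColumns(string data)
    {
        // Regex.Split has no RemoveEmptyEntries option, so empty strings
        // are filtered out afterwards, as in the question.
        return LineSplitter.Split(data)
                           .Where(s => !string.IsNullOrEmpty(s))
                           .ToArray();
    }

    public static void Main()
    {
        string data = "|* 9070183020  |04.02.2011    |107222     |M/S SUNNY MEDICOS                  |GHAZIABAD                          |      32,768.00 |";

        foreach (string c in ParseColumns(data))
            Console.Write("[" + c + "] ");
    }
}
```

For the sample line this should leave only the six data columns, with no `[|]` entry at the end. Note that a genuinely empty field would also be dropped by the filter.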


Regex lineSplitter = new Regex(@"[\s*\*]*\|[\s*\*]*");
var columns = lineSplitter.Split(data).Where(s => s != String.Empty);

or you could simply do:

string[] columns = data.Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string c in columns) this.textBox1.Text += "[" + c.Trim(' ', '*') + "] " + "\r\n";

And no, there is no option to remove empty entries for Regex.Split as there is for String.Split.
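
If you want that behaviour in one call anyway, a small wrapper can stand in for the missing option. This is only a sketch: `SplitNonEmpty` and `RegexExtensions` are hypothetical names, not part of .NET, and the method simply combines Regex.Split with a LINQ filter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexExtensions
{
    // Hypothetical helper (not part of .NET): Regex.Split followed by a filter
    // that drops empty strings, similar in effect to
    // StringSplitOptions.RemoveEmptyEntries on String.Split.
    public static string[] SplitNonEmpty(this Regex regex, string input)
    {
        return regex.Split(input).Where(s => s.Length > 0).ToArray();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Delimiter: a pipe plus any surrounding whitespace or asterisks.
        var lineSplitter = new Regex(@"[\s*]*\|[\s*]*");
        string data = "|* 9070183020  |04.02.2011    |107222     |";

        foreach (string c in lineSplitter.SplitNonEmpty(data))
            Console.WriteLine("[" + c + "]");
    }
}
```

The filter runs after the split, so this saves typing rather than work; the empty strings are still created and then discarded.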

You can also use Regex.Matches to capture the fields directly instead of splitting.


Don't use a regex at all in your case. It doesn't seem you need one and regexes are much slower (and have a much higher overhead) than directly using the string functions.

So use something like:

// Arrays cannot be declared const in C#; use static readonly for a shared separator list.
static readonly char[] SplitChars = { '|' };

string[] splitData = data.Split(SplitChars, StringSplitOptions.RemoveEmptyEntries);


I think this may work as an equivalent to remove empty strings:

string[] splitter = Regex.Split(textvalue,@"\s").Where(s => s != String.Empty).ToArray<string>();


As an alternative to splitting, which is always going to cause trouble when your delimiters are also present at the beginning and end of the input, you can try matching the contents within the pipes:

foreach (Match token in Regex.Matches(input, @"\|\*?\s*(\S[^|]*?)\s*(?=\|)"))
{
    Console.WriteLine("[{0}]", token.Groups[1].Value);
}

// Prints the following:
// [9070183020]
// [04.02.2011]
// [107222]
// [M/S SUNNY MEDICOS]
// [GHAZIABAD]
// [32,768.00]


I might have the wrong idea here, but you just want to split the data string using the '|' character as a delimiter? In that case you could try:

string[] result = data.Split(new[] { "|" }, StringSplitOptions.RemoveEmptyEntries).Select(d => d.Trim(' ', '*')).ToArray();

This will return all the fields, with the surrounding spaces (and the leading asterisk) trimmed and empty fields removed. You can do what you like in the Select part to format the results, e.g.

.Select(d => "[" + d.Trim(' ', '*') + "]").ToArray();


Based on @Jaroslav Jandek's great answer, I wrote an extension method. I'm putting it here in case it saves you some time.

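/// <summary>
/// String.Split with the RemoveEmptyEntries option, to drop empty entries from the result
/// </summary>
/// <param name="s">Value to parse</param>
/// <param name="separator">The separator</param>
/// <param name="index">Zero-based index of the item to return; pass -1 to get the last item</param>
/// <param name="wholeResult">If true, return the whole array of split values</param>
/// <returns>The split array if wholeResult is true; otherwise the requested item, or "" if there is none</returns>
public static object CleanSplit(this string s, char separator, int index, bool wholeResult = false)
{
    if (string.IsNullOrWhiteSpace(s)) return "";

    var split = s.Split(new char[] { separator }, StringSplitOptions.RemoveEmptyEntries);

    if (wholeResult) return split;

    // Nothing left after removing empty entries (e.g. the input was only separators).
    if (split.Length == 0) return "";

    if (index == -1) return split[split.Length - 1];

    // Guard against out-of-range indexes instead of throwing IndexOutOfRangeException.
    if (index >= 0 && index < split.Length) return split[index];

    return "";
}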


1. How do I remove the empty results?

You can use LINQ to keep only the entries that are not equal to string.Empty:

string[] columns = lineSplitter.Split(data);
columns = columns.Where(c => !c.Equals(string.Empty)).ToArray();

2. How can I remove the last pipe?

You can use LINQ here as well, to drop every entry equal to the string you want to remove:

columns = columns.Where(c => !c.Equals("|")).ToArray();


How about this:

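Assuming we have a line:

string line1 = "|* 9070183020  |04.02.2011    |107222     |M/S SUNNY MEDICOS                  |GHAZIABAD                          |      32,768.00 |";

we can get the required result as:

string[] columns = Regex.Split(line1, @"\|")       // escape the pipe; a bare | means alternation in a regex
    .Select(c => c.Replace("*", "").Trim())        // strip the asterisk and surrounding whitespace
    .Where(c => c.Length > 0)                      // drop the empty first and last entries
    .ToArray();

This will give the following result: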

[9070183020] [04.02.2011] [107222] [M/S SUNNY MEDICOS] [GHAZIABAD] [32,768.00]


For a comma-delimited string, you can split with RemoveEmptyEntries and join the pieces back together:

string stringwithDelemeterNoEmptyValues= string.Join(",", stringwithDelemeterWithEmptyValues.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries));