Split the string with different conditions using Linq in C#

I need to extract and remove a word from a string. The word should be uppercase and follow one of the delimiters /, ;, (, - or a space.

Some examples:

"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""

How can I achieve that?

An "intelligent" version:

    string strValue = "this is test A/ABC";
    int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    var str1 = strValue.Substring(0, ix);
    var str2 = strValue.Substring(ix + 1);

A "stupid LINQ" version:

    var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
    var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());

Both versions are WITHOUT checks (for example, for a string that contains none of the delimiters). The OP can add checks if he wants them.
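A minimal sketch of such a check for the LastIndexOfAny version. When none of the delimiters occur, LastIndexOfAny returns -1 and Substring(0, -1) throws ArgumentOutOfRangeException, so this helper returns false instead. TrySplitAtLastDelimiter is a hypothetical name, not part of the original answer; the sketch is written as a C# 9 top-level program.

```csharp
using System;

// Hypothetical helper: splits at the last delimiter, or returns false
// (leaving the whole string in 'before') when there is no delimiter.
static bool TrySplitAtLastDelimiter(string value, out string before, out string after)
{
    int ix = value.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    if (ix < 0)
    {
        before = value;
        after = "";
        return false;
    }
    before = value.Substring(0, ix); // text before the delimiter
    after = value.Substring(ix + 1); // text after the delimiter
    return true;
}

if (TrySplitAtLastDelimiter("this is test A/ABC", out var head, out var tail))
    Console.WriteLine($"\"{head}\" and \"{tail}\""); // "this is test A" and "ABC"
```

Like the original, this still splits at the last delimiter of any kind; it does not check that the word is uppercase.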

For the second question, LINQ becomes far too cumbersome. With a regex it's easily doable:

    var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    var strValueWithout = regex.Replace(strValue, "$1$4");
    var extractedPart = regex.Replace(strValue, "$3");
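One caveat: Regex.Replace returns the input unchanged when the pattern does not match, so for a string like "FTSE-100" the "extracted part" would be the whole string. A sketch that runs the same pattern once with Regex.Match and checks Success instead; Extract is a hypothetical name chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the answer's pattern once and returns both parts,
// or null when the pattern does not match.
static (string Rest, string Word)? Extract(string input)
{
    Match m = Regex.Match(input, "^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
    if (!m.Success)
        return null; // Regex.Replace would have returned the input unchanged here

    // Group 1 + group 4 is the text without the word; group 3 is the word.
    return (m.Groups[1].Value + m.Groups[4].Value, m.Groups[3].Value);
}

var result = Extract("This TASK is assigned to ANIL/SHAM in our project");
if (result.HasValue)
{
    var (rest, word) = result.Value;
    Console.WriteLine($"\"{rest}\" and \"{word}\""); // "This TASK is assigned to ANIL in our project" and "SHAM"
}
else
{
    Console.WriteLine("no match");
}
```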

For the third question:

    var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);

    var strValueWithout = regex.Replace(strValue, "$1$2$5");
    var extractedPart = regex.Replace(strValue, "$4");

With code sample: http://ideone.com/5OSs0

Another update (it's becoming BORING)

    // str is the input string
    Regex regex1 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
    Regex regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);

    var str1 = regex1.Replace(str, "$1$4");
    var str2 = regex1.Replace(str, "$3");

The difference between the two is that the first treats only A-Z as uppercase characters, while the second uses \p{Lu} and so also accepts other uppercase letters, for example ÀÈÉÌÒÙ.
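A small sketch of that difference in isolation: [A-Z] is a plain ASCII range, while \p{Lu} is the Unicode "uppercase letter" category. The helper names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// [A-Z] only covers the ASCII letters A..Z; \p{Lu} covers every Unicode uppercase letter.
static bool IsAsciiUpperWord(string s) => Regex.IsMatch(s, "^[A-Z]+$");
static bool IsUnicodeUpperWord(string s) => Regex.IsMatch(s, @"^\p{Lu}+$");

foreach (var word in new[] { "ABC", "ÀÈÉ", "abc" })
{
    Console.WriteLine($"{word}: [A-Z] -> {IsAsciiUpperWord(word)}, \\p{{Lu}} -> {IsUnicodeUpperWord(word)}");
}
// ABC: both True; ÀÈÉ: only \p{Lu} is True; abc: both False
```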

With code sample: http://ideone.com/FqcmY

This should work according to the new requirements: it finds the last separator that sits between two uppercase words:

    Match lastSeparator = Regex.Match(strExample,
                                      @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                                      RegexOptions.RightToLeft); // last match
    string main = lastSeparator.Result("$`$'");  // before and after the match
    string word = lastSeparator.Groups[1].Value; // word after the separator

This regex is a little tricky. The main tricks:

Use RegexOptions.RightToLeft to find the last match.
Use Match.Result to perform the replacement.
Use $`$' as the replacement string (the text before the match followed by the text after it): http://www.regular-expressions.info/refreplace.html
Use \p{Lu} for uppercase letters; you can change it to [A-Z] if you're more comfortable with that.
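Match.Result throws on a failed match (for example "FTSE-100", where no uppercase word follows the separator), so a caller would normally check Success first. A sketch wrapping the pattern above under that assumption; TryExtract is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern. Success is checked before
// calling Match.Result, which cannot be used on a failed match.
static bool TryExtract(string input, out string main, out string word)
{
    Match m = Regex.Match(input,
                          @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                          RegexOptions.RightToLeft); // last match
    if (!m.Success)
    {
        main = input; // nothing to remove
        word = "";
        return false;
    }
    main = m.Result("$`$'");  // text before the match + text after it
    word = m.Groups[1].Value; // the uppercase word after the separator
    return true;
}

if (TryExtract("This TASK is assigned to ANIL/SHAM in our project", out var rest, out var upper))
    Console.WriteLine($"\"{rest}\" and \"{upper}\""); // "This TASK is assigned to ANIL in our project" and "SHAM"
```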

If the word doesn't have to follow an uppercase word, you can simplify the regex to:
```
@"[-/ ;(](\p{Lu}+)\b"  
```
If you want other characters as well, you can use a character class (and maybe remove \b). For example:
```
@"[-/ ;(]([\p{Lu}.,]+)"
```
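A sketch of the character-class variant, assuming it is still used with RightToLeft as above. Keeping a trailing \b matters here: without it, in "Euro-Stoxx-50" the class would match just the "S" of "Stoxx" (the match is "-S"), and the string would be cut. SplitLast is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: last separator followed by a run of uppercase letters,
// dots or commas, searched right to left. The trailing \b stops the run from
// ending inside a mixed-case word such as "Stoxx".
static (string Main, string Word) SplitLast(string input)
{
    Match m = Regex.Match(input, @"[-/ ;(]([\p{Lu}.,]+)\b", RegexOptions.RightToLeft);
    return m.Success
        ? (m.Result("$`$'"), m.Groups[1].Value)
        : (input, "");
}

Console.WriteLine(SplitLast("this is test AWN.A")); // (this is test, AWN.A)
```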

Working example: http://ideone.com/U9AdK

Use a List<string> and put all the words in it.

Find the index of the / and then use ElementAt() to get the word to split off, which is "SHAM" in your question.

In your sentence below, if the / is stored as its own element, its index will be 6, so the word you want is at index + 1 (this needs System.Linq and System.Collections.Generic):

    string strSentence = "This TASK is assigned to ANIL/SHAM in our project";
    List<string> words = strSentence.Replace("/", " / ").Split(' ').ToList(); // "/" becomes its own element
    int index = words.IndexOf("/");              // 6 (-1 if there is no "/")
    string word = words.ElementAt(index + 1);    // "SHAM"
    words.RemoveRange(index, 2);                 // removes "/" and "SHAM"
    string rest = string.Join(" ", words);       // "This TASK is assigned to ANIL in our project"

If you don't want to use a list of strings, you could use " " to find the words in your sentence yourself, but that would be a long way to go.

I think the idea is right, but the code may not be flawless.

You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting on the character /. Regular expressions are better suited to matching more complicated patterns.
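A sketch of that combination, under the assumption that the word always follows a single '/': Split with a count of 2 cuts at the first '/', and a regex checks that the remainder is one uppercase word. TrySplitOnSlash is a hypothetical name. For "ANIL/SHAM in our project" the remainder is "SHAM in our project", so this simple approach gives up; that is where the regex-only answers come in.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: split on the first '/' only (count = 2), then use a
// regex to check that what follows is a single uppercase word.
static bool TrySplitOnSlash(string input, out string head, out string word)
{
    string[] parts = input.Split(new[] { '/' }, 2);
    if (parts.Length == 2 && Regex.IsMatch(parts[1], @"^\p{Lu}+$"))
    {
        head = parts[0];
        word = parts[1];
        return true;
    }
    head = input;
    word = "";
    return false;
}

Console.WriteLine(TrySplitOnSlash("this is test A/ABC", out var h, out var w)
    ? $"\"{h}\" and \"{w}\""   // "this is test A" and "ABC"
    : "no simple split");
```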

As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:

    string strValue  = "this is test A/ABC";
    var s1=new string(
        strValue
        .TakeWhile(c => c!= '/')
        .ToArray());
    var s2=new string(
        strValue
        .SkipWhile(c => c!= '/')
        .Skip(1)
        .ToArray());

I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.