
Split the string with different conditions using Linq in C#

Developer (https://www.devze.com), 2023-02-16 10:29. Source: Internet
I need to extract and remove a word from a string. The word should be upper-case, and following one of the delimiters /, ;, (, - or a space.

Some Examples:

  1. "this is test A/ABC"

    Expected output: "this is test A" and "ABC"

  2. "this is a test; ABC/XYZ"

    Expected output: "this is a test; ABC" and "XYZ"

  3. "This TASK is assigned to ANIL/SHAM in our project"

    Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"

  4. "This TASK is assigned to ANIL/SHAM in OUR project"

    Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"

  5. "this is test AWN.A"

    Expected output: "this is test" and "AWN.A"

  6. "XETRA-DAX"

    Expected output: "XETRA" and "DAX"

  7. "FTSE-100"

    Expected output: "-100" and "FTSE"

  8. "ATHEX"

    Expected output: "" and "ATHEX"

  9. "Euro-Stoxx-50"

    Expected output: "Euro-Stoxx-50" and ""

How can I achieve that?


An "intelligent" version:

    string strValue = "this is test A/ABC";
    int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    var str1 = strValue.Substring(0, ix);
    var str2 = strValue.Substring(ix + 1);

A "stupid LINQ" version:

    var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
    var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());

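Both versions are WITHOUT checks. In particular, the first one throws if the string contains no delimiter, because LastIndexOfAny then returns -1. The OP can add checks if he wants them.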
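For what it's worth, here is a minimal sketch of the first version with the missing checks added. The class and method names (LastDelimiterSplitter, SplitAtLastDelimiter) are made up for illustration, and returning the whole string plus an empty word when no delimiter is found is an assumption on my part, not something the question specifies.

```csharp
using System;

public static class LastDelimiterSplitter
{
    // Delimiters taken from the question: '/', ';', '(', '-' and space.
    private static readonly char[] Delimiters = { '/', ' ', ';', '(', '-' };

    // Splits at the last delimiter. When no delimiter is present, returns
    // the whole input and an empty word instead of throwing (assumed behavior).
    public static (string Rest, string Word) SplitAtLastDelimiter(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        int ix = value.LastIndexOfAny(Delimiters);
        if (ix < 0)
        {
            return (value, string.Empty);
        }

        // Substring(ix + 1) is safe even when the delimiter is the last character.
        return (value.Substring(0, ix), value.Substring(ix + 1));
    }
}
```

Calling LastDelimiterSplitter.SplitAtLastDelimiter("this is test A/ABC") gives "this is test A" and "ABC". Like the original, it does not check that the word is upper-case.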

For the second question, LINQ really is too awkward. With a Regex it's "easily doable":

    var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    var strValueWithout = regex.Replace(strValue, "$1$4");
    var extractedPart = regex.Replace(strValue, "$3");

For the third question:

    var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);

    var strValueWithout = regex.Replace(strValue, "$1$2$5");
    var extractedPart = regex.Replace(strValue, "$4");

With code sample: http://ideone.com/5OSs0

Another update (it's becoming BORING)

    Regex regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
    Regex regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);

    var str1 = regex.Replace(str, "$1$4");
    var str2 = regex.Replace(str, "$3");

The difference between the two is that the first treats only A-Z as upper-case characters, while the second uses \p{Lu} and so also accepts other "upper case" characters, for example ÀÈÉÌÒÙ.

With code sample: http://ideone.com/FqcmY
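If you prefer not to run Replace twice, the same A-Z pattern can be matched once and the groups read directly. This is only a sketch: the wrapper (AnchoredRegexSplitter, Split) is hypothetical, and returning the input unchanged when nothing matches is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredRegexSplitter
{
    // Same pattern as the A-Z variant above; only the wrapper is new.
    private static readonly Regex Pattern = new Regex(
        @"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$",
        RegexOptions.RightToLeft);

    public static (string Rest, string Word) Split(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            // Can happen for multi-line input, since '.' does not match '\n'.
            return (input, string.Empty);
        }

        // Groups that did not take part in the match yield an empty string,
        // so this mirrors the "$1$4" and "$3" replacements.
        return (m.Groups[1].Value + m.Groups[4].Value, m.Groups[3].Value);
    }
}
```

For "XETRA-DAX" this returns "XETRA" and "DAX"; for "Euro-Stoxx-50" the second alternative matches and the word is empty.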


This should work according to the new requirements: it finds the last separator that is surrounded by upper-case words:

    Match lastSeparator = Regex.Match(strExample,
                                      @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                                      RegexOptions.RightToLeft); // last match
    string main = lastSeparator.Result("$`$'");  // before and after the match
    string word = lastSeparator.Groups[1].Value; // word after the separator

This regex is a little tricky. Main tricks:

  • Use RegexOptions.RightToLeft to find the last match.
  • Use of Match.Result for a replace.
  • $`$' as replacement string: http://www.regular-expressions.info/refreplace.html
  • \p{Lu} for upper-case letters; you can change that to [A-Z] if you're more comfortable with that.

  • If the word shouldn't follow an upper case word, you can simplify the regex to:

    @"[-/ ;(](\p{Lu}+)\b"  
    
  • If you want other characters as well, you can use a character class (and maybe remove \b). For example:

    @"[-/ ;(]([\p{Lu}.,]+)"
    

Working example: http://ideone.com/U9AdK
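One thing the snippet above does not handle is the no-match case: Match.Result throws on a failed match. A small wrapper along these lines would cover it. The names UpperCaseWordExtractor and Extract are invented for this sketch, and returning the input with an empty word on failure is an assumption based on example 9 in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UpperCaseWordExtractor
{
    // Same pattern as above: a separator between two upper-case words.
    private static readonly Regex LastSeparator = new Regex(
        @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
        RegexOptions.RightToLeft);

    public static (string Main, string Word) Extract(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        Match m = LastSeparator.Match(input);
        if (!m.Success)
        {
            // Nothing to extract: keep the input as-is (assumed behavior).
            return (input, string.Empty);
        }

        // "$`$'" is the text before the match followed by the text after it.
        return (m.Result("$`$'"), m.Groups[1].Value);
    }
}
```

With this, "This TASK is assigned to ANIL/SHAM in our project" yields "This TASK is assigned to ANIL in our project" and "SHAM", while "Euro-Stoxx-50" comes back unchanged with an empty word.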


Use a list of strings and put all the words into it.

Find the index of the / in the list, then use ElementAt() to get the word to split off, which is "SHAM" in your question.

In the sentence below, the index of the / will be 6.

    string strSentence = "This TASK is assigned to ANIL/SHAM in our project";

Then use ElementAt(index + 1), where index is the index of the / in your List<string> (called str here):

    string word = str.ElementAt(index + 1);

This will return "SHAM".

    str.RemoveAt(index + 1);

This will delete "SHAM"; then just print the sentence without "SHAM".

If you don't want to use a list of strings, you can use " " to separate the words in your sentence, I think, but that would be a long way to go.

I think my idea is right, but the code may not be flawless.
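To make the idea concrete, here is one way it could be sketched. Everything here is an assumption layered on the description above: the tokenizer pattern (\w+|/), the names TokenListSketch and RemoveWordAfterSlash, and the choice to drop the / together with the word. Rejoining with spaces also loses any other punctuation, so treat it as an illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenListSketch
{
    public static (string Rest, string Word) RemoveWordAfterSlash(string sentence)
    {
        if (sentence == null) throw new ArgumentNullException(nameof(sentence));

        // Each word and each '/' becomes one token in the list.
        List<string> tokens = Regex.Matches(sentence, @"\w+|/")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        int index = tokens.IndexOf("/");
        if (index < 0 || index + 1 >= tokens.Count)
        {
            // No '/' or nothing after it: leave the sentence alone.
            return (sentence, string.Empty);
        }

        string word = tokens[index + 1];
        tokens.RemoveRange(index, 2); // drop the '/' and the word after it
        return (string.Join(" ", tokens), word);
    }
}
```

For the sentence in the question, the / ends up at index 6, the extracted word is "SHAM", and the rest is "This TASK is assigned to ANIL in our project".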


You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
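As a rough sketch of that combination, the snippet below splits on / with string.Split and then uses a Regex to check whether the last piece is a single upper-case word. The helper names (SplitAndRegexSketch, SplitOnSlash) are hypothetical, and it only covers the simple / case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitAndRegexSketch
{
    // One or more upper-case letters and nothing else.
    private static readonly Regex UpperCaseWord = new Regex(@"^\p{Lu}+$");

    // Splits on '/' and reports whether the last piece is one upper-case word.
    public static (string[] Parts, bool LastIsUpperWord) SplitOnSlash(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        string[] parts = input.Split('/');
        bool lastIsUpper = UpperCaseWord.IsMatch(parts[parts.Length - 1]);
        return (parts, lastIsUpper);
    }
}
```

For "this is test A/ABC" the parts are "this is test A" and "ABC", and the last part passes the upper-case check. The other delimiters and the more involved examples need a fuller regular expression.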


As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:

    string strValue = "this is test A/ABC";
    // Everything before the first '/'.
    var s1 = new string(
        strValue
        .TakeWhile(c => c != '/')
        .ToArray());
    // Everything after the first '/'.
    var s2 = new string(
        strValue
        .SkipWhile(c => c != '/')
        .Skip(1)
        .ToArray());

I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.
