I need to extract and remove a word from a string. The word should be upper-case and follow one of the delimiters /, ;, (, - or a space.
Some examples:
"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""
How can I achieve that?
An "intelligent" version:
string strValue = "this is test A/ABC";
int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
var str1 = strValue.Substring(0, ix);
var str2 = strValue.Substring(ix + 1);
A "stupid LINQ" version:
var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());
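Both versions are WITHOUT checks. For example, in the first one LastIndexOfAny returns -1 when no delimiter is found, and Substring(0, -1) then throws. The OP can add checks if needed.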
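As a rough illustration of what such checks could look like, here is a minimal sketch of the LastIndexOfAny version with guards added. The class and method names are hypothetical, and returning the whole input with an empty word when no delimiter is found is an assumption, not something the question specifies.

```csharp
using System;

static class SplitHelper
{
    // Delimiter set taken from the question.
    private static readonly char[] Delimiters = { '/', ' ', ';', '(', '-' };

    // Hypothetical helper: splits at the last delimiter.
    // Assumption: with no delimiter, the input is returned unchanged
    // and the extracted word is empty.
    public static (string Rest, string Word) SplitAtLastDelimiter(string value)
    {
        if (string.IsNullOrEmpty(value))
            return (value ?? string.Empty, string.Empty);

        int ix = value.LastIndexOfAny(Delimiters);
        if (ix < 0)
            return (value, string.Empty);

        // Substring(ix + 1) is safe even when the delimiter is the last character.
        return (value.Substring(0, ix), value.Substring(ix + 1));
    }
}
```

For "this is test A/ABC" this gives "this is test A" and "ABC", the same result as the unchecked version.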
For the second question, doing it with LINQ is REALLY too difficult. With a Regex it's "easily doable":
var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");
var strValueWithout = regex.Replace(strValue, "$1$4");
var extractedPart = regex.Replace(strValue, "$3");
For the third question:
var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);
var strValueWithout = regex.Replace(strValue, "$1$2$5");
var extractedPart = regex.Replace(strValue, "$4");
With code sample: http://ideone.com/5OSs0
Another update (it's becoming BORING)
Regex Regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
Regex Regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
var str1 = Regex.Replace(str, "$1$4");
var str2 = Regex.Replace(str, "$3");
The difference between the two is that the first treats only A-Z as upper-case characters, while the second also matches other upper-case characters (Unicode category Lu), for example ÀÈÉÌÒÙ.
With code sample: http://ideone.com/FqcmY
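To make the [A-Z] versus \p{Lu} difference concrete, here is a small sketch. The class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class UpperCaseClasses
{
    // True only when every character is an ASCII upper-case letter A to Z.
    public static bool IsAsciiUpper(string s) => Regex.IsMatch(s, "^[A-Z]+$");

    // True when every character is in the Unicode "Lu" (upper-case letter) category.
    public static bool IsUnicodeUpper(string s) => Regex.IsMatch(s, @"^\p{Lu}+$");
}
```

With the input "\u00C0\u00C8" (ÀÈ), IsAsciiUpper returns false and IsUnicodeUpper returns true; for "ABC" both return true.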
This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:
Match lastSeparator = Regex.Match(strExample,
@"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
RegexOptions.RightToLeft); // last match
string main = lastSeparator.Result("$`$'"); // before and after the match
string word = lastSeparator.Groups[1].Value; // word after the separator
This regex is a little tricky. Main tricks:
- Use RegexOptions.RightToLeft to find the last match.
- Use Match.Result for a replace.
- $`$' as the replacement string: http://www.regular-expressions.info/refreplace.html
- \p{Lu} for upper-case letters; you can change that to [A-Z] if you're more comfortable with that.
If the word doesn't need to follow an upper-case word, you can simplify the regex to:
@"[-/ ;(](\p{Lu}+)\b"
If you want other characters as well, you can use a character class (and maybe remove \b). For example: @"[-/ ;(]([\p{Lu}.,]+)"
Working example: http://ideone.com/U9AdK
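The pieces above could be wrapped up roughly like this. The class and method names are hypothetical, and the no-match behaviour (return the input unchanged with an empty word) is an assumption added here, since Match.Result throws when called on a failed match.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastUpperWord
{
    // Pattern from the answer: a separator between two upper-case words,
    // searched from the right so the last occurrence wins.
    private static readonly Regex Pattern = new Regex(
        @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
        RegexOptions.RightToLeft);

    // Hypothetical wrapper: returns the text without the word, and the word.
    public static (string Main, string Word) Extract(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return (input, string.Empty); // assumption: leave the input untouched

        // $` is the text before the match, $' the text after it.
        return (m.Result("$`$'"), m.Groups[1].Value);
    }
}
```

For "XETRA-DAX" this should give "XETRA" and "DAX"; for "Euro-Stoxx-50" nothing matches, so the input comes back unchanged with an empty word.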
Use a List of strings and put all the words into it, find the index of the /, then use ElementAt() to pick the word to split off, which is "SHAM" in your question.
In the sentence below, the word after the / is at index 6 once you split on both spaces and the /.
string strSentence = "This TASK is assigned to ANIL/SHAM in our project";
List<string> words = strSentence.Split(' ', '/').ToList(); // needs System.Linq and System.Collections.Generic
int slash = strSentence.IndexOf('/');
int index = strSentence.Substring(0, slash).Split(' ').Length; // 6: number of words before the /
Then use ElementAt(index), where index is the position of the word after the / in your List<string>:
string word = words.ElementAt(index);
This will return SHAM.
strSentence = strSentence.Remove(slash, word.Length + 1);
This will delete /SHAM; then just print strSentence without SHAM.
If you don't want to use a list of strings, you can use " " to delimit the words in your sentence, I think, but that would be a long way to go.
I think the idea is right, but the code may not be flawless.
You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting on the character /. Regular expressions are perfect for matching more complicated patterns.
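A minimal sketch of the two options side by side, assuming the delimiter set from the question. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Simple case: one known separator character.
    public static string[] SplitOnSlash(string value)
    {
        return value.Split('/');
    }

    // More complicated case: any run of the question's delimiters
    // is treated as a single separator.
    public static string[] SplitOnDelimiters(string value)
    {
        return Regex.Split(value, @"[-/ ;(]+");
    }
}
```

SplitOnSlash("this is test A/ABC") yields "this is test A" and "ABC", while SplitOnDelimiters("this is a test; ABC/XYZ") yields the six pieces "this", "is", "a", "test", "ABC" and "XYZ".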
As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:
string strValue = "this is test A/ABC";
var s1=new string(
strValue
.TakeWhile(c => c!= '/')
.ToArray());
var s2=new string(
strValue
.SkipWhile(c => c!= '/')
.Skip(1)
.ToArray());
I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.