I have a regular expression to validate a string. But now I want to remove all the characters that do not match my regular expression.
E.g.
regExpression = @"^([\w\'\-\+])"
text = "This is a sample text with some invalid characters -+%&()=?";
//Remove characters that do not match regExp.
result = "This is a sample text with some invalid characters -+";
Any ideas how I can use the regular expression to determine the valid characters and remove all the others?
Many thanks
I believe you can do this (whitelist characters and replace everything else) in one line:
var result = Regex.Replace(text, @"[^\w\s\-\+]", "");
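Because \s is in the whitelist, all whitespace in the input is kept. If the input has a space between the - and the + (for example "... characters - +%&()=?"), the result is "This is a sample text with some invalid characters - +", which differs slightly from your expected output.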
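A minimal sketch of the whitelist approach wrapped in a method so it can be called and checked. The class and method names (WhitelistDemo, RemoveInvalid) are hypothetical; Regex.Replace(string, string, string) is the standard .NET API in System.Text.RegularExpressions.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class used only for this illustration.
public static class WhitelistDemo
{
    // [^...] is a negated character class: it matches any single character
    // that is NOT a word character, whitespace, '-' or '+'.
    // Each such character is replaced with the empty string.
    public static string RemoveInvalid(string text) =>
        Regex.Replace(text, @"[^\w\s\-\+]", "");
}
```

Called on the sample text from the question, WhitelistDemo.RemoveInvalid("This is a sample text with some invalid characters -+%&()=?") returns "This is a sample text with some invalid characters -+". Keep in mind that in .NET, \w also matches the underscore and non-ASCII letters such as "é"; if only ASCII should survive, spell the class out (e.g. [^a-zA-Z0-9_\s\-\+]) or pass RegexOptions.ECMAScript.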
Simple as that:
var match = Regex.Match(text, regExpression);
string result = "";
if (match.Success)
    result = match.Value;
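Removing the non-matched characters is the same as keeping the matched ones. That's what we are doing here. This only works if the pattern describes the whole run of text you want to keep: the pattern from the question, ^([\w\'\-\+]), matches just the first character, so match.Value would be "T".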
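A small sketch of the keep-the-match idea as a reusable method. The names MatchDemo and KeepMatch are hypothetical; the method only wraps Regex.Match and Match.Success/Match.Value as used above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class used only for this illustration.
public static class MatchDemo
{
    // Returns the text of the first match, or "" when the pattern does not match.
    public static string KeepMatch(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Value : "";
    }
}
```

With the question's sample text, KeepMatch(text, @"^([\w\'\-\+])") returns only "T", while adding whitespace to the class and a + quantifier, KeepMatch(text, @"^[\w\s'\-\+]+"), returns "This is a sample text with some invalid characters -+", because the match runs until the first disallowed character (%). Note that this keeps only that leading run; any valid characters after the first invalid one are lost.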
If it is possible that the expression matches multiple times in your text, you can use this:
var result = Regex.Matches(text, regExpression).Cast<Match>()
.Aggregate("", (s, e) => s + e.Value, s => s);
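A sketch of the multiple-match variant, assuming an unanchored pattern such as [\w\s'\-\+]+. The question's pattern starts with ^, which (without RegexOptions.Multiline) only matches at the start of the text, so it can produce at most one match. The names MatchesDemo and KeepAllMatches are hypothetical, and string.Concat is used here in place of the Aggregate call; both concatenate the match values in order.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class used only for this illustration.
public static class MatchesDemo
{
    // Concatenates the value of every match, dropping the text between matches.
    public static string KeepAllMatches(string text, string pattern) =>
        string.Concat(Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));
}
```

For example, KeepAllMatches("a%b&c (d)-+", @"[\w\s'\-\+]+") collects the matches "a", "b", "c ", "d" and "-+" and returns "abc d-+", the same result the negated-class Regex.Replace would give.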
Building on the answer to "Replace chars if not match", I've created a helper method that strips unaccepted characters.
The allowed pattern should be in Regex format and wrapped in square brackets, i.e. a character class. The function inserts a caret (^) after the opening square bracket, which negates the class so that it matches every character that is not allowed. I expect this won't work for every regex that describes a set of valid characters, but it works for the relatively simple sets we use.
// Requires: using System.Text.RegularExpressions;
// Extension methods must be declared in a non-generic static class.

/// <summary>
/// Replaces characters that are not in the allowed set.
/// </summary>
/// <param name="text">The text.</param>
/// <param name="allowedPattern">The allowed characters as a Regex character class, expected to be wrapped in square brackets, e.g. "[a-zA-Z0-9]".</param>
/// <param name="replacement">The replacement for each disallowed character.</param>
/// <returns>The text with every disallowed character replaced.</returns>
// https://stackoverflow.com/questions/4460290/replace-chars-if-not-match
// https://stackoverflow.com/questions/6154426/replace-remove-characters-that-do-not-match-the-regular-expression-net
// Replace/Remove characters that do not match the Regular Expression
public static string ReplaceNotExpectedCharacters(this string text, string allowedPattern, string replacement)
{
    allowedPattern = allowedPattern.StripBrackets("[", "]");
    // [^ ] at the start of a character class negates it: it matches characters not in the class.
    var result = Regex.Replace(text, @"[^" + allowedPattern + "]", replacement);
    return result;
}

public static string RemoveNonAlphanumericCharacters(this string text)
{
    var result = text.ReplaceNotExpectedCharacters(NonAlphaNumericCharacters, "");
    return result;
}

// Despite its name, this constant holds the set of allowed (alphanumeric) characters.
public const string NonAlphaNumericCharacters = "[a-zA-Z0-9]";
The code above uses a couple of functions from my StringHelper class (http://geekswithblogs.net/mnf/archive/2006/07/13/84942.aspx):
/// <summary>
/// StripBrackets checks whether the string starts with sStart and ends with sEnd (case-sensitive).
/// If it does, it removes sStart and sEnd.
/// Otherwise it returns the full string unchanged.
/// See also MidBetween.
/// </summary>
public static string StripBrackets(this string str, string sStart, string sEnd)
{
    if (CheckBrackets(str, sStart, sEnd))
    {
        str = str.Substring(sStart.Length, str.Length - sStart.Length - sEnd.Length);
    }
    return str;
}
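public static bool CheckBrackets(string str, string sStart, string sEnd)
{
    // Ordinal comparison is case-sensitive and culture-independent.
    // The length check keeps StripBrackets from computing a negative Substring length
    // when sStart and sEnd overlap (e.g. str == "[" with sStart == sEnd == "[").
    bool flag1 = (str != null)
        && str.Length >= sStart.Length + sEnd.Length
        && str.StartsWith(sStart, StringComparison.Ordinal)
        && str.EndsWith(sEnd, StringComparison.Ordinal);
    return flag1;
}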
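Putting the pieces together, here is a minimal self-contained sketch of the helper methods in one static class, so the extension methods compile and can be called directly. The class name StringHelperSketch and the constant name AlphaNumericCharacters are hypothetical (the constant is renamed to reflect that it lists the allowed characters), and the bracket check is inlined with an added length guard and ordinal comparison; this is an illustration, not the author's StringHelper class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name; extension methods must live in a non-generic static class.
public static class StringHelperSketch
{
    // The allowed characters, written as a bracketed character class.
    public const string AlphaNumericCharacters = "[a-zA-Z0-9]";

    // Removes sStart/sEnd only if the string starts and ends with them.
    public static string StripBrackets(this string str, string sStart, string sEnd)
    {
        if (str != null
            && str.Length >= sStart.Length + sEnd.Length
            && str.StartsWith(sStart, StringComparison.Ordinal)
            && str.EndsWith(sEnd, StringComparison.Ordinal))
        {
            return str.Substring(sStart.Length, str.Length - sStart.Length - sEnd.Length);
        }
        return str;
    }

    // Turns "[allowed]" into the negated class "[^allowed]" and replaces every match.
    public static string ReplaceNotExpectedCharacters(this string text, string allowedPattern, string replacement)
    {
        string inner = allowedPattern.StripBrackets("[", "]");
        return Regex.Replace(text, "[^" + inner + "]", replacement);
    }

    public static string RemoveNonAlphanumericCharacters(this string text) =>
        text.ReplaceNotExpectedCharacters(AlphaNumericCharacters, "");
}
```

With this, "abc-123!".RemoveNonAlphanumericCharacters() returns "abc123", and "Hello, World!".ReplaceNotExpectedCharacters("[a-zA-Z ]", "_") returns "Hello_ World_". As noted above, the bracket-stripping trick assumes a single simple character class; an allowed pattern such as "[a-z]|[0-9]" would produce an invalid or wrong negated class.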