Find all but the first occurrence of a character with REGEX

Developer https://www.devze.com 2023-01-26 21:55 Source: Internet
I'm building a .Net application and I need to strip any non-decimal character from a string (excluding the first '.'). Essentially I'm cleaning user input to force a real number result.

So far I've been using online RegEx tools to try and achieve this in a single pass, but I'm not getting very far.

I wish to accomplish this:

asd123.asd123.123.123 = 123.123123123

Unfortunately I've only managed to get to the stage where

asd123.asd123.123.123 = 123.123.123.123

by using this code.

System.Text.RegularExpressions.Regex.Replace(str, "[^\.|\d]*", "")

But I am stuck trying to remove all but the first decimal point.

Can this be done in a single pass?

Is there a better-way™?


This can be done in a single regex, at least in .NET which supports infinite repetition inside lookbehind assertions:

resultString = Regex.Replace(subjectString, @"(?<!^[^.]*)\.|[^\d.]", "");

Explanation:

(?<!^[^.]*) # Provided at least one dot occurs earlier in the string,
\.          # match a dot
|           # or
[^\d.]      # match any character other than a digit or a dot.

(?<!^[^.]*) means: assert that the text between the start of the input string and the current position does not consist solely of characters other than dots. This condition holds for every dot after the first one, so only those dots are matched and removed; the first dot fails the assertion and is left in place.
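
To see the lookbehind at work, here is a minimal sketch that wraps the same pattern in a helper and runs it on a few inputs. The class name SinglePassCleaner, the method name Clean and the extra sample strings are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePassCleaner
{
    // Same pattern as in the answer: matches every dot that has a dot
    // somewhere before it, plus every character that is neither a digit
    // nor a dot. Depends on .NET's support for unbounded lookbehind.
    private static readonly Regex Unwanted = new Regex(@"(?<!^[^.]*)\.|[^\d.]");

    // Hypothetical helper name: removes everything the pattern matches.
    public static string Clean(string input) => Unwanted.Replace(input, "");

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123123123
        Console.WriteLine(Clean("abc42"));                 // 42
        Console.WriteLine(Clean("..5"));                   // .5
    }
}
```

Keep in mind that \d in .NET matches any Unicode decimal digit by default; if only 0-9 should survive, [0-9] or RegexOptions.ECMAScript would be the usual way to restrict it.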


I think this is better done without regular expressions.

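using System;
using System.Text;

string str = "asd123.asd123.123.123";
StringBuilder sb = new StringBuilder();
bool dotFound = false; // becomes true once the first '.' has been kept
foreach (char character in str)
{
    if (Char.IsDigit(character))
    {
        sb.Append(character); // keep every digit
    }
    else if (character == '.' && !dotFound)
    {
        // keep only the first '.', skip any later ones
        dotFound = true;
        sb.Append(character);
    }
}
Console.WriteLine(sb.ToString()); // prints 123.123123123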


Firstly, the regex you are currently using will leave any | characters untouched, because inside [] the | is a literal character rather than alternation. You only need [^.\d]* since . has no special meaning in [] either.

After this replace, you could try something like this:

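Replace(str, @"([\d]+\.[\d]+)[^\d].*", "$1");

Note that .NET replacement strings refer to the first group as $1, not \1. Also be aware that this keeps the digits up to the second . and drops everything after it, so 123.123.123.123 becomes 123.123 rather than 123.123123123. And you'd only need this if there is a . at all in the number.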
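
If the digits after the later dots should be kept rather than dropped, the two-step idea can be sketched roughly as follows. This is only one possible variant, not the code from this answer: the second step uses IndexOf and String.Replace instead of a second regex, the names TwoPassCleaner and Clean are invented for the example, and the first pattern uses + instead of * so that empty matches are skipped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoPassCleaner
{
    // Hypothetical helper: strip junk first, then drop all dots but the first.
    public static string Clean(string input)
    {
        // Pass 1: remove everything that is neither a digit nor a dot
        // (this also removes any | characters).
        string stripped = Regex.Replace(input, @"[^.\d]+", "");

        // Pass 2: find the first dot; if there is none, we are done.
        int firstDot = stripped.IndexOf('.');
        if (firstDot < 0)
        {
            return stripped;
        }

        // Keep everything up to and including the first dot,
        // then delete the remaining dots from the rest.
        string head = stripped.Substring(0, firstDot + 1);
        string tail = stripped.Substring(firstDot + 1).Replace(".", "");
        return head + tail;
    }

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123123123
        Console.WriteLine(Clean("a|b12"));                 // 12
    }
}
```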

Hope this helps.
