
Regex for capturing digits and digit ranges

Developer, https://www.devze.com, 2022-12-14 03:02, source: web

I have the following string:

Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)

I want to capture:

2121,323.222
2-2.4
0.5

That is, I want the three results above from the string.

Can anyone help me with this regex?


I noticed that the hyphen in 2–2.4kg is not really a hyphen; it's the Unicode character U+2013 EN DASH.

So here is another regex, in C#:

@"[0-9]+([,.\u2013-][0-9]+)*"

Test

using System;
using System.Text.RegularExpressions;

MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
    Console.WriteLine(m.Groups[0]); // Groups[0] is the entire match
}

Here are the results. My console cannot print the Unicode character U+2013, so it shows as "?", but it is matched correctly:

2121,323.222
2?2.4
0.5
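To spot a look-alike character such as this en dash, one option is to list the code point of every non-ASCII character in the input. A minimal C# sketch, assuming C# 9 or later (it uses top-level statements); the `nonAscii` list is just an illustrative name:

```csharp
using System;
using System.Collections.Generic;

// Sample input from the question; \u2013 is the en dash it actually contains.
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// Record every character outside ASCII with its index and code point,
// which is enough to tell U+2013 (EN DASH) from U+002D (HYPHEN-MINUS).
var nonAscii = new List<string>();
for (int i = 0; i < input.Length; i++)
{
    char c = input[i];
    if (c > '\u007F')
        nonAscii.Add($"index {i}: U+{(int)c:X4}");
}

foreach (string entry in nonAscii)
    Console.WriteLine(entry);
```

Printing only the code point also sidesteps the console's "?" substitution mentioned above.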


Okay, I didn't notice the C# tag until now. I will leave the answer, but I know it's not what you expected; see if you can do something with it. Perhaps the title should have mentioned the programming language?


Sure:

Fat mass loss was (.*) greater for GPLC \((.*)kg vs\. (.*)kg\)

Find your substrings in \1, \2 and \3. If you are using Emacs, swap all parentheses and escaped parentheses.
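In C#, \1, \2 and \3 are read through `Match.Groups`. A sketch, assuming C# 9 or later (top-level statements) and a version of the template that keeps "kg" and "vs." outside the groups, so the second group is the bare range:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// Anchored on the literal sentence; "kg" and "vs." stay outside the groups.
// The dot in "vs." is escaped so it only matches a literal period.
Match m = Regex.Match(input,
    @"Fat mass loss was (.*) greater for GPLC \((.*)kg vs\. (.*)kg\)");

// Groups[0] is the whole match; Groups[1..3] correspond to \1, \2 and \3.
string first = m.Groups[1].Value;
string range = m.Groups[2].Value;
string other = m.Groups[3].Value;

Console.WriteLine($"{first} | {range} | {other}");
```

The trade-off of this template approach is that it only matches sentences with exactly this wording.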


How about something like this:

^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$

It's a little more general, I think. I'm a little concerned about .* being greedy, though.
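The greediness concern can be checked directly. This C# sketch (C# 9 or later, top-level statements) runs the pattern above next to a hypothetical variant in which the first three `.*` are made lazy (`.*?`); `CaptureThree` is an illustrative helper name:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// The pattern from the answer, with greedy .* between the groups.
string greedy = @"^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$";
// Hypothetical variant: the first three .* made lazy (.*?).
string lazy = @"^.*?((?:\d+,)*\d+(?:\.\d+)?).*?(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*?(\d+(?:\.\d+)).*$";

// Returns the three capture groups of the first match.
static string[] CaptureThree(string text, string pattern)
{
    Match m = Regex.Match(text, pattern);
    return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
}

string[] greedyGroups = CaptureThree(input, greedy);
string[] lazyGroups = CaptureThree(input, lazy);

Console.WriteLine(string.Join(" | ", greedyGroups));
Console.WriteLine(string.Join(" | ", lazyGroups));
```

On the sample string the greedy version pushes each group to the last digits that still allow a match (2, 4 and 0.5). The lazy variant gets the first number right (2121,323.222) but still returns 2 and 2.4 separately, because the range part only allows an ASCII hyphen, not the en dash.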


Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)

A generalized extractor:

/\D+?([\d\,\.\-]+)/g

Explanation:

/           # start pattern
 \D+?       # 1 or more non-digits, as few as possible (lazy)
  (         # capture group 1          
   [\d,.-]+ # character class, 1 or more of digits, comma, period, hyphen
  )         # end capture group 1
/g          # trailing regex g modifier (make regex continue after last match)

Sorry, I don't know C# well enough for a full write-up, but the pattern should plug right in once the surrounding /.../g is dropped.

See http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
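A minimal C# sketch of plugging the extractor in (C# 9 or later, top-level statements), reading capture group 1 from each match:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";

// No /.../g in .NET: Regex.Matches returns all matches.
var captured = new List<string>();
foreach (Match m in Regex.Matches(input, @"\D+?([\d\,\.\-]+)"))
{
    // Group 1 holds the number; the whole match also includes the
    // leading non-digits consumed by \D+?.
    captured.Add(m.Groups[1].Value);
}

Console.WriteLine(string.Join(" | ", captured));
```

On the sample this yields five captures: 2121,323.222, 2, 2.4, a lone "." taken from "vs.", and 0.5. The character class can match a period on its own, and the en dash is not in the class, so the range is split.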


I came up with something like this atrocity:

-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?

In it, -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? is repeated twice, with [–-] in the middle (note that the first character there is a long hyphen, the en dash).
This should take care of dots and commas outside of numbers; e.g., in hello,23,45.2-7world it will capture 23,45.2-7.
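Since the number part appears twice, one way to keep it readable is to build the pattern from a single piece. A C# sketch, assuming C# 9 or later (top-level statements); `Number` and `Extract` are illustrative names, and the literal en dash is written as `\u2013`, which .NET regex treats the same way:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One number: optional minus, digits with optional single commas between
// them, then an optional fractional part that must end in a digit.
const string Number = @"-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?";
// The answer's full pattern: a number, optionally followed by an en dash
// (written \u2013 here) or hyphen and a second number.
string pattern = Number + @"(?:[\u2013-]" + Number + ")?";

string[] Extract(string text) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] sample = Extract("Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)");
string[] edge = Extract("hello,23,45.2-7world");

Console.WriteLine(string.Join(" | ", sample));
Console.WriteLine(string.Join(" | ", edge));
```

On the question's string this returns the three values the asker wanted, with 2–2.4 kept as one range.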


It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work, provided the range uses a plain ASCII hyphen (as in the sample below):

\d+(?:[,.-]\d+)*

In C# 3 and later, you can use it like this:

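using System;
using System.Text.RegularExpressions;

var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";

var matches = Regex.Matches(input, pattern);

// On .NET Framework, MatchCollection only implements the non-generic
// IEnumerable, so "var" would type match as object; name the type instead.
foreach ( Match match in matches )
  Console.WriteLine(match.Value);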


Hmm, this is a tricky question, especially because the input string contains the Unicode character – (EN DASH, U+2013) instead of - (HYPHEN-MINUS). Therefore, the correct regex to match the numbers in the original string would be:

\d+(?:[\u2013,.]\d+)*

A more generic approach would be:

\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*

which matches dash punctuation (Pd), connector punctuation (Pc) and other punctuation (Po); the comma and the period are both in Po. See here for more information about those.

An implementation in C# would look like this:

using System;
using System.Text.RegularExpressions;

string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
    // Same pattern as above: digits joined by Pd, Pc or Po punctuation.
    Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*");
    Match match = rx.Match(input);
    while (match.Success) {
        // matched text: match.Value
        // match start: match.Index
        // match length: match.Length
        Console.WriteLine(match.Value);
        match = match.NextMatch();
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}
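To check that the category-based pattern covers both dash characters, this C# sketch (C# 9 or later, top-level statements; `Generic` and `Extract` are illustrative names) runs it on the en-dash and ASCII-hyphen versions of the sample and asks .NET for the separators' Unicode categories:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Digits joined by dash (Pd), connector (Pc) or other (Po) punctuation.
const string Generic = @"\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*";

string[] Extract(string text) =>
    Regex.Matches(text, Generic).Cast<Match>().Select(m => m.Value).ToArray();

// The same pattern handles the en dash (U+2013) and the ASCII hyphen-minus.
string[] withEnDash = Extract("Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)");
string[] withHyphen = Extract("Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)");

// The separators' categories, as .NET reports them.
UnicodeCategory enDash = CharUnicodeInfo.GetUnicodeCategory('\u2013'); // DashPunctuation (Pd)
UnicodeCategory hyphen = CharUnicodeInfo.GetUnicodeCategory('-');      // DashPunctuation (Pd)
UnicodeCategory comma = CharUnicodeInfo.GetUnicodeCategory(',');       // OtherPunctuation (Po)

Console.WriteLine(string.Join(" | ", withEnDash));
Console.WriteLine(string.Join(" | ", withHyphen));
```

The opening parenthesis is in a different category (open punctuation, Ps), which is why it is never joined to the following number.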


Let's try this one:

(?=\d)([0-9,.-]+)(?<=\d)

It captures all expressions that:

  • contain only "[0-9,.-]" characters,
  • start with a digit "(?=\d)",
  • end with a digit "(?<=\d)".

It works with single-digit numbers and does not include leading or trailing [.,-] characters.

Hope this helps.
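A C# sketch of the lookaround pattern (C# 9 or later, top-level statements). `Extract` is an illustrative helper, and the second pattern is a hypothetical variant, not part of the answer, that adds the en dash (`\u2013`) to the class so the range stays joined:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] Extract(string text, string pattern) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// The answer's pattern: must start and end with a digit.
const string AsciiOnly = @"(?=\d)([0-9,.-]+)(?<=\d)";
// Hypothetical variant with the en dash (\u2013) added to the class.
const string WithEnDash = @"(?=\d)([0-9,.\u2013-]+)(?<=\d)";

string sample = "Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)";
string[] ascii = Extract(sample, AsciiOnly);
string[] joined = Extract(sample, WithEnDash);

// [0-9,.-]+ first swallows the final "."; the lookbehind then forces
// a backtrack to the last digit, so the trailing period is dropped.
string[] trimmed = Extract("ranges 1,5-2.5.", AsciiOnly);

Console.WriteLine(string.Join(" | ", ascii));
Console.WriteLine(string.Join(" | ", joined));
Console.WriteLine(string.Join(" | ", trimmed));
```

With the original class, the question's string gives 2 and 2.4 as separate matches; with the en dash added, it gives 2–2.4.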


I got the solution to my problem.

The following regex gave me the desired result:

(([0-9]+)([–.,-]*))+
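A C# sketch of this pattern in use (C# 9 or later, top-level statements; `Extract` is an illustrative name, and the literal en dash is written as `\u2013`):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The asker's pattern, with the literal en dash written as \u2013.
const string Pattern = @"(([0-9]+)([\u2013.,-]*))+";

string[] Extract(string text) =>
    Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] sample = Extract("Fat mass loss was 2121,323.222 greater for GPLC (2\u20132.4kg vs. 0.5kg)");

// Because [\u2013.,-]* may follow the last digit group, punctuation right
// after a number is kept as part of the match.
string[] trailing = Extract("The total was 3.");

Console.WriteLine(string.Join(" | ", sample));
Console.WriteLine(string.Join(" | ", trailing));
```

On the question's string it returns the three wanted values; the second call shows the one side effect to watch for, a sentence-ending period captured as "3.".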
