My homework is to write a regular expression representing the language of numerical literals from C programming language. I can use l for letter, d for digit, a for +, m for -, and p for point. Assume that there are no limits on the number of consecutive digits in any part of the expression.
Some of the examples of valid numerical literals were 13. , .328, 41.16, +45.8开发者_运维知识库0, -2.e+7, -.4E-7, 01E-06, +0
I came up with: (d+p+a+m)(d+p+E+e+a+m)*
update2: (l+d+p+a+m)(d+p+((E+e)(a+m+d)d*) )* im not sure how to prevent something like 1.0.0.0eee-e1.Your regular expression does not support the various suffixes (l
, u
, f
, etc.), nor does it support hexadecimal or octal constants.
The leading signs (+
or -
in front of the number) are not lexically part of the constant; they are the unary +
and -
operators. Effectively, all integer and floating constants are positive.
If you need to fully support C99 floating constants, you need to support hexadecimal exponents (p
instead of e
).
Your regular expression also accepts many invalid sequences of characters, like 1.0.0.0eee-e1
.
A single regular expression to match all C integer and floating literals would be quite long.
Untested, but this should be along the right lines for decimal at least. (Also, it accepts the string ".", or I think it does anyway; to fix that would eliminate the last of the common code between integer and FP, the leading [0-9]*
.)
[0-9]*([0-9]([uU](ll?+LL?)+(ll?+LL?)?[uU]?)+(\.[0-9]*)?([eE][+-]?[0-9]+)[fFlL])
This Regex will match all your need:
[+-]?(?P<Dot1>\.)?\d+(?(Dot1)(?#if_dot_exist_in_the_beginning__do_nothing)|(?#if_dot_not_exist_yet__we_accept_optional_dot_now)(?P<Dot2>\.)?)\d*(?P<Exp>[Ee]?)(?(Exp)[+-]?\d*)
精彩评论