For example, let's say I wanted to match an uppercase letter A-Z, but not F-H. Uppercase letters would be [A-Z], and not F-H would be [^F-H] if I am not mistaken. Intuitively, I want it to be [A-Z^F-H], but that does not seem to be working. I know it could be done as [A-EI-Z], but I am looking for less of a workaround solution.
EDIT: looking for a more general solution.
There is nothing that is "less of a workaround". The character class syntax, like [abcdef], just matches any one of the enumerated characters. It can be inverted, like [^abcdef]. Then [a-f] is provided as a syntactic shorthand for explicitly writing out all the characters. If you want to match multiple ranges with gaps between them, you have to specify the multiple ranges.
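If the gaps are known at the time the pattern is built, the multiple ranges can be generated rather than typed by hand. Here is a minimal sketch in Python's re module (not flex); the helper name char_class_without is made up for illustration, and it assumes start and end are single characters:

```python
import re


def char_class_without(start, end, holes):
    """Build a bracket expression for start..end minus the characters in holes.

    Hypothetical helper for illustration only; not part of any library.
    """
    keep = [chr(cp) for cp in range(ord(start), ord(end) + 1) if chr(cp) not in holes]
    if not keep:
        raise ValueError("every character in the range was excluded")

    # Collapse consecutive characters into [low, high] runs.
    runs = []
    for ch in keep:
        if runs and ord(ch) == ord(runs[-1][1]) + 1:
            runs[-1][1] = ch
        else:
            runs.append([ch, ch])

    # Emit each run as either a single character or a low-high range.
    parts = []
    for low, high in runs:
        if low == high:
            parts.append(re.escape(low))
        else:
            parts.append(re.escape(low) + "-" + re.escape(high))
    return "[" + "".join(parts) + "]"


pattern_text = char_class_without("A", "Z", "FGH")
pattern = re.compile(pattern_text)
```

For the range in the question, pattern_text comes out as [A-EI-Z], which is the same class one would write by hand.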
If flex supports positive/negative lookahead/lookbehind, you could try messing with those features. I would be willing to bet it would come out way more complex to read and significantly less efficient than just writing [A-EI-Z].
Edit: After reading your comment that the 'holes' you want in your range may not be known until runtime, you'd have to do it with lookahead/lookbehind. Syntax for that varies between regex engines, and I'm not sure about flex, or whether it can even do that. Essentially you'll want a regex that matches [A-Z] with a negative lookbehind assertion for [F-H], or one that matches a positive lookahead assertion for [^F-H] followed by [A-Z].
The key thing about lookahead/lookbehind is that they don't actually consume any of the input; they just cause matching to fail if the assertion isn't met at the current match position. They usually wind up less efficient than doing things directly (if you can) and can be tricky to get right, and different regex engines have different restrictions on when you can and can't use them.
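To make the lookaround variants concrete, here is a rough sketch using Python's re module, which is an assumption on my part since flex may not accept any of this syntax. The constant names and the matched_letters helper are made up for illustration:

```python
import re

# Negative lookahead: refuse to start a match on F-H, then consume one A-Z.
NEG_LOOKAHEAD = re.compile(r"(?![F-H])[A-Z]")

# Negative lookbehind: consume one A-Z, then fail if that character was F-H.
NEG_LOOKBEHIND = re.compile(r"[A-Z](?<![F-H])")

# Positive lookahead: require that the next character is not F-H, then consume one A-Z.
POS_LOOKAHEAD = re.compile(r"(?=[^F-H])[A-Z]")

# The plain version with explicit ranges, for comparison.
EXPLICIT = re.compile(r"[A-EI-Z]")


def matched_letters(pattern, text):
    """Return every single-character match of pattern in text, joined together."""
    return "".join(pattern.findall(text))


sample = "ABCDEFGHIJ xyz"
results = {
    name: matched_letters(pattern, sample)
    for name, pattern in [
        ("neg_lookahead", NEG_LOOKAHEAD),
        ("neg_lookbehind", NEG_LOOKBEHIND),
        ("pos_lookahead", POS_LOOKAHEAD),
        ("explicit", EXPLICIT),
    ]
}
```

All four patterns pick the same letters out of sample. The assertions themselves consume nothing, so each match is still exactly one character long.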
I think the "^" must be the first character inside [] if you mean to negate the class, that is, to match any character not listed in the square brackets.
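A quick way to check this, sketched with Python's re module (other engines, flex included, may differ in details). The variable names are only for illustration:

```python
import re

# "^" is not the first character here, so it is an ordinary member of the class:
# the class means "A-Z, or a literal ^, or F-H", and F-H is already inside A-Z.
ATTEMPT = re.compile(r"[A-Z^F-H]")

# "^" is the first character here, so the whole class is negated.
NEGATED = re.compile(r"[^F-H]")

attempt_matches_caret = ATTEMPT.fullmatch("^") is not None
attempt_matches_g = ATTEMPT.fullmatch("G") is not None
negated_matches_g = NEGATED.fullmatch("G") is not None
negated_matches_lowercase = NEGATED.fullmatch("a") is not None
```

So the attempt from the question still matches G, and additionally matches a literal caret, while the negated class on its own also accepts characters outside A-Z.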