Regex - pattern capture everything except for pattern [.net]

https://www.devze.com 2023-02-13 04:45 Source: web
I would like to capture everything up to, but not including, a particular pattern. My actual problem has to do with parsing information out of HTML, but I am distilling the problem down to an example to, hopefully, clarify my question.

Source

xaxbxcabcabc

Desired Match

xaxbxc

If I use a lookahead, the greedy .* runs up to the last occurrence of abc, so the match still includes the first occurrence:

.*(?=abc) => xaxbxcabc

I would like something along the lines of a negated character class, but for a whole pattern rather than a set of characters.

.*[^abc] // where abc would be treated as a pattern, instead of a list meaning anything but a, b or c

I am using http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx for testing


A non-greedy (lazy) quantifier *? could be useful here, e.g.

^(?<captured>.*?)abc.*$

Edit: Just to be clear, the named group is (of course) not needed; the really important part is just

(.*?)abc
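
As a rough illustration of the lazy capture, here is a minimal C# sketch run against the example input from the question. The variable names are only illustrative, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Example input from the question.
string input = "xaxbxcabcabc";

// The lazy .*? grows one character at a time and stops at the
// first position where "abc" can follow it.
Match m = Regex.Match(input, @"^(?<captured>.*?)abc.*$");

// Read the named group rather than the whole match.
string captured = m.Groups["captured"].Value;
Console.WriteLine(captured); // xaxbxc
```

The whole match still covers the entire string; only the group holds the part before the first abc.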


If you anchor the regex and use a lazy quantifier, you'll solve the problem:

"^.*?(?=abc)"
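
A small C# sketch of the difference, assuming the example input from the question (variable names are illustrative). With the lookahead, the match value itself is the wanted text, so no group is needed; the greedy form from the question is included for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Example input from the question.
string input = "xaxbxcabcabc";

// Anchored and lazy: stops before the first "abc".
string lazy = Regex.Match(input, @"^.*?(?=abc)").Value;

// Greedy form from the question: backtracks only as far as the last "abc".
string greedy = Regex.Match(input, @".*(?=abc)").Value;

Console.WriteLine(lazy);   // xaxbxc
Console.WriteLine(greedy); // xaxbxcabc
```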


Why not use a replace?

string result = Regex.Replace(input, "abc.*$", ""); // needs using System.Text.RegularExpressions;

This will remove everything from the first matching phrase onwards, leaving you with all of the content up until that point.
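
A minimal C# sketch of the replace approach, assuming the example input from the question. Since the real input is HTML, which usually spans several lines, the second call uses a made-up multi-line string and RegexOptions.Singleline, because . does not match a newline by default in .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Example input from the question.
string input = "xaxbxcabcabc";
string result = Regex.Replace(input, "abc.*$", "");
Console.WriteLine(result); // xaxbxc

// Hypothetical multi-line input, not from the original question.
string multiLine = "xaxbxc\nabc\nabc";

// Singleline lets . cross newlines, so everything from the first
// "abc" to the end of the string is removed.
string trimmed = Regex.Replace(multiLine, "abc.*$", "", RegexOptions.Singleline);
Console.WriteLine(trimmed == "xaxbxc\n"); // True
```

Without RegexOptions.Singleline, the second call would only strip the final abc, because the first abc is followed by a newline that .* cannot cross.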
