Regex - match up to next match

Developer https://www.devze.com 2023-02-25 14:48 Source: Internet

I would like to iterate over matches in a text. The blocks I want to match start with a number followed by a tab character.

My beginning match is ^\d+\t, but is there a way to indicate that I want all text including this match up until the next match?

Input data:

1       111.111.111.111
111.111.111.111
                    Host IP     111.111.111.111
111.111.111.111
111.111.111.111         Host IP     TCP             app     11111, 11111, 11111, 11111      Allow
2       111.111.111.111
111.111.111.111
111.111.111.111         Host IP     111.111.111.111
111.111.111.111         Host IP     TCP             app     11111, 11111, 11111, 11111      Allow
3       111.111.111.111
111.111.111.111         Host IP     111.111.111.111
111.111.111.111
111.111.111.111
111.111.111.111         Host IP     TCP             app     11111, 11111, 11111, 11111      Allow
4       111.111.111.111
111.111.111.111
111.111.111.111
111.111.111.111         Host IP     111.111.111.111
111.111.111.111         Host IP     TCP             app     11111, 11111, 11111, 11111      Allow

I'm using Perl.

The following regex should do what you want:

^\d+\t(?:[^\d]+|[\d]+(?!\t))*

This will match some number of digits followed by a tab, and then any number of non-digits or digits that are not followed by a tab.

my @matches = $data =~ /^\d+\t(?:[^\d]+|[\d]+(?!\t))*/mg;

edit: Okay this one should work!

Probably, this?

/^\d+\t.*?(?=\z|^\d+\t)/ms
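
To iterate over the blocks, the stopping condition can be written as a lookahead, so that the next block's leading number is left in place for the next iteration. A minimal sketch, assuming the input is held in a scalar named `$data` (a hypothetical name; the sample text is shortened and made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample: three blocks, each starting with a number and a tab.
my $data = "1\t10.0.0.1\n10.0.0.2\n2\t10.0.0.3\n12\t10.0.0.4\n\tHost IP\n";

# m: ^ matches at every line start; s: . also matches newlines;
# g: in list context, return every block.
# The lookahead (?= ... ) stops the lazy .*? just before the next
# "number, tab" at a line start, or at the end of the string,
# without consuming it.
my @blocks = $data =~ /^\d+\t.*?(?=^\d+\t|\z)/msg;

for my $block (@blocks) {
    print "--- block ---\n$block";
}
```

Anchoring the lookahead with `^` means that digits followed by a tab in the middle of a line do not end a block; only a new line that starts with a number and a tab does.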

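while (/
    \G
    ( ^ \d+ \t )                    # block header at a line start: number, then tab
    ( (?: (?! ^ \d+ \t ) . )* )     # everything up to the next header (or end of text)
/xmsg) {                            # m: ^ matches per line; s: . matches newlines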
    print("match:  $1\n");
    print("buffer: $2\n");
}

Sample input and expected results would help; as it is, I'm not really sure I know what you're looking for.

If you're just matching on one pattern, you might be able to split the string:

my $string = "text\n1\ttest\n2\tend\n";
my @matches = split /^(\d+)\t/m, $string;
shift @matches; # remove the text before the first number
print "[$_]\n" for @matches;

__END__
Output:
[1]
[test
]
[2]
[end
]
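
Because the capturing group in the split pattern makes the returned list alternate between numbers and block bodies, the result can be walked two elements at a time. A minimal sketch reusing the same sample string; the `@fields` and `@records` names are only for illustration:

```perl
use strict;
use warnings;

my $string = "text\n1\ttest\n2\tend\n";

# The capturing group makes split return the numbers as well,
# so the list alternates: leading text, number, body, number, body, ...
my @fields = split /^(\d+)\t/m, $string;
shift @fields;    # drop the text before the first number

# Pair each number with the body that follows it.
my @records;
while (my ($number, $body) = splice(@fields, 0, 2)) {
    push @records, { number => $number, body => $body };
}

printf "%s => %s", $_->{number}, $_->{body} for @records;
```

Each body keeps its trailing newline, so `chomp` may be wanted before further processing.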

If you're matching multiple patterns, Perl has special variables that let you find where a match starts and finishes. These can be used to extract the text between two matches.

use English qw(-no_match_vars);

my $string = "1\ttestEND\n2\ttextEND\n";
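if ($string =~ /^\d+\t/g) {    # /g in scalar context records pos() for the next search
    my $last_match_end = $LAST_MATCH_END[0];    # offset just past the first match

    if ($string =~ /END/cg) {    # continue searching from pos()
        my $last_match_start = $LAST_MATCH_START[0];    # offset where "END" begins
        my $len = $last_match_start - $last_match_end;
        print substr($string, $last_match_end, $len) . "\n";
    }
}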
__END__
Output:
test