My text file contains several blocks, each structured like the one below.
How can I read each block by its keywords (Keyword1, Keyword2, Keyword3, Keyword4)? I have two questions. 1. Is there an efficient way to get the lines that follow each keyword? 2. I don't know how to skip the internal blank lines between Keyword3 and Keyword4. The key point is that each block ends with a blank line.
**block start**
Keyword1
Single Line # I need work on the line
Keyword2
Single or Multiple lines # I need work on the lines
Keyword3
(There may be one or more blank lines here)
Single or Multiple lines # I need work on the lines
(There may be one or more blank lines here)
Keyword4
Single or Multiple lines # I need work on the lines
Single or multiple blank lines
**block end**
If I understand your data, blank lines are not a reliable indicator, because they can appear before a keyword's text begins, after the text, or not at all. If that's the case, I don't think it will help to read the text in "paragraph mode" (by setting $/ to an empty string). Similarly, the blank lines do not help -- at least not in a simple way -- to identify the start and end of the keyword sections or the "blocks".
You are going to have to parse the text in a more fine-grained way, but you haven't given us enough information to provide a detailed answer. Here's an example that simply stores the non-blank lines by keyword:
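use strict;
use warnings;

my (%data, $keyword);

while (my $line = <DATA>){
    next unless $line =~ /\S/;    # skip blank lines
    chomp $line;
    if ($line =~ /^Keyword/){
        $keyword = $line;         # remember which section we are in
    }
    elsif (defined $keyword) {    # ignore text that precedes the first keyword
        push @{$data{$keyword}}, $line;
    }
}
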
__DATA__
Keyword1
data1 a
Keyword2
data2 a
data2 b
data2 c
Keyword3
data3 a
data3 b
Keyword4
data4 a
data4 b
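The example above merges every block into one hash. If the file holds several blocks, one way to keep them apart is to start a new record each time Keyword1 appears. This is only a sketch: it assumes Keyword1 always opens a block, and the sample text, the `@blocks` array, and the in-memory filehandle are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: two blocks, with blank lines inside the
# Keyword3 section and between the blocks.
my $text = <<'END';
Keyword1
data1 a
Keyword2
data2 a
data2 b
Keyword3

data3 a

Keyword4
data4 a

Keyword1
data1 b
Keyword2
data2 c
Keyword3
data3 b
Keyword4
data4 b
END

my (@blocks, $keyword);

# Read the string through a filehandle so the loop matches file-based code.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    next unless $line =~ /\S/;                    # blank lines carry no structure
    chomp $line;
    if ($line =~ /^Keyword[1-4]$/) {
        push @blocks, {} if $line eq 'Keyword1';  # assumption: Keyword1 opens a block
        $keyword = $line;
    }
    elsif (@blocks) {                             # skip text before the first block
        push @{ $blocks[-1]{$keyword} }, $line;
    }
}
close $fh;

# Print each block's sections, one line per keyword.
for my $i (0 .. $#blocks) {
    for my $k (sort keys %{ $blocks[$i] }) {
        print "block $i $k: @{ $blocks[$i]{$k} }\n";
    }
}
```

Because blank lines are skipped everywhere, it does not matter whether they appear before a section's text, after it, or between blocks.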
Do you know about setting $/ to the empty string for "paragraph mode"? Every call to <> or readline now returns a multiline record up to one or more blank lines, and chomp removes them all from the end.
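A minimal sketch of paragraph mode, using a made-up string read through an in-memory filehandle (the sample text and variable names are assumptions, not taken from the question's file):

```perl
use strict;
use warnings;

# Hypothetical input: three paragraphs separated by one or more blank lines.
my $text = "first a\nfirst b\n\n\n\nsecond a\n\nthird a\nthird b\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @records;
{
    local $/ = '';                 # paragraph mode, restored when the scope ends
    while (my $record = <$fh>) {
        chomp $record;             # in this mode chomp strips all trailing newlines
        push @records, [ split /\n/, $record ];
    }
}
close $fh;

printf "%d records\n", scalar @records;
print join(' | ', @$_), "\n" for @records;
```

Note that a blank line inside a keyword's section would end the record early, so this only fits data where blank lines appear solely between records.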
Can't you just do a multiline match and use the keywords as anchors like this:
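# List assignment leaves $block undefined when nothing matches,
# instead of reusing a stale $1 from an earlier match.
my ($block) = $data =~ /(Keyword1.*?Keyword2.*?Keyword3.*?Keyword4.*?)\n$/sm;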
Actually, you could do this as well and get the data from each block:
my @keys = $data =~ /Keyword1(.*?)Keyword2(.*?)Keyword3(.*?)Keyword4(.*?)\n$/sm;
and then you could just strip out blank lines in each group.
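Stripping the blank lines from each captured group could look like the sketch below. The sample `$data` string and the `@sections` name are made up for illustration, and the code assumes the whole block is already in memory and is followed by a blank line.

```perl
use strict;
use warnings;

# Hypothetical block with blank lines inside the Keyword3 section.
my $data = "Keyword1\ndata1 a\nKeyword2\ndata2 a\ndata2 b\n"
         . "Keyword3\n\ndata3 a\n\nKeyword4\ndata4 a\ndata4 b\n\n";

# One capture group per keyword section.
my @groups = $data =~ /Keyword1(.*?)Keyword2(.*?)Keyword3(.*?)Keyword4(.*?)\n$/sm;
die "No complete block found\n" unless @groups;

# Split each group into lines and keep only the non-blank ones.
my @sections = map { [ grep { /\S/ } split /\n/, $_ ] } @groups;

for my $i (0 .. $#sections) {
    printf "Keyword%d: %s\n", $i + 1, join(', ', @{ $sections[$i] });
}
```

Each element of `@sections` is an array reference holding the lines for one keyword, in keyword order.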