I'm trying to write a Perl script that will parse the output of the stcmd.exe (the StarTeam command line client) hist command. I'm getting the history for every file in a view and the output looks something lik开发者_运维百科e this:
Folder: The View Name (working dir: C:\Projects\dir) History for: main.h Description: Some files Locked by: Status: Current ---------------------------- Revision: 1 View: The View Name Branch Revision: 1.0 Author: John Smith Date: 3/22/08 11:16:16 AM CST Main header ============================================================================= History for: main.c Description: Some files Locked by: Status: Current ---------------------------- Revision: 2 View: The View Name Branch Revision: 1.1 Author: Jane Doe Date: 3/22/08 1:55:55 PM CST Made an update. ---------------------------- Revision: 1 View: The View Name Branch Revision: 1.0 Author: John Smith Date: 3/22/08 11:16:16 AM CST Initial revision =============================================================================
Note that the revision summary can contain newlines and can be blank (in which case there's no line for it at all).
I want to get the filename and, for each revision, the author name (first and last), date, and change summary. I'd like to place this information in a data structure where I can sort revisions by date and combine revisions if the date, author, and summary match up. (I think I can figure this part out if someone can help me with the parsing.) I'm not great with regular expressions or Perl, but here's what I'm trying to work with right now:
# $hist contains the stcmd output in the format above
while($hist =~ /History for: (?<filename>.)/s)
{
# Record filename somewhere with $+{filename}
while($hist =~ /^Revision: (?<file_rev>\S+) View: (?<view_name>.+) Branch Revision: (?<branch_rev>\S+).\nAuthor: (?<author>.*) Date: (?<date>.*) \w+\r\n(?<summary>.*)/)
{
# Extract things with $+{author}, $+{date}, $+{summary}
}
}
This doesn't work, however. For all I know I may be approaching it completely wrong. Can someone point me in the right direction?
The key is to parse one chunk at a time and match all the relevant stuff at once. See qr
in perldoc perlop and $/ in perldoc perlvar.
Keeping in mind the fact that you also wanted to put the information in a data structure that would allow you to query and manipulate the information, here is one final revision. The code below uses the ability of SQLite to create in-memory databases. You might actually want to split the functionality into two scripts: One to parse and store the data and another one to do whatever manipulation you need. In fact, it might be possible to do all necessary manipulation in SQL.
#!/usr/bin/perl
use v5.010;
use strict; use warnings;
use DBI;
my $dbh = get_dbh();
my $header_pattern = qr{
History[ ]for: [ ](?<filename>[^\n]+) \n
Description: [ ](?<description>[^\n]+) \n
Locked[ ]by: [ ]?(?<lockedby>[^\n]*) \n
Status: [ ](?<status>.[^\n]+) \n
}x;
my $revision_pattern = qr{-+\n
Revision: [ ](?<revision>\d+) [ ]
View: [ ](?<view>.+) [ ]
Branch[ ]Revision: [ ](?<branch_revision>[^\n]+) \n
Author: [ ](?<author>.+) [ ]
Date: [ ](?<revdate>[^\n]+) \n
(?<summary>.*) \n
}x;
local $/ = '=' x 77 . "\n";
while ( my $entry = <>) {
if ( $entry =~ $header_pattern ) {
my %file = %+;
$dbh->do(sprintf(
q{INSERT INTO files (%s) VALUES (%s)},
join(',', keys %file),
join(',', ('?') x keys %file),
), {}, values %file );
while ( $entry =~ /$revision_pattern/g ) {
my %rev = %+;
$dbh->do(sprintf(
q{INSERT INTO revisions (%s) VALUES (%s)},
join(',', filename => keys %rev),
join(',', ('?') x (1 + keys %rev)),
), {}, $file{filename}, values %rev );
}
}
}
my $revs = $dbh->selectall_arrayref(
q{SELECT * FROM revisions JOIN files
ON files.filename = revisions.filename},
{ Slice => {} }
);
use Data::Dumper;
print Dumper $revs;
sub get_dbh {
my $dbh = DBI->connect(
'dbi:SQLite:dbname=:memory:', undef, undef,
{ RaiseError => 1, AutoCommit => 1 }
);
$dbh->do(q{PRAGMA foreign_keys = ON});
$dbh->do(q{CREATE TABLE files (
filename VARCHAR PRIMARY KEY,
description VARCHAR,
lockedby VARCHAR,
status VARCHAR
)});
$dbh->do(q{CREATE TABLE revisions (
filename VARCHAR,
revision VARCHAR,
view VARCHAR,
branch_revision VARCHAR,
author VARCHAR,
revdate VARCHAR,
summary VARCHAR,
CONSTRAINT pk_revisions PRIMARY KEY (filename, revision),
CONSTRAINT fk_revisions_files FOREIGN KEY (filename)
REFERENCES files(filename)
)});
return $dbh;
}
Output:
C:\Temp> y.pl test.txt $VAR1 = [ { 'status' => 'Current', 'revdate' => '3/22/08 11:16:16 AM CST', 'author' => 'John Smith', 'description' => 'Some files', 'revision' => '1', 'filename' => 'main.h', 'summary' => 'Main header', 'view' => 'The View Name', 'branch_revision' => '1.0', 'lockedby' => '' }, { 'status' => 'Current', 'revdate' => '3/22/08 1:55:55 PM CST', 'author' => 'Jane Doe', 'description' => 'Some files', 'revision' => '2', 'filename' => 'main.c', 'summary' => 'Made an update.', 'view' => 'The View Name', 'branch_revision' => '1.1', 'lockedby' => '' }, { 'status' => 'Current', 'revdate' => '3/22/08 11:16:16 AM CST', 'author' => 'John Smith', 'description' => 'Some files', 'revision' => '1', 'filename' => 'main.c', 'summary' => 'Initial revision', 'view' => 'The View Name', 'branch_revision' => '1.0', 'lockedby' => '' } ];
Here is one way to start. I prefer to split up your string into lines (\n
) and loop through those:
use strict;
use warnings;
my $hist = <<'EOF';
Folder: The View Name (working dir: C:\Projects\dir)
History for: main.h
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Main header
=============================================================================
History for: main.c
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 2 View: The View Name Branch Revision: 1.1
Author: Jane Doe Date: 3/22/08 1:55:55 PM CST
Made an update.
----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Initial revision
=============================================================================
EOF
my %data;
my $filename;
for (split /\n/, $hist) {
if (/History for: (.*)/) {
$filename = $1;
}
if (/^Revision: (.+?) View: (.+?) Branch Revision: (.*)/) {
$data{$filename}{rev} = $1;
$data{$filename}{view} = $2;
$data{$filename}{branch} = $3;
}
}
use Data::Dumper; print Dumper(\%data);
__END__
$VAR1 = {
'main.h' => {
'view' => 'The View Name',
'rev' => '1',
'branch' => '1.0'
},
'main.c' => {
'view' => 'The View Name',
'rev' => '1',
'branch' => '1.0'
}
};
You have some good answers already. Here's a different way to divide up the job:
use strict;
use warnings;
use Data::Dumper qw(Dumper);
# Read file a section at a time.
$/ = '=' x 77 . "\n";
my @data;
while (my $section = <>){
# Split each section into sub-sections, the
# first containing the file info and the rest
# containing info about each revision.
my @revs = split /-{20,}\n/, $section;
# Do whatever you want with @file_info and, below, @ref_info.
# The example here splits them apart into lines.
# Alternatively, you could run the sub-sections through
# regex parsing, as in Sinan's answer.
my @file_info = parse_lines(shift @revs);
push @data, { file_info => \@file_info };
for my $r (@revs){
my @rev_info = parse_lines($r);
push @{$data[-1]{revs}}, \@rev_info;
}
}
sub parse_lines {
# Parse each sub-section into lines.
my @lines = split /\n/, shift;
# Optionally, filtering out unwanted material.
@lines = grep { /\S/ and $_ !~ /={70,}/ } @lines;
# And perhaps splitting lines into their key-value components.
@lines = map { [split /:\s*/, $_, 2] } @lines;
return @lines;
}
print Dumper(\@data);
You need a state-based parser. With the __DATA__
section as before:
use v5.010;
use constant
{ READING_FOR_FILENAME => 0
, READING_FOR_AUTHOR => 1
, READING_FOR_DIVIDER => 2
};
use strict;
use warnings;
use English qw<%LAST_PAREN_MATCH>;
use Data::Dumper;
my $state = READING_FOR_FILENAME;
my %history_for;
my $file_name;
while ( <DATA> ) {
my $line = $_;
given ( $state ) {
when ( READING_FOR_FILENAME ) {
if ( $line =~ m/^History for: (?<file_name>\S+)/ ) {
$file_name = $LAST_PAREN_MATCH{file_name};
$state = READING_FOR_DIVIDER;
}
}
when ( READING_FOR_DIVIDER ) {
if ( $line =~ m/^-+\s*$/ ) {
$state = READING_FOR_AUTHOR;
}
elsif ( $line =~ m/^=+\s*$/ ) {
$state = READING_FOR_FILENAME;
}
}
when ( READING_FOR_AUTHOR ) {
if ( $line =~ m/^Author: (?<author>[^:]+?) Date: (?<time>.*)/ ) {
push @{ $history_for{$file_name} }
, { name => $LAST_PAREN_MATCH{author}
, time => $LAST_PAREN_MATCH{time}
};
$state = READING_FOR_DIVIDER;
}
}
}
}
print Dumper( \%history_for );
精彩评论