Any suggestions for improving (optimizing) existing string substitution in Perl code?

Developer (https://www.devze.com), 2023-02-25 03:34. Source: the web.

Perl 5.8

Improvements for fairly straightforward string substitutions, in an existing Perl script.

The intent of the code is clear, and the code is working.

For a given string, replace every occurrence of a TAB, LF or CR character with a single space, and replace every occurrence of a double quote with two double quotes. Here's a snippet from the existing code:


# replace all tab, newline and return characters with single space
$val01  =~s/[\t\n\r]/ /g;
$val02  =~s/[\t\n\r]/ /g;
$val03  =~s/[\t\n\r]/ /g;

# escape all double quote characters by replacing with two double quotes
$val01  =~s/"/""/g;
$val02  =~s/"/""/g;
$val03  =~s/"/""/g;

Question: Is there a better way to perform these string manipulations?

By "better way", I mean performing them more efficiently, avoiding regular expressions (possibly using tr/// to replace the tab, newline and carriage return characters), or possibly using qr// to avoid recompilation.

NOTE: I've considered moving the string manipulation operations to a subroutine, to reduce the repetition of the regular expressions.

NOTE: This code works, it isn't really broken. I just want to know if there is a more appropriate coding convention.

NOTE: These operations are performed in a loop, a large number (>10000) of iterations.

NOTE: This script currently executes under perl v5.8.8. (The script has a require 5.6.0, but this can be changed to require 5.8.8.) Installing a later version of Perl is not currently an option on the production server.


    > perl -v
    This is perl, v5.8.8 built for sun4-solaris-thread-multi
    (with 33 registered patches, see perl -V for more detail)


Your existing solution looks fine to me.

As for avoiding recompilation, you don't need to worry about that. Perl's regular expressions are compiled only once as it is, unless they contain interpolated expressions, which yours don't.

For the sake of completeness, I should mention that even if interpolated expressions are present, you can tell Perl to compile the regex once only by supplying the /o flag.

$var =~ s/foo/bar/;    # compiles once
$var =~ s/$foo/bar/;   # re-interpolated each time; recompiled only when
                       # the value of $foo has changed
$var =~ s/$foo/bar/o;  # compiles once, using the value $foo has
                       # the first time the expression is evaluated
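Since the question mentions qr//, here is a minimal sketch of what precompiling the two patterns could look like. The variable names and sample data are illustrative, not from the original script, and for constant patterns like these it is unlikely to be faster than the literal s/// forms above.

```perl
use strict;
use warnings;

# Compile both patterns once, up front (names are illustrative).
my $ws_re = qr/[\t\n\r]/;
my $dq_re = qr/"/;

# Sample data, not taken from the original script.
my @vals = ("a\tb\nc", qq{say "hi"\r});

for my $val (@vals) {        # $val aliases each element, so edits stick
    $val =~ s/$ws_re/ /g;    # TAB, LF, CR -> single space
    $val =~ s/$dq_re/""/g;   # " -> ""
}

print "[$_]\n" for @vals;
```

qr// mainly pays off when the pattern is built from variables at run time and then reused many times.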


TMTOWTDI

You could use tr///, index, substr, or split as alternatives. But you must measure to identify the best method for your particular system and data.
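One way to take such measurements is the core Benchmark module. The sketch below compares s/// against tr/// for the whitespace step only; the sample string and the sub names are made up for illustration, and the numbers will differ with real data and on the production machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample input; substitute values typical of the real loop.
my $sample = join '', map { "field\t$_\r\n\"quoted\" " } 1 .. 50;

# Each sub works on a copy so every iteration sees the same input.
sub with_subst { my $s = shift; $s =~ s/[\t\n\r]/ /g; return $s; }
sub with_tr    { my $s = shift; $s =~ tr/\t\n\r/ /;   return $s; }

# A negative count means "run each candidate for at least that many
# CPU seconds"; cmpthese then prints a comparison table.
cmpthese(-1, {
    subst => sub { with_subst($sample) },
    tr    => sub { with_tr($sample) },
});
```

Both subs should return identical strings; it is worth checking that before trusting the timings.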


You might be prematurely optimizing. Have you tried using a profiler, such as Devel::NYTProf, to see where your program spends most of its time?


My guess would be that tr/// would be (slightly) quicker than s/// in your first regex. How much quicker depends on factors that I don't know about your program and your environment. Profiling and benchmarking will answer that question.
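As a rough sketch of that idea (the sample value is invented), the whitespace step can become a tr///, while the quote-doubling step has to stay a substitution:

```perl
use strict;
use warnings;

my $val = qq{one\ttwo\r\nthree "four"};   # invented sample value

# tr/// maps characters one-to-one. The replacement list is shorter than
# the search list, so its last character (the space) is repeated, and
# TAB, LF, and CR all become a space.
(my $clean = $val) =~ tr/\t\n\r/ /;

# Doubling a quote turns one character into two, which tr/// cannot do,
# so this step remains an s///.
$clean =~ s/"/""/g;

print "$clean\n";
```

Note that a CR LF pair becomes two spaces, exactly as it does with the original s/[\t\n\r]/ /g.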

But if you're interested in any kind of improvement to your code, can I suggest a maintainability fix? You run the same substitution (or set of substitutions) on three variables. This means that when you change that substitution, you need to change it three times - and doing the same thing three times is always dangerous :)

You might consider refactoring the code to look something like this:

foreach ($val01, $val02, $val03) {
    s/[\t\n\r]/ /g;
    s/"/""/g;
}

Also, it would probably be a good idea to have those values in an array rather than three such similarly named variables.

foreach (@vals) {
    s/[\t\n\r]/ /g;
    s/"/""/g;
}
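If you would rather keep the cleanup in a subroutine, as the question suggests, a minimal sketch could look like the following. The name clean_in_place and the sample values are hypothetical, and the tr/// line assumes the tr-for-whitespace variant is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: cleans its arguments in place. The elements of @_
# are aliases to the caller's variables, so nothing needs to be returned.
sub clean_in_place {
    for (@_) {
        tr/\t\n\r/ /;   # TAB, LF, CR -> single space
        s/"/""/g;       # " -> ""
    }
    return;
}

my @vals = ("a\tb", qq{"x"\n}, "plain");   # sample values
clean_in_place(@vals);
print "[$_]\n" for @vals;
```

Because the arguments are modified through aliases, the sub must be called with variables or array elements; passing a literal string would die with a "Modification of a read-only value" error.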