I'm trying to write a piece of code that reads a file line by line and stores each line, up to a certain amount of input data. I want to guard against the end-user being evil and putting something like a gig of data on one line in addition to guarding against sucking in an abnormally large file. Doing $str = <FILE>
will still read in a whole line, and that could be very long and blow up my memory.
fgets lets me do this by letting me specify a number of bytes to read during each call, essentially letting me split one long line into chunks of my max length. Is there a similar way to do this in Perl? I saw something about sv_gets but am not sure how to use it (though I only did a cursory Google search).
The goal of this exercise is to avoid having to do additional parsing / buffering after reading data. fgets stops after N bytes or when a newline is reached.
EDIT I think I confused some people. I want to read X lines, each with max length Y. I don't want to read more than Z bytes total, and I would prefer not to read all Z bytes at once. I guess I could just do that and split the lines, but I'm wondering if there's some other way. If that's the best way, then using the read function and doing a manual parse is my easiest bet.
Thanks.
Perl has no built-in fgets, but File::GetLineMaxLength implements it.
If you want to do it yourself, it's pretty straightforward with getc.
sub fgets {
    my ($fh, $limit) = @_;

    my $str;
    for (1 .. $limit) {
        my $char = getc $fh;
        last unless defined $char;   # EOF (or read error)

        $str .= $char;
        last if $char eq "\n";       # stop at the end of the line
    }

    return $str;                     # undef if nothing was read
}
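For the question's constraints (X lines of at most Y bytes each, no more than Z bytes total), a rough usage sketch on top of the fgets above; the limits and file name are made up for illustration:

my ($max_lines, $max_line_len, $max_total) = (100, 4_096, 1_000_000);   # X, Y, Z

open my $fh, '<', 'input.txt' or die "open: $!";
my ($total, @lines) = (0);
while (@lines < $max_lines && $total < $max_total) {
    my $line = fgets($fh, $max_line_len);
    last unless defined $line;       # EOF
    $total += length $line;
    push @lines, $line;              # an overlong line shows up as several entries
}
close $fh;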
Concatenating each character to $str is efficient, as Perl will realloc opportunistically. If a Perl string occupies 16 bytes and you concatenate another character, Perl will reallocate it to 32 bytes (32 goes to 64, 64 to 128, and so on) and remember the length. The next 15 concatenations then require no memory reallocations or calls to strlen.
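If you want to see that over-allocation for yourself, the core Devel::Peek module can dump a scalar's internals; this is just an illustrative sketch, and the exact LEN you see depends on your perl build:

use Devel::Peek qw(Dump);

my $str = '';
$str .= 'x' for 1 .. 20;
Dump($str);   # output (on STDERR) shows CUR = 20 and a larger LEN, the pre-allocated buffer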
sub heres_what_id_do($$) {
    my ($fh, $len) = @_;

    my $buf = '';
    for (my $i = 0; $i < $len; ++$i) {
        my $ch = getc $fh;
        last if !defined $ch || $ch eq "\n";
        $buf .= $ch;
    }

    return $buf;
}
Not very "Perlish" but who cares? :) The OS (and possibly Perl itself) will do all the necessary buffering underneath.
As an exercise, I've implemented a wrapper around C's fgets() function. It falls back to a Perl implementation for complicated filehandles (defined as "anything without a fileno") to cover tied handles and whatnot. File::fgets is on its way to CPAN now; you can pull a copy from the repository.
Some basic benchmarking shows it's over 10x faster than any of the implementations here. However, I can't say it's bug-free or that it doesn't leak memory, as my XS skills are not that great, but it's better tested than anything here.
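Assuming the module exports a C-like fgets($fh, $limit), as its synopsis suggests (double-check the docs), usage would look roughly like this; the file name and handle_chunk() are placeholders:

use File::fgets;

open my $fh, '<', 'input.txt' or die "open: $!";
while (defined(my $chunk = fgets($fh, 4096))) {
    # each $chunk is at most 4096 bytes; an overlong line arrives as several chunks
    handle_chunk($chunk);
}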
Use the read function (perlfunc read)
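A sketch of what the read()-based fallback from the question could look like: pull in at most Z bytes in one call, then split into lines and truncate each to the per-line limit (the limits and file name are illustrative):

my ($max_total, $max_line_len) = (1_000_000, 4_096);   # Z and Y

open my $fh, '<', 'input.txt' or die "open: $!";
defined(read($fh, my $buf, $max_total)) or die "read: $!";
my @lines = map { substr($_, 0, $max_line_len) } split /^/m, $buf;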
You can implement fgets() yourself trivially. Here's one that works like C's:
# C-like: fgets($buf, $size, $fh); fills $buf in place, returns false at EOF
sub fgets {
    my ($n, $c) = ($_[1], '');  $_[0] = '';
    for (; defined($c) && $c ne "\n" && $n > 0; $n--) { $_[0] .= ($c = getc($_[2])); }
    defined($c) && $_[0];
}
Here's one with PHP's semantics:
# PHP-like: fgets($fh, $size); returns the string read, or false if nothing was read
sub fgets {
    my ($n, $c, $x) = ($_[1], '', '');
    for (; defined($c) && $c ne "\n" && $n > 0; $n--) { $x .= ($c = getc($_[0])); }
    ($x ne '') && $x;
}
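For example, with the PHP-style version above (the file name and limit are illustrative):

open my $fh, '<', 'input.txt' or die "open: $!";
while (my $chunk = fgets($fh, 4096)) {
    # $chunk is at most 4096 bytes and ends at the first newline, if any
    print length($chunk), " bytes read\n";
}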
If you're trying to implement resource limits (i.e. trying to prevent an untrusted client from eating up all your memory) you really should not be doing it this way. Use ulimit to set up those resource limits before calling your script. A good sysadmin will set up resource limits anyway, but they like it when programmers make startup scripts that set reasonable limits.
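If you would rather enforce the limit from inside the script itself, the BSD::Resource CPAN module wraps setrlimit; this is an alternative the answer doesn't mention, and the constant and limit below are assumptions to adapt to your platform:

use BSD::Resource;

# Cap this process's address space at roughly 100 MB before reading untrusted input.
# RLIMIT_AS may be spelled RLIMIT_VMEM on some platforms.
setrlimit(RLIMIT_AS, 100 * 1024 * 1024, 100 * 1024 * 1024)
    or die "setrlimit failed";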
If you're trying to limit input before you proxy this data to another site (say, limiting SMTP input lines because you know remote sites might not support more than 511 characters), then just check the length of the line after <INPUT> with length().
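For example, mirroring the SMTP case (adjust the cutoff to taste):

while (my $line = <INPUT>) {
    die "line too long\n" if length($line) > 512;
    # ... forward $line to the remote site ...
}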