Why do I get extra line breaks in the web page I download with Perl?

Developers https://www.devze.com 2023-01-19 20:17 Source: Internet
I'm writing a simple Perl script (on Windows) to download the response of a GET request to a URL and save it to a file. Pretty straightforward. Except when it writes to the output file, I get extra line breaks. So instead of:

<head>
  <title>title</title>
  <link .../>
</head>

I get

<head>

  <title>title</title>

  <link .../>

</head>

Here's the Perl script:

use LWP::Simple;

my $url = $ARGV[0];
my $content = get($url);

open(outputFile, '+>', $ARGV[1]);

print outputFile $content;

close(outputFile);

I suppose I could just get wget for Windows, but now this is bothering me. How do I get rid of those extra line breaks?!


  1. There's no sane reason for the +> mode in your example code; you only write to the file, so a plain > would do. Just saying.
  2. LWP::Simple has a getstore function that saves a URL's content straight to a file. If you're using LWP::Simple, why not use it?
  3. By default, open is going to push the :crlf I/O layer when running on Win32, which turns \n into \r\n on output. But the data you're writing already has \r\n, so each line ending becomes \r\r\n and you end up with too many newlines. If you want data to be written verbatim, you should use binmode, or open the handle with :raw to begin with. LWP already does this correctly.
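To make point 3 concrete, here is a minimal sketch that writes the same CRLF data once through an explicit :crlf layer and once through :raw, then reads the bytes back. The sample string and the write_and_read_back helper are made up for illustration; the string stands in for whatever get($url) would return. The :crlf layer is requested explicitly so that the effect also shows up on non-Windows systems.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: stand-in for content fetched from a server that sends CRLF.
my $content = "<head>\r\n  <title>title</title>\r\n</head>\r\n";

# Hypothetical helper: write $data through the given output layer,
# then read the file back byte-for-byte.
sub write_and_read_back {
    my ($layer, $data) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh);

    open(my $out, ">$layer", $path) or die "Cannot open $path: $!";
    print {$out} $data;
    close($out) or die "Cannot close $path: $!";

    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close($in);
    return $bytes;
}

my $crlf_bytes = write_and_read_back(':crlf', $content);  # "\n" becomes "\r\n"
my $raw_bytes  = write_and_read_back(':raw',  $content);  # written verbatim

printf "original: %d bytes, :crlf: %d bytes, :raw: %d bytes\n",
    length($content), length($crlf_bytes), length($raw_bytes);
print "The :crlf output contains \\r\\r\\n\n" if $crlf_bytes =~ /\r\r\n/;
```

Many editors render the stray \r in \r\r\n as a line break of its own, which would explain the blank lines in the question.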


I'm guessing that $content already includes CRLF newlines and Perl's IO layer is doing LF -> CRLF conversion. (Internally, "\n" is a single character in Perl, usually LF). I'd add

binmode(outputFile);

after the open to disable that conversion and write the results of $content directly.
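As a rough sketch of that suggestion, the write step could be wrapped like this. The save_verbatim name and the sample string are hypothetical; in the original script, $content would come from get($url) and the path from $ARGV[1]. A lexical filehandle and error checks are added here as a matter of style, not because the fix requires them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to $path without newline translation.
sub save_verbatim {
    my ($path, $content) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    binmode($out);                 # disable LF -> CRLF conversion on this handle
    print {$out} $content;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Assumption: stand-in for the string returned by get($url).
my $content = "line one\r\nline two\r\n";

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

save_verbatim($path, $content);
my $size = -s $path;               # size on disk in bytes
print "wrote $size bytes for a ", length($content), "-byte string\n";
```

If the size on disk matches the length of the string, nothing was added on the way out.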


chomp($content) would be my guess, as it looks like the content already has its own set of \n's in it.

EDIT: Sorry, I just realized that chomp won't work here. It only strips the newline at the end of the input string, so you would have to split the content into lines and chomp each one. You could instead split on \n\n and then join. I do like the regex approach from another answer, though, with a minor modification: match any run of \n and \r characters, in any combination, and replace it with a single \n. That way each line still ends with exactly one newline (hopefully):

$content =~ s/[\n\r]+/\n/g;

EDIT 2: Fixed the regex above again; I had accidentally put a ! in there for some reason, not sure why.
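A small sketch of what that substitution does to a CRLF string may help when weighing it against binmode. The sample string is invented for illustration. Note that the character class also swallows intentional blank lines; the second substitution, s/\r\n/\n/g, is not from the answer and is shown only as an assumed alternative that converts CRLF to LF while leaving blank lines in place.

```perl
use strict;
use warnings;

# Assumption: sample content with CRLF endings and one intentional blank line.
my $content = "a\r\nb\r\n\r\nc\r\n";

# The regex from the answer: every run of \r and \n becomes a single \n.
(my $collapsed = $content) =~ s/[\n\r]+/\n/g;

# Assumed alternative (not in the original answer): only turn CRLF into LF.
(my $normalized = $content) =~ s/\r\n/\n/g;

# Show the results with escapes made visible.
for my $pair (['collapsed', $collapsed], ['normalized', $normalized]) {
    my ($label, $text) = @$pair;
    (my $visible = $text) =~ s/\n/\\n/g;
    print "$label: $visible\n";
}
```

Either result can then be printed through the default text-mode handle on Windows, since each line now ends in a bare \n for the :crlf layer to expand.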
