
Line-oriented streaming in Ruby (like grep)

Source: 开发者 https://www.devze.com, 2023-03-26 03:34 (reposted from the web)

By default Ruby opens $stdin and $stdout in buffered mode. This means you can't use Ruby to perform a grep-like operation filtering text. Is there any way to force Ruby to use line-oriented mode? I've seen various solutions including popen3 (which does buffered-mode only) and pty (which doesn't separately handle $stdout and $stderr, which I require).

How do I do this? Python seems to have the same limitation.


It looks like your best bet is to use STDOUT.syswrite and STDIN.sysread. The following seemed to have reasonably good performance, despite being ugly code:

STDIN.sync = true
STDOUT.syswrite "Looking for #{ARGV[0]}\n"

def next_line
  mybuff = @overflow || ""
  until mybuff[/\n/]
    mybuff += STDIN.sysread(8)
  end
  overflow = mybuff.split("\n")
  out, *others = overflow
  @overflow = others.join("\n")
  out
rescue EOFError => e
  false  # NB: There's a bug here, see below
end

line = next_line
while line
  STDOUT.syswrite "#{line}\n" if line =~ /#{ARGV[0]}/i
  line = next_line
end

Note: I'm not sure you need #sync with #sysread, but if you do, you should probably sync STDOUT too. Also, it reads only 8 bytes at a time into mybuff, which is highly inefficient and CPU-heavy, so you should experiment with this value. Lastly, this code is hacky and needs a refactor, but it works: I tested it using ls -l ~/* | ruby rgrep.rb doc (where 'doc' is the search term).


Second note: Apparently, I was so busy trying to get it to perform well that I failed to get it to perform correctly! As Dmitry Shevkoplyas has noted, if there is text in @overflow when EOFError is raised, that text will be lost. I believe that replacing the rescue clause with the following should fix the problem:

rescue EOFError => e
  return false unless @overflow && @overflow.length > 0
  output = @overflow
  @overflow = ""
  output
end

(if you found that helpful, please upvote Dmitry's answer!)


You can always turn on autoflush on any stream you want:

STDOUT.sync = true

This will have the effect of committing any writes immediately.

Most languages have this feature, but they always call it something a little different.
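
As a rough sketch of how this could be applied to the grep-like filter from the question: the method name grep_lines and its keyword parameters are made up for illustration, and the sketch assumes that each_line hands over every line as soon as it has been read, which is worth verifying on your own setup.

```ruby
# Hypothetical helper (not from the original answers): copies every line of
# `input` that matches `pattern` (case-insensitively) to `output`.
def grep_lines(pattern, input: $stdin, output: $stdout)
  output.sync = true  # commit each write immediately instead of buffering
  regex = Regexp.new(pattern, Regexp::IGNORECASE)
  input.each_line do |line|
    # `line` still carries its trailing newline, so write it unchanged.
    output.write(line) if line.match?(regex)
  end
end
```

Saved as rgrep.rb with a final line such as grep_lines(ARGV[0].to_s), this could be tried with ls -l ~/* | ruby rgrep.rb doc.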


You can call $stdout.flush after you've printed your line, and call $stdin.readline to fetch one line.
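
A minimal sketch of that approach, assuming an explicit flush after each printed line is enough for your use case. The method name filter_with_readline and its parameters are hypothetical; readline and flush are the standard IO methods mentioned above.

```ruby
# Hypothetical helper: reads one line at a time with readline and flushes
# the output after every matching line.
def filter_with_readline(pattern, input: $stdin, output: $stdout)
  regex = Regexp.new(pattern, Regexp::IGNORECASE)
  loop do
    line = input.readline  # raises EOFError once the input is exhausted
    next unless line.match?(regex)
    output.print(line)
    output.flush           # push this line out before reading the next one
  end
rescue EOFError
  nil  # end of input is the normal way for this loop to finish
end
```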


The accepted answer by user208769 is good, but it has one flaw: under certain conditions you will lose the last line. I'll show how to reproduce it and how to fix it below:

To reproduce the "last line lost" bug:

mkdir deleteme
touch deleteme/1 deleteme/2 deleteme/3
ls deleteme/ | ./rgrep.rb ''
Looking for
1
2

As you can see, the "3" file is missing from the rgrep output. Surprisingly, with a different filename length it behaves differently! Look:

rm -fr deleteme/
mkdir deleteme
touch deleteme/11 deleteme/22 deleteme/33
ls deleteme/ | ./rgrep.rb ''
Looking for
11
22
33

Now the third file is present! What a bug! Isn't she a beauty!!?
One can only imagine how much damage this random behaviour can cause.

To fix the bug, we modify the rescue portion slightly:

#!/usr/bin/env ruby
STDIN.sync = true
STDOUT.syswrite "Looking for #{ARGV[0]}\n"

def next_line
  mybuff = @overflow || ""
  until mybuff[/\n/]
    mybuff += STDIN.sysread(8)
  end
  overflow = mybuff.split("\n")
  out, *others = overflow
  @overflow = others.join("\n")
  out
rescue EOFError => e
  if @overflow.to_s.size > 0
    leftover_line = @overflow
    @overflow = ''
    return leftover_line
  else
    false
  end
end

line = next_line
while line
  STDOUT.syswrite "#{line}\n" if line =~ /#{ARGV[0]}/i
  line = next_line
end

I'll leave the "why" out of this post as an exercise for the curious, as otherwise it won't be digested properly (and this post is already way too long for my first post ever ;) heh..
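
One remaining edge case, as far as I can tell: the rescue clause only returns what is stored in @overflow, so if the input does not end with a newline, bytes read into mybuff during the final call can still be dropped. A possible restructuring keeps everything in a single buffer and cuts lines off its front. This is a sketch, not a tested replacement; the class name LineReader and the chunk_size parameter are made up for illustration.

```ruby
# Hypothetical class: returns one line at a time from `io` using sysread,
# keeping all unread bytes in a single buffer.
class LineReader
  def initialize(io, chunk_size: 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = String.new  # binary-encoded, like the strings sysread returns
    @eof = false
  end

  # Returns the next line without its newline, or nil at end of input.
  def next_line
    # Keep reading until the buffer holds a newline or the input is exhausted.
    until (idx = @buffer.index("\n")) || @eof
      begin
        @buffer << @io.sysread(@chunk_size)
      rescue EOFError
        @eof = true
      end
    end

    if idx
      @buffer.slice!(0..idx).chomp  # cut the line, newline included, off the front
    elsif @buffer.empty?
      nil                           # nothing left to return
    else
      line = @buffer                # final line that had no trailing newline
      @buffer = String.new
      line
    end
  end
end
```

The main loop would then look like reader = LineReader.new(STDIN) followed by while (line = reader.next_line) ... end. Empty lines come back as "", which is truthy in Ruby, so they do not end the loop early.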
