Text manipulation in Ruby

Developer https://www.devze.com 2022-12-21 07:28 Source: web

Related topics: regex ruby

I'm trying to write a word counter for LyX files.

Life is almost simple, since most lines that need to be ignored begin with a \ (I'm prepared to assume that no textual lines begin with a backslash). However, some lines that look like real text aren't; they are enclosed by \begin_inset and \end_inset:

I'm genuine text.

\begin_inset something
I'm not real text
Perhaps there will be more than one line! Or none at all! Who knows.
\end_inset

\begin_layout
I also need to be counted, and thus not removed
\end_layout

Is there a quick way in Ruby to strip the (smallest amount of) text between two markers? I imagine regular expressions are the way forward, but I can't figure out what they'd need to be.

Thanks in advance

Is there a quick way in ruby to strip the (smallest amount of) text between two markers?

str = "lala BEGIN_MARKER \nlu\nlu\n END_MARKER foo BEGIN_MARKER bar END_MARKER baz"
str.gsub(/BEGIN_MARKER.*?END_MARKER/m, "")
#=> "lala  foo  baz"
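
Applied to the markers from the question, the same non-greedy, multiline idea might look like the sketch below. The helper name strip_insets and the sample text are made up for illustration, and the pattern assumes insets are never nested, since .*? stops at the first \end_inset it finds.

```ruby
# Hypothetical helper: removes every \begin_inset ... \end_inset span,
# including the rest of the \end_inset line and its newline.
# Assumption: insets are not nested.
def strip_insets(text)
  # /m lets . match newlines; ^ anchors each marker to the start of a line.
  text.gsub(/^\\begin_inset.*?^\\end_inset[^\n]*\n?/m, "")
end

# Single-quoted heredoc so the backslashes stay literal.
sample = <<~'LYX'
  I'm genuine text.

  \begin_inset something
  I'm not real text
  \end_inset

  I also need to be counted
LYX

puts strip_insets(sample)
```

The remaining lines that start with a backslash would still have to be dropped before counting words.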

gsub could be expensive for longer files (if you're reading in the whole file as a string), so if you have to chunk the input anyway, you might want to use a stateful parser:

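in_block = false
# File.foreach reads line by line and closes the file when done
File.foreach(fname) do |line|
  if in_block
    # still inside a block: skip the line, leaving the block on END_MARKER
    in_block = false if line =~ /END_MARKER/
    next
  end
  if line =~ /BEGIN_MARKER/
    # a block starts here: skip this line and everything up to END_MARKER
    in_block = true
    next
  end
  count_words(line)
end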
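
Since count_words isn't defined in this answer, here is a self-contained sketch of the same line-by-line idea for the LyX case. The name count_lyx_words is hypothetical, words are simply whitespace-separated tokens, and a depth counter is used instead of a boolean on the assumption that insets may be nested.

```ruby
require "stringio"

# Hypothetical word counter for LyX-style input. `io` is anything that
# responds to each_line (a File, a StringIO, ...).
def count_lyx_words(io)
  depth = 0  # number of \begin_inset blocks we are currently inside
  total = 0
  io.each_line do |line|
    if line.start_with?("\\begin_inset")
      depth += 1
    elsif line.start_with?("\\end_inset")
      depth -= 1 if depth > 0
    elsif depth.zero? && !line.start_with?("\\")
      # Ordinary text outside any inset: count whitespace-separated tokens.
      total += line.split.size
    end
  end
  total
end

sample = StringIO.new(<<~'LYX')
  I'm genuine text.
  \begin_inset something
  I'm not real text
  \end_inset
  \begin_layout
  I also need to be counted
  \end_layout
LYX

puts count_lyx_words(sample)  # 3 + 6 = 9
```

For a real file, something like File.open(fname) { |f| count_lyx_words(f) } would keep memory use flat, because only one line is held at a time.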

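You should look at String#scan. Assuming your text is in the variable s, something like this should work. Note gsub rather than sub!, so that every inset is removed and a string is returned even when nothing matches, and the /m flag, so that . also matches newlines:

s_strip_inset = s.gsub(/\\begin_inset.*?\\end_inset/m, "")
word_count = s_strip_inset.scan(/[\w-]+/).size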
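
Two details of this approach are easy to trip over, so here is a small sketch. The word_count helper and its pattern are assumptions of mine, not part of the answer: which characters make up a "word" is a judgment call, and this version treats apostrophes and hyphens as part of a word.

```ruby
# Hypothetical helper: counts runs of word characters, apostrophes
# and hyphens as one word each.
def word_count(text)
  text.scan(/[\w'-]+/).size
end

phrase = "I'm well-known"
puts phrase.scan(/(\w|-)+/).size  # 3: "I", "m", "well-known"
puts word_count(phrase)           # 2: "I'm", "well-known"

# sub! returns nil when nothing was replaced, so chaining on its result
# can fail; sub and gsub always return a string.
plain = "no insets here"
p plain.dup.sub!(/\\begin_inset/, "")  # nil
p plain.sub(/\\begin_inset/, "")       # "no insets here"
```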
