How to emulate PHP's preg_split in Ruby to capture offsets and delimiters?

Developers https://www.devze.com 2022-12-28 14:55 Source: Internet
I wonder if there is a way to get offsets and delimiters while splitting a string in Ruby, analogous to PHP's preg_split:

preg_split("/( |&nbsp;|<|>|\t|\n|\r|;|\.)/i", $html_string, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_OFFSET_CAPTURE);

I imagine I can achieve it by traversing the string character by character, or by using something as heavy as Treetop, but I would like to use something more convenient.


You're looking for MatchData#offset or MatchData#begin, which you can access on Regexp.last_match or $~:

html_string.scan(/( |&nbsp;|<|>|\t|\n|\r|;|\.)/i) do |match|
  # Returns begin and end position for this match, e.g. [5, 10]
  Regexp.last_match.offset(0)
end


You can fetch offsets from $~ in Ruby, for example:

"foobarbaz".scan(/[oa]+/) { p [$~.begin(0), $~.end(0), $~.to_s] }

prints

[1, 3, "oo"]
[4, 5, "a"]
[7, 8, "a"]

Based on this you can write a loop which generates the same offsets as your PHP code did.
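
A minimal sketch of such a loop follows. The helper name split_with_offsets is hypothetical, not part of Ruby or of the answers above. It assumes the pattern never matches an empty string, and it keeps empty pieces between adjacent delimiters, as preg_split does without PREG_SPLIT_NO_EMPTY. Ruby reports character offsets while PHP reports byte offsets, so the numbers can differ for multibyte input.

```ruby
# Hypothetical helper (not from the original answers): collects
# [piece, offset] pairs for both the text and the delimiters,
# using String#scan and the MatchData in Regexp.last_match.
def split_with_offsets(str, pattern)
  pieces = []
  pos = 0
  str.scan(pattern) do
    m = Regexp.last_match
    # Text between the previous delimiter and this one (may be empty)
    pieces << [str[pos...m.begin(0)], pos]
    # The delimiter itself, with its starting offset
    pieces << [m[0], m.begin(0)]
    pos = m.end(0)
  end
  # Whatever follows the last delimiter (may be empty)
  pieces << [str[pos..-1], pos]
  pieces
end

p split_with_offsets("a b;c", /[ ;]/)
# prints [["a", 0], [" ", 1], ["b", 2], [";", 3], ["c", 4]]
```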


Thanks for both solutions; it is very helpful to know this approach. If I use scan, I have to add logic to get the text between matches. The same effect can be achieved in a similar number of lines using String#index. Too bad String#split does not take a block.

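# A constant cannot be assigned inside a method body, so it lives outside.
DELIMITERS = /(&nbsp;|[\s<>;.])/i

def html_split(str)
  data = []
  offset = 0
  # String#index accepts a start position and sets Regexp.last_match
  while (i = str.index(DELIMITERS, offset))
    # Text between the previous delimiter and this one, if any
    data << [str[offset...i], offset] if i > offset
    # Take the matched delimiter from the MatchData instead of guessing its length
    delimiter = Regexp.last_match(0)
    data << [delimiter, i]
    offset = i + delimiter.size
  end
  # Text after the last delimiter, which the loop alone would drop
  data << [str[offset..-1], offset] if offset < str.size
  data
end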
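
Although String#split does not take a block, it does keep the delimiters when the pattern contains a capture group, so the offsets can be recovered by adding up the lengths of the pieces. The sketch below uses a hypothetical helper name, split_keep_offsets, and assumes the whole pattern is wrapped in exactly one capture group; with more groups, split emits each group separately and the running offsets would no longer be correct.

```ruby
# Hypothetical helper (not from the original post): splits on a pattern
# wrapped in a single capture group and pairs every piece, delimiters
# included, with its character offset in the original string.
def split_keep_offsets(str, pattern)
  offset = 0
  # A negative limit keeps trailing empty strings instead of dropping them
  str.split(pattern, -1).map do |piece|
    pair = [piece, offset]
    offset += piece.size
    pair
  end
end

p split_keep_offsets("a b;c", /( |;)/)
# prints [["a", 0], [" ", 1], ["b", 2], [";", 3], ["c", 4]]
```

Empty pieces between adjacent delimiters are kept here; chaining .reject { |piece, _| piece.empty? } would drop them.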