How to replace every occurrence of a pattern in a string using Ruby?

Developer https://www.devze.com 2023-01-30 22:24 Source: Internet
I have an XML file which is too big. To make it smaller, I want to replace all tags and attribute names with shorter versions of the same thing.

So, I implemented this:

string.gsub!(/<(\w+) /) do |match|
    case match
    when 'Image' then 'Img'
    when 'Text'  then 'Txt'
    end
end

puts string

which deletes all opening tags but does not do much else.

What am I doing wrong here?


Here's another way:

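class String
  # Shortens known tag names in place and returns the string itself.
  def minimize_tags!
    {"image" => "img", "text" => "txt"}.each do |from,to|
      gsub!(/<#{from}\b/i,"<#{to}")      # opening tags, case-insensitive
      gsub!(/<\/#{from}>/i,"</#{to}>")   # closing tags
    end
    self
  end
end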

This will probably be a little easier to maintain, since the replacement patterns are all in one place. On strings of any significant size, it may also be a lot faster than Kevin's way: in a quick speed test of the two methods, using the HTML source of this Stack Overflow page itself as the test string, my way was about 6x faster.
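If keeping the replacements in one place is the goal, the table can also drive a single pass over the string. The following is only a sketch of that idea, not the method timed above: the names TAG_MAP, TAG_PATTERN and minimize_tags are made up for illustration, and no speed claim is attached to it.

```ruby
# Hypothetical lookup table: lowercase tag names mapped to their short forms.
TAG_MAP = { "image" => "img", "text" => "txt" }.freeze

# Matches "<" or "</" followed by one of the known tag names as a whole word.
TAG_PATTERN = /(<\/?)(#{Regexp.union(TAG_MAP.keys).source})\b/i

# Returns a copy of xml with known tag names shortened in a single gsub pass.
def minimize_tags(xml)
  xml.gsub(TAG_PATTERN) { "#{$1}#{TAG_MAP[$2.downcase]}" }
end

sample = '<Image ImagePath="a.png">pic</Image><Text>hi</Text>'
result = minimize_tags(sample)
puts result
# prints: <img ImagePath="a.png">pic</img><txt>hi</txt>
```

Attribute names such as ImagePath are left alone here because the pattern only matches names that directly follow "<" or "</".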


Here's the beauty of using a parser such as Nokogiri: it lets you manipulate selected tags (nodes) and their attributes:

require 'nokogiri'

xml = <<EOT
<xml>
  <Image ImagePath="path/to/image">image comment</Image>
  <Text TextFont="courier" TextSize="9">this is the text</Text>
</xml>
EOT

doc = Nokogiri::XML(xml)
doc.search('Image').each do |n| 
  n.name = 'img' 
  n.attributes['ImagePath'].name = 'path'
end
doc.search('Text').each do |n| 
  n.name = 'txt'
  n.attributes['TextFont'].name = 'font'
  n.attributes['TextSize'].name = 'size'
end
print doc.to_xml
# >> <?xml version="1.0"?>
# >> <xml>
# >>   <img path="path/to/image">image comment</img>
# >>   <txt font="courier" size="9">this is the text</txt>
# >> </xml>

If you need to iterate through every node, maybe to do a universal transformation on the tag-name, you can use doc.search('*').each. That would be slower than searching for individual tags, but might result in less code if you need to change every tag.

The nice thing about using a parser is that it doesn't care about whitespace or attribute order, so it keeps working even if the layout of the XML changes, which makes your code more robust.


Try this:

string.gsub!(/(<\/?)(\w+)/) do |match|
  tag_mark = $1
  case $2
  when /^image$/i
    "#{tag_mark}Img"
  when /^text$/i
    "#{tag_mark}Txt"
  else
    match
  end
end  
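
Two details in the snippet above address what went wrong in the question: it tests the captured name ($2) instead of the block argument, and its else branch returns the match unchanged. The sketch below, using a made-up one-line sample string, shows the original behaviour and a minimal fix side by side.

```ruby
xml = '<Image ImagePath="a.png">pic</Image>'

# The block argument is the entire match ("<Image " here), not the captured name.
seen = []
broken = xml.gsub(/<(\w+) /) do |match|
  seen << match
  case match
  when 'Image' then 'Img'
  when 'Text'  then 'Txt'
  end # no branch matches, so the block returns nil and the match becomes ""
end

# Testing the capture group ($1) and re-emitting "<" and the space fixes that.
fixed = xml.gsub(/<(\w+) /) do |match|
  case $1
  when 'Image' then '<Img '
  when 'Text'  then '<Txt '
  else match # leave unknown tags untouched
  end
end

p seen   # prints: ["<Image "]
puts broken
puts fixed
```

Even the fixed version only touches opening tags that are followed by a space, so closing tags still need the optional slash that the pattern in the answer above allows for.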