How to work around the invalid byte sequence in UTF-8 ArgumentError?

Developer https://www.devze.com 2023-03-09 15:33 Source: Internet
I am trying to run the following code, which uses Nokogiri to parse an XML file. I want to remove newline characters from the text contained between tags. The code used to work, but now it fails, possibly because I upgraded to ruby-1.9.1.

titles = node.search('b')
titles.each do |e|
  unless e.parent.name == "h4"
    if e.children.children.first.nil? == false
      puts e.children.children.first.text.gsub("\n","")
    end
  end
end

When I run the code I get this error:

HI.  You're using libxml2 version 2.6.16 which is over 4 years old and has
plenty of bugs.  We suggest that for maximum HTML/XML parsing pleasure, you
upgrade your version of libxml2 and re-install nokogiri.  If you like using
libxml2 version 2.6.16, but don't like this warning, please define the constant
I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 before requring nokogiri.

test.rb:35:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)


You could try installing Ruby 1.9.2 via RVM:

curl -L https://get.rvm.io | bash
rvm install 1.9.2

If you want Ruby to default to your RVM 1.9.2 install, run:

rvm use 1.9.2 --default

NOTE: The above are equivalent to:

curl -L https://get.rvm.io | bash -s -- --ruby=1.9.2
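
Separately from switching Ruby versions, the error itself can usually be sidestepped by replacing the bytes that are not valid UTF-8 before calling gsub. The following is only a minimal sketch, not code from the original post: strip_newlines_safely is a hypothetical helper name, and it assumes the text is meant to be UTF-8 and that substituting "?" for the bad bytes is acceptable.

```ruby
# Hypothetical helper (not part of Nokogiri or the original post).
# Assumption: the text is supposed to be UTF-8, and it is acceptable to
# replace any bytes that are not valid UTF-8 with "?".
def strip_newlines_safely(text)
  # Work on a copy tagged as UTF-8 so the caller's string is untouched.
  utf8 = text.dup.force_encoding(Encoding::UTF_8)

  unless utf8.valid_encoding?
    # Round-trip through UTF-16BE: invalid bytes become "?", valid
    # characters survive unchanged.
    utf8 = utf8.encode(Encoding::UTF_16BE, invalid: :replace, undef: :replace, replace: "?")
               .encode(Encoding::UTF_8)
  end

  # With only valid UTF-8 left, gsub no longer raises ArgumentError.
  utf8.gsub("\n", "")
end

puts strip_newlines_safely("abc\xFF\ndef")  # prints "abc?def"
```

In the loop from the question, the call would wrap e.children.children.first.text in place of the direct gsub. On Ruby 2.1 and later, String#scrub does the same replacement in a single call; the round trip above is used because it also works on 1.9.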