开发者

nutch crawler is crawling ' as â€

开发者 https://www.devze.com 2023-02-07 18:37 出处:网络
nutch crawler is crawling let\'s as Let’s y??? is there is 开发者_StackOverflowany setting to change the this charset..’ is the UTF-8 encoding of the single closing quote (not the apostrophe

nutch crawler is crawling let's as Let’s y??? is there is 开发者_StackOverflowany setting to change the this charset..


’ is the UTF-8 encoding of the single closing quote (not the apostrophe), and you're interpreting it as Windows-1252. You need to use the right encoding (UTF-8). This link may help.


I haven't used Nutch myself, but this page looks like it's relevant:

To enable passing of UTF-8 characters, edit $TOMCAT/conf/server.xml. Locate the <Connector> tag for the web (look for "8080") and insert this parameter assignment: URIEncoding="UTF-8" as explained in Tomcat 5 FAQ at http://tomcat.apache.org/faq/connectors.html#utf8

0

精彩评论

暂无评论...
验证码 换一张
取 消