RTF CP1252 to Text UTF-8

Developer https://www.devze.com 2023-02-13 11:48 Source: Internet
Here is a file that I need to convert to plain text on Mac OS X, in zsh: http://narod.ru/disk/6431540001/Test_rtf.rtf.html

I've tried unrtf, rtf2txt and rtf2html, with no result: they can't convert ru_cp1252. I've also tried

unrtf file.rtf | iconv -f cp1252 -t UTF-8

with no result either.
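RTF normally writes characters outside 7-bit ASCII as \'hh hex escapes (for example \'82\'6c in the excerpt below). A likely reason the iconv step changes nothing is that those characters are still plain-ASCII escapes at that point, so there are no cp1252 bytes for iconv to convert. The escapes have to be decoded with the right code page first, and that code page can depend on the font in effect. Below is a minimal Python sketch of just that decoding step, assuming you already know the codec. The helper name decode_hex_escapes is hypothetical. It does not strip RTF control words, so it is a building block rather than a full converter.

```python
import re

# One or more consecutive RTF hex escapes such as \'cf\'f0. Runs are decoded
# together so that multi-byte code pages (e.g. Shift-JIS) decode correctly.
_HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_escapes(rtf_text, codec):
    """Replace every run of \\'hh escapes in rtf_text with the characters
    those bytes represent in `codec` (any Python codec name, e.g. "cp1251")."""

    def _decode(match):
        # Drop the \' prefixes, leaving only the hex digits, e.g. "cff0"
        hex_digits = match.group(0).replace("\\'", "")
        # Undecodable bytes become U+FFFD instead of raising an error
        return bytes.fromhex(hex_digits).decode(codec, errors="replace")

    return _HEX_RUN.sub(_decode, rtf_text)


if __name__ == "__main__":
    # Hypothetical sample: the word "Привет" written as cp1251 hex escapes
    print(decode_hex_escapes(r"\'cf\'f0\'e8\'e2\'e5\'f2", "cp1251"))  # prints: Привет
```

Applied to \'82\'6c\'82\'72 \'96\'be\'92\'a9 with "shift_jis", the same function returns ＭＳ 明朝, one of the Japanese font names in the excerpt's font table. This gives you a quick way to check whether a guessed code page is right.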

I'll be happy with any solution: shell, Perl, Python or Ruby.

If you don't want to download the file, here is part of the RTF file as I see it in zsh with cat:

{\rtf1\adeflang1025\ansi\ansicpg10000\uc1\adeff0\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f0\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fbidi \fnil\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}^M{\f1\fbidi \fnil\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\flomajor\f31500\fbidi \fnil\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\fdbmajor\f31501\fbidi \fnil\fcharset78\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e;}^M{\fhimajor\f31502\fbidi \fnil\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\fbimajor\f31503\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}^M{\flominor\f31504\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbminor\f31505\fbidi \fnil\fcharset78\fprq2 \'82\'6c\'82\'72 \'96\'be\'92\'a9;}^M{\fhiminor\f31506\fbidi \fnil\fcharset0\fprq2{\*\panose 02040503050406030204}Cambria;}{\fbiminor\f31507\fbidi \fnil\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f487\fbidi \fnil\fcharset238\fprq2 Times New Roman CE;}^M{\f488\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}{\f490\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek;}{\f491\fbidi \fnil\fcharset162\fprq2 Times New Roman Tur;}{\f492\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}^M{\f493\fbidi \fnil\fcharset178\fprq2 Times New Roman (Arabid);}{\f494\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}{\f495\fbidi \fnil\fcharset87\fprq2 Times New Roman (That);}{\f497\fbidi \fnil\fcharset238\fprq2 Arial CE;}^M{\f498\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\f500\fbidi \fnil\fcharset161\fprq2 Arial Greek;}{\f501\fbidi \fnil\fcharset162\fprq2 Arial Tur;}{\f502\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\f503\fbidi \fnil\fcharset178\fprq2 Arial (Arabid);}^M{\f504\fbidi 
\fnil\fcharset186\fprq2 Arial Baltic;}{\f505\fbidi \fnil\fcharset87\fprq2 Arial (That);}{\f497\fbidi \fnil\fcharset238\fprq2 Arial CE;}{\f498\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\f500\fbidi \fnil\fcharset161\fprq2 Arial Greek;}^M{\f501\fbidi \fnil\fcharset162\fprq2 Arial Tur;}{\f502\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\f503\fbidi \fnil\fcharset178\fprq2 Arial (Arabid);}{\f504\fbidi \fnil\fcharset186\fprq2 Arial Baltic;}{\f505\fbidi \fnil\fcharset87\fprq2 Arial (That);}^M{\flomajor\f31508\fbidi \fnil\fcharset238\fprq2 Arial CE;}{\flomajor\f31509\fbidi \fnil\fcharset204\fprq2 Arial Cyr;}{\flomajor\f31511\fbidi \fnil\fcharset161\fprq2 Arial Greek;}{\flomajor\f31512\fbidi \fnil\fcharset162\fprq2 Arial Tur;}^M{\flomajor\f31513\fbidi \fnil\fcharset177\fprq2 Arial (Hebrew);}{\flomajor\f31514\fbidi \fnil\fcharset178\fprq2 Arial (Arabid);}{\flomajor\f31515\fbidi \fnil\fcharset186\fprq2 Arial Baltic;}{\flomajor\f31516\fbidi \fnil\fcharset87\fprq2 Arial (That);}^M{\fdbmajor\f31520\fbidi \fnil\fcharset0\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Western;}{\fdbmajor\f31518\fbidi \fnil\fcharset238\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e CE;}^M{\fdbmajor\f31519\fbidi \fnil\fcharset204\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Cyr;}{\fdbmajor\f31521\fbidi \fnil\fcharset161\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Greek;}^M{\fdbmajor\f31522\fbidi \fnil\fcharset162\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Tur;}{\fdbmajor\f31525\fbidi \fnil\fcharset186\fprq2 \'82\'6c\'82\'72 \'83\'53\'83\'56\'83\'62\'83\'4e Baltic;}^M{\fhimajor\f31528\fbidi \fnil\fcharset238\fprq2 Calibri CE;}{\fhimajor\f31529\fbidi \fnil\fcharset204\fprq2 Calibri Cyr;}{\fhimajor\f31531\fbidi \fnil\fcharset161\fprq2 Calibri Greek;}{\fhimajor\f31532\fbidi \fnil\fcharset162\fprq2 Calibri Tur;}^M{\fhimajor\f31535\fbidi \fnil\fcharset186\fprq2 Calibri 
Baltic;}{\fhimajor\f31536\fbidi \fnil\fcharset87\fprq2 Calibri (That);}{\fbimajor\f31538\fbidi \fnil\fcharset238\fprq2 Times New Roman CE;}^M{\fbimajor\f31539\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}{\fbimajor\f31541\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek;}{\fbimajor\f31542\fbidi \fnil\fcharset162\fprq2 Times New Roman Tur;}^M{\fbimajor\f31543\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}{\fbimajor\f31544\fbidi \fnil\fcharset178\fprq2 Times New Roman (Arabid);}{\fbimajor\f31545\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}^M{\fbimajor\f31546\fbidi \fnil\fcharset87\fprq2 Times New Roman (That);}{\flominor\f31548\fbidi \fnil\fcharset238\fprq2 Times New Roman CE;}{\flominor\f31549\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}^M{\flominor\f31551\fbidi \fnil\fcharset161\fprq2 Times New Roman Greek;}{\flominor\f31552\fbidi \fnil\fcharset162\fprq2 Times New Roman Tur;}{\flominor\f31553\fbidi \fnil\fcharset177\fprq2 Times New Roman (Hebrew);}^M{\flominor\f31554\fbidi \fnil\fcharset178\fprq2 Times New Roman (Arabid);}{\flominor\f31555\fbidi \fnil\fcharset186\fprq2 Times New Roman Baltic;}{\flominor\f31556\fbidi \fnil\fcharset87\fprq2 Times New Roman (That);}^M{\fdbminor\f31560\fbidi \fnil\fcharset0\fprq2 \'82\'6c\'82\'72 \'96\'be\'92\'a9 Western;}{\fdbminor\f31558\fbidi \fnil\fcharset2 ...................... }
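The header in this excerpt declares \ansicpg10000 (code page 10000, Mac Roman) for the document as a whole. Several font-table entries, however, declare \fcharset204, which the RTF specification assigns to the Russian (Cyrillic) character set, i.e. Windows code page 1251. Examples are \f488 "Times New Roman Cyr" and \f498 "Arial Cyr". If the Russian text is set in one of those fonts, its \'hh escapes are presumably cp1251 bytes rather than cp1252. Below is a minimal Python sketch that reads these declarations. The function names and the partial charset table are illustrative assumptions, not part of any library.

```python
import codecs
import re

# Partial table of RTF \fcharsetN values mapped to Python codec names.
# Most Mac charsets (79-89) are omitted; 78 (Mac Shift-JIS) is approximated.
FCHARSET_CODECS = {
    0: "cp1252",      # ANSI / Western European
    77: "mac_roman",  # Mac Roman
    78: "shift_jis",  # Mac Shift-JIS (approximation)
    128: "cp932",     # Shift-JIS (Windows)
    161: "cp1253",    # Greek
    162: "cp1254",    # Turkish
    177: "cp1255",    # Hebrew
    178: "cp1256",    # Arabic
    186: "cp1257",    # Baltic
    204: "cp1251",    # Russian (Cyrillic)
    238: "cp1250",    # Eastern European
}

# A font-table entry such as {\f488\fbidi \fnil\fcharset204 ...;}
_FONT_ENTRY = re.compile(r"\\f(\d+)[^;{}]*?\\fcharset(\d+)")


def ansicpg_codec(rtf_text):
    """Return a Python codec name for the document's \\ansicpgN, or None."""
    m = re.search(r"\\ansicpg(\d+)", rtf_text)
    if m is None:
        return None
    page = int(m.group(1))
    # Code page 10000 is Mac Roman; Windows code pages map to "cpN"
    name = "mac_roman" if page == 10000 else f"cp{page}"
    try:
        codecs.lookup(name)  # make sure this Python has the codec
    except LookupError:
        return None
    return name


def font_codecs(rtf_text):
    """Map font numbers (the N in \\fN) to codecs via each font's \\fcharsetN."""
    table = {}
    for m in _FONT_ENTRY.finditer(rtf_text):
        font_num, charset = int(m.group(1)), int(m.group(2))
        if charset in FCHARSET_CODECS:
            table[font_num] = FCHARSET_CODECS[charset]
    return table


if __name__ == "__main__":
    header = r"{\rtf1\ansi\ansicpg10000{\fonttbl{\f488\fbidi \fnil\fcharset204\fprq2 Times New Roman Cyr;}}"
    print(ansicpg_codec(header))  # prints: mac_roman
    print(font_codecs(header))    # prints: {488: 'cp1251'}
```

A real converter would also have to track which \fN is active at each point in the body before choosing a codec for the escapes that follow. The sketch above only collects the declarations.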


My tip is to use a text editor that can handle different character sets: open the file, then save it as UTF-8.

I often use jEdit for similar tasks; see the Character Encodings section of jEdit's manual.
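In code, the editor approach amounts to decoding the file's bytes with the source encoding and writing the text back out as UTF-8. Below is a minimal Python sketch, assuming you already know the source encoding. The function name reencode_file is hypothetical. One caveat applies: characters that an RTF file stores as \'hh escapes are plain ASCII, so this re-encoding leaves them unchanged. It only affects raw 8-bit bytes in the file.

```python
from pathlib import Path


def reencode_file(src, dst, source_encoding, target_encoding="utf-8"):
    """Decode `src` with `source_encoding` and write the text to `dst`
    encoded as `target_encoding`. Bytes are used on both sides so that
    line endings (e.g. the \\r characters shown as ^M) are preserved."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode(target_encoding))
```

For example, reencode_file("notes.txt", "notes-utf8.txt", "cp1251") converts a cp1251 text file to UTF-8; both file names here are placeholders.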
