I'm writing a Bash script that needs to parse HTML that includes special characters such as @!'ó. The script runs end to end, but it ignores or trips on queries containing these characters because the server returns them as decimal Unicode references like this: &#39;. I've figured out how to parse these, convert them to hexadecimal, and load them into Python to turn them back into their symbols, and I'm wondering whether Bash can do this final conversion natively. A simple example in Python:
print ur"\u0032" ur"\u0033" ur"\u0040"
prints out
23@
Can I achieve the same result in Bash? I've looked into iconv but I don't think it can do what I want, or more probably I just don't know how.
Here's some relevant information:
Python String Literals
Hex to UTF conversion in Python
And here are some examples of expected input-output.
Ludwig van Beethoven - 5th Symphony and 6th Symphony &#39;&#39;Pastoral&#39;&#39; - Boston Symphony Orchestra - Charles Munch
Ludwig van Beethoven - 5th Symphony and 6th Symphony ''Pastoral'' - Boston Symphony Orchestra - Charles Munch
АлисА (Alisa) - Мы &#1074;месте. ХХ лет (My vmeste XX let)
АлисА (Alisa) - Мы вместе. ХХ лет (My vmeste XX let)
The printf builtin in Bash versions before 4.2 doesn't support Unicode escapes, but the external printf (at least on my GNU-based system) does:
$ /usr/bin/printf "\u0410\u043b\u0438\u0441\u0410"
АлисА
or this, which selects printf from your PATH in case it's not in /usr/bin:
$ $(type -P printf) "\u0410\u043b\u0438\u0441\u0410"
АлисА
or
$ env printf "\u0410\u043b\u0438\u0441\u0410"
АлисА
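To connect this to the decimal references in the question: the \u and \U escapes take hexadecimal, so each &#NNNN; has to be rewritten before printf can expand it. Below is a minimal sketch, assuming Bash 4.2 or later (where the builtin printf understands \U) and a UTF-8 locale. The function names entities_to_escapes and decode_with_printf are made up for illustration.

```bash
#!/usr/bin/env bash

# Rewrite every decimal reference (&#NNNN;) in $1 as a \UXXXXXXXX escape
# and store the result in REPLY. Hypothetical helper, not a standard tool.
entities_to_escapes() {
  local s=$1 out='' re='&#([0-9]+);' m esc
  s=${s//\\/\\\\}     # double literal backslashes so printf prints them as-is
  s=${s//'%'/%%}      # double literal percent signs for the same reason
  while [[ $s =~ $re ]]; do
    m=${BASH_REMATCH[0]}
    # 10# forces base 10, so a reference like &#0039; is not read as octal
    printf -v esc '\\U%08x' "$(( 10#${BASH_REMATCH[1]} ))"
    out+=${s%%"$m"*}$esc   # text before the reference, then its escape
    s=${s#*"$m"}           # continue with the text after the reference
  done
  REPLY=$out$s
}

# Expand the escapes with the builtin printf.
# Assumes Bash >= 4.2 and a UTF-8 locale; otherwise the escapes may be
# printed literally or converted to a different encoding.
decode_with_printf() {
  entities_to_escapes "$1"
  printf "$REPLY\n"
}
```

Under those assumptions, decode_with_printf '&#1040;&#1083;&#1080;&#1089;&#1040; (Alisa)' should print the same АлисА (Alisa) as the commands above.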
A possible solution that pipes the text through Python, e.g.:
$ conv() { printf '%s\n' "$*" | python3 -c 'import re, sys; print(re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), sys.stdin.read()).rstrip())'; }
$ conv '&#1040;&#1083;&#1080;&#1089;&#1040; (Alisa)'
АлисА (Alisa)
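If Python isn't available, the same substitution can be sketched in Bash alone by doing the UTF-8 encoding by hand. That avoids relying on printf's \u support or on the current locale. This is only an illustration under stated assumptions: the names cp_to_utf8 and decode_entities are hypothetical, only decimal references (&#NNNN;) are handled, and code points are not validated (surrogates, values above U+10FFFF, and &#0; are not treated specially).

```bash
#!/usr/bin/env bash

# Store the UTF-8 encoding of the code point $1 (decimal) in REPLY.
cp_to_utf8() {
  local cp=$1 b fmt=''
  local -a bytes
  if   (( cp < 0x80 ));    then bytes=( "$cp" )
  elif (( cp < 0x800 ));   then bytes=( $(( 0xC0 | (cp >> 6) ))
                                        $(( 0x80 | (cp & 0x3F) )) )
  elif (( cp < 0x10000 )); then bytes=( $(( 0xE0 | (cp >> 12) ))
                                        $(( 0x80 | ((cp >> 6) & 0x3F) ))
                                        $(( 0x80 | (cp & 0x3F) )) )
  else                          bytes=( $(( 0xF0 | (cp >> 18) ))
                                        $(( 0x80 | ((cp >> 12) & 0x3F) ))
                                        $(( 0x80 | ((cp >> 6) & 0x3F) ))
                                        $(( 0x80 | (cp & 0x3F) )) )
  fi
  # Build a printf format made of octal escapes, e.g. \320\220 for U+0410.
  for b in "${bytes[@]}"; do
    printf -v b '\\%03o' "$b"
    fmt+=$b
  done
  printf -v REPLY "$fmt"
}

# Print $1 with every decimal reference replaced by its character.
decode_entities() {
  local s=$1 out='' re='&#([0-9]+);' m
  while [[ $s =~ $re ]]; do
    m=${BASH_REMATCH[0]}
    cp_to_utf8 "$(( 10#${BASH_REMATCH[1]} ))"   # 10# forces base 10
    out+=${s%%"$m"*}$REPLY   # text before the reference, then the character
    s=${s#*"$m"}             # continue with the text after the reference
  done
  printf '%s\n' "$out$s"
}
```

It is called the same way as conv, for example decode_entities '&#1040;&#1083;&#1080;&#1089;&#1040; (Alisa)'. The loop handles one reference per iteration, so it will be slower than the Python one-liner on large inputs.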