When I use the unicode function in BeautifulSoup, what encoding does it convert to Unicode from? Does it automatically use soup.originalEncoding?
from BeautifulSoup import BeautifulSoup
doc = "<html><h1>Heading</h1><p>Text"
soup = BeautifulSoup(doc)
print unicode(soup)
Thanks
unicode() is a Python 2 builtin, not part of BeautifulSoup. The Python documentation describes it as follows:
unicode([object[, encoding[, errors]]])
If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.
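As a rough illustration of the three error modes described above, here is a minimal sketch. It uses bytes.decode(encoding, errors) rather than unicode() so that it also runs on Python 3; on Python 2, unicode(raw, encoding, errors) goes through the same codecs. The sample bytes and variable names are made up for the example.

```python
# UTF-8 bytes for "café"; the last two bytes are not valid ASCII.
raw = b"caf\xc3\xa9"

# Decoding with the correct codec yields the expected Unicode string.
decoded = raw.decode("utf-8")

# 'strict' (the default) raises UnicodeDecodeError, a subclass of ValueError.
strict_error = None
try:
    raw.decode("ascii", "strict")
except ValueError as exc:
    strict_error = type(exc).__name__

# 'ignore' silently drops the bytes that cannot be decoded.
ignored = raw.decode("ascii", "ignore")

# 'replace' substitutes U+FFFD for each byte that cannot be decoded.
replaced = raw.decode("ascii", "replace")

# An unknown encoding name raises LookupError.
lookup_failed = False
try:
    raw.decode("no-such-codec")
except LookupError:
    lookup_failed = True

print(repr(decoded), strict_error, repr(ignored), repr(replaced), lookup_failed)
```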
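If you don't specify the encoding and the object is an 8-bit string, sys.getdefaultencoding() will be used by default. For an object that provides a __unicode__() method, which as far as I know includes the soup in your example, unicode() calls that method instead. BeautifulSoup has already converted the document to Unicode while parsing, and soup.originalEncoding records the encoding it detected.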
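To check which default applies on your interpreter, something like the following sketch may help. It is written to run on both Python 2 and Python 3 and uses bytes.decode() as a stand-in for unicode() on an 8-bit string; the sample value is made up. The default is typically "ascii" on Python 2 and "utf-8" on Python 3.

```python
import sys

# The codec used when no encoding is passed explicitly.
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# ASCII-only bytes decode the same way with or without naming the default codec.
ascii_only = b"Heading"
implicit = ascii_only.decode()
explicit = ascii_only.decode(default_encoding)
print(implicit == explicit)
```

Bytes outside the ASCII range are where the default matters: on Python 2 an implicit decode of such a string raises UnicodeDecodeError, so passing the encoding explicitly is the safer habit.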