I'm parsing HTML using BeautifulSoup in Python.
I don't know how to insert a space when extracting the text of an element.
This is the code:
import BeautifulSoup
soup=BeautifulSoup.BeautifulSoup('<html>this<b>is</b>example</html>')
print soup.text
The output is:
thisisexample
but I want to insert spaces so that it reads:
this is example
How do I insert a space?
Use getText instead:
import BeautifulSoup
soup=BeautifulSoup.BeautifulSoup('<html>this<b>is</b>example</html>')
print soup.getText(separator=u' ')
# prints: this is example
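The code above uses the old BeautifulSoup 3 package under Python 2. A minimal sketch of the same idea with the current bs4 package and Python 3 follows. The choice of "html.parser" (Python's built-in parser) is an assumption; any installed parser should give the same text here. In bs4 the method is get_text(), and getText() is kept as an alias.

```python
from bs4 import BeautifulSoup

html = "<html>this<b>is</b>example</html>"
# "html.parser" ships with Python, so no extra parser install is needed
soup = BeautifulSoup(html, "html.parser")

# separator is placed between consecutive text nodes: "this", "is", "example"
text = soup.get_text(separator=" ")
print(text)  # this is example
```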
If your version of BeautifulSoup does not have getText,
you could do this instead:
In [26]: ' '.join(soup.findAll(text=True))
Out[26]: u'this is example'
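For reference, here is a sketch of the same join-the-text-nodes approach in bs4, assuming bs4 4.4 or later, where the string= argument of find_all() replaces the older text= argument. Each returned piece is a NavigableString (a str subclass), so str.join works on the result directly.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html>this<b>is</b>example</html>", "html.parser")

# find_all(string=True) returns every text node in document order
pieces = soup.find_all(string=True)
joined = " ".join(pieces)
print(joined)  # this is example
```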
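You may also want to use the strip argument (this uses the newer bs4 package, where the method is named get_text):
from bs4 import BeautifulSoup
bs = BeautifulSoup("<html>this<b>is </b>example</html>", "html.parser")
print(bs.get_text()) # thisis example
print(bs.get_text(separator=" ")) # this is  example (two spaces: the text "is " keeps its trailing space)
print(bs.get_text(separator=" ", strip=True)) # this is example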
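strip=True trims each text node separately, so whitespace inside a single node survives. If you want every run of whitespace collapsed to one space, one alternative (not from the answers above) is to split the extracted text and rejoin it. The sketch below compares the two on a hypothetical snippet with messy whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with extra whitespace inside and around the tags
html = "<p>a  b<b> c </b>\n d</p>"
soup = BeautifulSoup(html, "html.parser")

# strip=True trims each text node and drops empty ones,
# but the double space inside the single node "a  b" is kept
stripped = soup.get_text(separator=" ", strip=True)

# Splitting on any whitespace and rejoining collapses every run to one space
collapsed = " ".join(soup.get_text(separator=" ").split())

print(repr(stripped))   # 'a  b c d'
print(repr(collapsed))  # 'a b c d'
```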