Let's say I have a structure like this:
<folder name="folder1">
<folder name="folder2">
<bookmark href="link.html">
</folder>
</folder>
Given a bookmark element, what command would return all of the folder tags that enclose it? For example,
bookmarks = soup.findAll('bookmark')
then beautifulsoupcommand(bookmarks[0])
would return:
[<folder name="folder1">,<folder name="folder2">]
I'd also like to know where the closing tags occur. Any ideas?
Thanks in advance!
Here is my stab at it:
>>> from BeautifulSoup import BeautifulSoup
>>> html = """<folder name="folder1">
<folder name="folder2">
<bookmark href="link.html">
</folder>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findAllPrevious(name='folder')]
[u'folder2', u'folder1']
The key difference from @eumiro's answer is that I am using findAllPrevious instead of findParents. (In bs4 these methods are spelled find_all_previous and find_parents.) When I tested @eumiro's solution with BeautifulSoup 3, the version imported above, findParents returned only the immediate parent, apparently because the parent and grandparent share the same tag name.
>>> [p.get('name') for p in bookmarks[0].findParents('folder')]
[u'folder2']
>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', None]
The trailing None is the top-level BeautifulSoup object, which has no name attribute.
It does return two generations of parents if the parent and grandparent have different tag names.
>>> html = """<folder name="folder1">
<folder_parent name="folder2">
<bookmark href="link.html">
</folder_parent>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', u'folder1', None]
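The difference between the two calls matters once the document contains folders that are not ancestors of the bookmark. Here is a minimal sketch, assuming the newer bs4 package with Python's built-in html.parser (not the BeautifulSoup 3 import used above) and a made-up sibling folder named other. find_all_previous walks backwards through everything that precedes the bookmark, while find_parents follows only the ancestor chain.

```python
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module used above

# Hypothetical document: "other" is a sibling folder, not an ancestor of link.html.
html = """<folder name="folder1">
<folder name="other"><bookmark href="a.html"></bookmark></folder>
<folder name="folder2"><bookmark href="link.html"></bookmark></folder>
</folder>"""

soup = BeautifulSoup(html, "html.parser")
bookmark = soup.find("bookmark", href="link.html")

# Every <folder> that starts before the bookmark, nearest first.
previous_names = [f.get("name") for f in bookmark.find_all_previous("folder")]

# Only the <folder> tags that actually enclose the bookmark, nearest first.
parent_names = [f.get("name") for f in bookmark.find_parents("folder")]

print(previous_names)
print(parent_names)
```

With this parser the nested same-named folders stay nested, so find_parents should report both folder2 and folder1, while find_all_previous additionally picks up other.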
bookmarks[0].findParents('folder') will return a list of all parent folder nodes. You can then iterate over them and read each one's name attribute.
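Iterating over the parents could look like the sketch below. It assumes bs4 with the built-in html.parser, and folder_path is a hypothetical helper, not part of BeautifulSoup. Because the parents come back nearest first, the list is reversed to get the outermost-first order shown in the question.

```python
from bs4 import BeautifulSoup  # assumption: bs4 with the stdlib html.parser

html = """<folder name="folder1">
<folder name="folder2">
<bookmark href="link.html"></bookmark>
</folder>
</folder>"""


def folder_path(tag):
    """Hypothetical helper: names of the enclosing <folder> tags, outermost first."""
    parents = tag.find_parents("folder")  # nearest ancestor comes first
    return [parent.get("name") for parent in reversed(parents)]


soup = BeautifulSoup(html, "html.parser")
bookmarks = soup.find_all("bookmark")
path = folder_path(bookmarks[0])
print("/".join(path))
```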