I'd like to select elements that have no children of a specific type, for example all <li> elements that have no <table class="someclass"> child. I want to select only the parent element, not the children that don't match the table.
On a similar note, I'd like to match elements whose ancestors don't match X, for example all <li> elements that are not descendants of <table class="someclass">.
I'm using Python and lxml's cssselect.
Thanks!
The CSS3 :not selector will get you partly there. Unfortunately, CSS3 has no parent selector, so you can't select an element based on characteristics of its children.
For your first question you have to do the traversal explicitly. A selector like 'li > table:not(.someclass)' is not enough on its own: it finds <li> elements that have some other table as a child, so it misses <li> elements with no table at all and keeps those that contain both kinds of table. It is safer to collect the <li> elements that do have the unwanted child and leave them out:
from lxml.cssselect import CSSSelector
# html is assumed to be the parsed lxml tree
# <li> elements that DO have a <table class="someclass"> child
excluded = set(e.getparent() for e in CSSSelector('li > table.someclass')(html))
# All <li> elements who have no <table class="someclass"> children, in document order
[li for li in CSSSelector('li')(html) if li not in excluded]
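For your second question, combine two selectors. A single selector such as ':not(table.someclass) li' does not work, because it matches any <li> that has at least one ancestor other than the table, such as <body>:
# <li> elements that ARE descendants of <table class="someclass">
inside = set(CSSSelector('table.someclass li')(html))
# All <li> elements who are not descendants of <table class="someclass">
[li for li in CSSSelector('li')(html) if li not in inside]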
I don't think CSS selectors have "anything but" selection, so you can't do it that way. Maybe you can do it with XPath, which is more flexible, but even then you may end up with complex and obtuse path expressions.
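For what it's worth, here is a rough sketch of what the XPath route might look like, assuming lxml's xpath() method. The sample document, the SOMECLASS helper string and the function names are all made up for illustration; the class test uses the usual concat/normalize-space idiom so that it matches one class among several.

```python
from lxml import html as lxml_html

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<ul>
  <li id="a"><table class="someclass"><tr><td>
    <ul><li id="b">nested</li></ul>
  </td></tr></table></li>
  <li id="c"><table class="other"></table></li>
  <li id="d">plain text</li>
</ul>
"""

# XPath test for "the class attribute contains the token someclass".
SOMECLASS = "contains(concat(' ', normalize-space(@class), ' '), ' someclass ')"


def li_without_someclass_table_child(root):
    """<li> elements with no direct <table class="someclass"> child."""
    return root.xpath(f"//li[not(table[{SOMECLASS}])]")


def li_outside_someclass_table(root):
    """<li> elements with no <table class="someclass"> ancestor."""
    return root.xpath(f"//li[not(ancestor::table[{SOMECLASS}])]")


root = lxml_html.document_fromstring(SAMPLE)
print([e.get("id") for e in li_without_someclass_table_child(root)])  # ['b', 'c', 'd']
print([e.get("id") for e in li_outside_someclass_table(root)])        # ['a', 'c', 'd']
```

The expressions are not short, which is the trade-off mentioned above, but each question does fit in a single query.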
I'd recommend that you simply get all <li> elements, go through each element's children, and skip it if one of the children is a table.
This will be easily understandable and maintainable, easy to implement, and unless your performance requirements are really extreme and you need to process tens of thousands of pages per second, it will be Fast Enough (tm).
Keep it simple.
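A minimal sketch of that loop, assuming the page has been parsed with lxml. The sample document and the helper names (has_class, li_without_table_child, li_outside_table) are hypothetical, and the check is narrowed to tables with the class "someclass" as in the question; the second function applies the same idea to ancestors using iterancestors().

```python
from lxml import html as lxml_html

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<ul>
  <li id="a"><table class="someclass"><tr><td>
    <ul><li id="b">nested</li></ul>
  </td></tr></table></li>
  <li id="c"><table class="other"></table></li>
  <li id="d">plain text</li>
</ul>
"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in (element.get("class") or "").split()


def li_without_table_child(root, cls="someclass"):
    """All <li> elements that have no direct <table class=cls> child."""
    result = []
    for li in root.iter("li"):
        # Iterating an element yields its direct children.
        if any(child.tag == "table" and has_class(child, cls) for child in li):
            continue  # skip: one of the children is the unwanted table
        result.append(li)
    return result


def li_outside_table(root, cls="someclass"):
    """All <li> elements that are not nested inside a <table class=cls>."""
    return [
        li for li in root.iter("li")
        if not any(a.tag == "table" and has_class(a, cls) for a in li.iterancestors())
    ]


doc = lxml_html.document_fromstring(SAMPLE)
print([li.get("id") for li in li_without_table_child(doc)])  # ['b', 'c', 'd']
print([li.get("id") for li in li_outside_table(doc)])        # ['a', 'c', 'd']
```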