Match "without this"

Developer https://www.devze.com 2023-04-10 21:47 Source: Internet

Related topics: python regex

I need to remove every <p></p> pair that is the only <p> in its <td>.

How can this be done?

import re
text = """
    <td><p>111</p></td>
    <td><p>111</p><p>222</p></td>
    """
text = re.sub(r'<td><p>(??no</p>inside??)</p></td>', r'<td>\1</td>', text)

How can I match without </p> inside?

I would use minidom. I stole the following snippet, which you should be able to modify to work for you:

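from xml.dom import minidom

# myXmlFile is the path to the XML file being edited (defined elsewhere).
doc = minidom.parse(myXmlFile)
for element in doc.getElementsByTagName('MyElementName'):
    if element.getAttribute('name') in ['AttrName1', 'AttrName2']:
        parentNode = element.parentNode
        # Insert a commented-out copy of the element, then remove the original.
        parentNode.insertBefore(doc.createComment(element.toxml()), element)
        parentNode.removeChild(element)
# Write the modified document back to the same file.
with open(myXmlFile, "w") as f:
    f.write(doc.toxml())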

Thanks @Ivo Bosticky
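Adapted to the markup in the question, the same minidom approach might look like the sketch below. It assumes the fragment is well-formed XML (minidom is not an HTML parser), and it wraps the input in a hypothetical <root> element because minidom needs a single root. The name unwrap_single_p is illustrative and not from the original snippet.

```python
from xml.dom import minidom

def unwrap_single_p(fragment):
    """Unwrap a <p> when it is the only element child of its <td>."""
    # Assumption: wrap the fragment so the parser sees exactly one root element.
    doc = minidom.parseString("<root>" + fragment + "</root>")
    for td in doc.getElementsByTagName("td"):
        # Count element children only; text nodes are not considered here.
        children = [n for n in td.childNodes if n.nodeType == n.ELEMENT_NODE]
        if len(children) == 1 and children[0].tagName == "p":
            p = children[0]
            # Move the contents of <p> up into the <td>, then drop the empty <p>.
            while p.firstChild is not None:
                td.insertBefore(p.firstChild, p)
            td.removeChild(p)
    # Serialize the children of the wrapper, not the wrapper itself.
    return "".join(node.toxml() for node in doc.documentElement.childNodes)

print(unwrap_single_p("<td><p>111</p></td><td><p>111</p><p>222</p></td>"))
```

The first cell loses its <p> wrapper, while the second cell, which holds two paragraphs, is left untouched.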

While using regexps with HTML is bad, matching a string that does not contain a given pattern is an interesting question in itself.

Let's assume that we want to match a string beginning with an a and ending with a z, and capture whatever is in between only when the string bar does not occur inside.

Here's my take: "a((?:(?<!ba)r|[^r])+)z"

It basically says: find a, then find either an r which is not preceded by ba, or any character other than r (repeated at least once), then find a z. So bar cannot sneak into the capture group.

Note that this approach uses a negative lookbehind, and Python's re module only supports lookbehind patterns of fixed length (like ba).
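A quick way to check the pattern is to run it against a string with and without bar. The sketch below also applies the same idea to the markup in the question. That second pattern is an adaptation, not part of the original answer: it allows a > inside the group only when the > does not complete </p>. The helper names are hypothetical.

```python
import re

# Pattern from the answer: "a", a run that never contains "bar", then "z".
NO_BAR = re.compile(r"a((?:(?<!ba)r|[^r])+)z")

def capture_without_bar(s):
    """Return the captured middle part, or None when nothing matches."""
    m = NO_BAR.search(s)
    return m.group(1) if m else None

# Adaptation for the question (an assumption, not from the answer):
# a ">" may appear in the group only if it is not preceded by "</p",
# so the group can never contain a complete "</p>".
SINGLE_P = re.compile(r"<td><p>((?:(?<!</p)>|[^>])*)</p></td>")

def unwrap_single_p(text):
    """Drop the <p> wrapper from cells that hold exactly one paragraph."""
    return SINGLE_P.sub(r"<td>\1</td>", text)

print(capture_without_bar("a-foo-z"))   # -foo-
print(capture_without_bar("a-bar-z"))   # None
print(unwrap_single_p("<td><p>111</p></td>\n<td><p>111</p><p>222</p></td>"))
```

Because search scans the whole string, a later a...z stretch can still match after an earlier attempt was rejected: "a-bar-a-z" yields a match on "a-z". Anchor the pattern if the entire string has to qualify.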

I would definitely recommend using BeautifulSoup for this. It's a Python HTML/XML parser.

http://www.crummy.com/software/BeautifulSoup/

I'm not quite sure why you want to remove the P tags which don't have closing tags. However, if this is an attempt to clean up code, an advantage of BeautifulSoup is that it can clean HTML for you:

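# Requires the beautifulsoup4 package.
from bs4 import BeautifulSoup

html = """
<td><p>111</td>
<td><p>111<p>222</p></td>
"""
# Parse with the standard-library parser and print the repaired markup.
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())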

This doesn't get rid of your unmatched tags, but it fixes the missing ones.
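For the original goal of dropping a <p> that is the only paragraph in its cell, a parser-based version might look like this minimal sketch. It assumes the beautifulsoup4 package (imported as bs4) with the standard-library html.parser backend. The function name unwrap_lone_paragraphs is hypothetical.

```python
from bs4 import BeautifulSoup

def unwrap_lone_paragraphs(html):
    """Remove the <p> wrapper from every <td> that directly contains one <p>."""
    soup = BeautifulSoup(html, "html.parser")
    for td in soup.find_all("td"):
        # recursive=False restricts the search to direct children of the cell.
        paragraphs = td.find_all("p", recursive=False)
        if len(paragraphs) == 1:
            # unwrap() replaces the tag with its own contents.
            paragraphs[0].unwrap()
    return str(soup)

print(unwrap_lone_paragraphs("<td><p>111</p></td><td><p>111</p><p>222</p></td>"))
```

Working on the parsed tree avoids having to express "no </p> inside" as a regular expression at all.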
