
Python regex: fix one HTML closing tag

Developer https://www.devze.com 2023-01-27 21:57 Source: web
<div>random contents without < or > , but has ( )  <div>

I just need to fix the closing div tag so that it looks like <div>random contents</div>.


I need to do this in Python with a regex.

The input is exactly like the first line; there will never be any < or > in the random contents.


Replace

(<div>[^<]*<)(div>)

with

\1/\2
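A minimal sketch of that substitution with Python's re module, assuming the input really has the shape <div>TEXT<div> with no < or > inside TEXT. In Python the backreferences in the replacement string are written \1 and \2 (or \g<1> and \g<2>); $1 and $2 have no special meaning to re.sub. The function names fix_close_tag and fix_close_tag_strict are hypothetical, and the strict variant is an extra assumption of mine (that you would rather fail loudly than pass odd input through unchanged).

```python
import re

# Group 1: "<div>" plus everything up to the next "<".
# Group 2: the "div>" of the second tag, which should become "/div>".
PATTERN = re.compile(r"(<div>[^<]*<)(div>)")


def fix_close_tag(html: str) -> str:
    """Turn the second <div> into </div> (hypothetical helper).

    Input that doesn't match the pattern is returned unchanged.
    """
    # r"\1/\2" re-inserts both groups with a "/" between them;
    # count=1 rewrites only the first match.
    return PATTERN.sub(r"\1/\2", html, count=1)


def fix_close_tag_strict(html: str) -> str:
    """Accept only the exact shape <div>TEXT<div> (hypothetical helper)."""
    # fullmatch anchors the pattern to the whole string; [^<>]* enforces
    # the "no < or > in the contents" guarantee from the question.
    m = re.fullmatch(r"<div>([^<>]*)<div>", html)
    if m is None:
        raise ValueError(f"unexpected input: {html!r}")
    return "<div>" + m.group(1) + "</div>"


if __name__ == "__main__":
    print(fix_close_tag("<div>random contents ( )<div>"))
    print(fix_close_tag_strict("<div>random contents ( )<div>"))
```

The plain version silently returns non-matching input unchanged; the strict one raises ValueError instead, which may be preferable if the "no < or >" guarantee ever stops holding.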

Note: this is bad practice, so don't do it unless it's absolutely necessary!


I wouldn't recommend a regex; use something like tidy (a Python wrapper around HTML Tidy) instead.


Avoid using regular expressions for dealing with HTML.

As the markup currently stands, this is how it would be parsed into a DOM tree:

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup('<div>random contents<div>', 'html.parser')
<div>random contents<div></div></div>
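To see why without installing BeautifulSoup, here is a minimal sketch using the standard library's html.parser instead (a swapped-in tool, not what this answer used). EventLogger and parse_events are hypothetical names. The sketch only records the events the parser reports, and it shows both <div> tags arriving as start tags with no end tag at all, which is why a tree builder such as BeautifulSoup ends up nesting them.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record the start-tag, end-tag and text events the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(html: str) -> list[tuple[str, str]]:
    parser = EventLogger()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.events


if __name__ == "__main__":
    # The second <div> is reported as another start tag, not a close.
    print(parse_events("<div>random contents<div>"))
```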

Or do you want to turn the second <div> into </div> (which a browser certainly would not do)?
