Python regex help

Developer https://www.devze.com 2023-01-23 18:42 Source: Internet
a = Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')

What I want is to escape the ' as an HTML entity, which is &#39;

Remember, the string after path can be anything, so I need a generic way to do this.

The desired output for this string is

Account(unit = 2, path=&#39;/real/os/win/today/axl.xls&#39;, realname = 'st')


import re
re.sub(r"path='([^']*)'", r"path=&#39;\1&#39;", s)  # s holds the input string
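
A minimal, self-contained sketch of the substitution above, assuming Python 3 and a path value that never contains a single quote itself. The function name escape_path_quotes is made up for illustration.

```python
import re

def escape_path_quotes(text):
    # Swap only the two quotes around the path value for &#39;.
    # The replacement is a raw string so that \1 is read as a
    # backreference to the captured path, not as the character \x01.
    return re.sub(r"path='([^']*)'", r"path=&#39;\1&#39;", text)

a = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
print(escape_path_quotes(a))
```

The quotes around 'st' are left alone because the pattern is tied to the path= prefix.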


If you want to convert '/real/os/win/today/axl.xls' to &#39;/real/os/win/today/axl.xls&#39; you can use "'/real/os/win/today/axl.xls'".replace("'", '&#39;') instead of using regex.
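
As a rough sketch of the no-regex route, assuming Python 3 and that the path value has already been pulled out into its own string: str.replace gives the decimal form, while the standard library's html.escape writes the same character in hexadecimal form.

```python
import html

path = "'/real/os/win/today/axl.xls'"

# Plain string replacement: every ' becomes the decimal reference &#39;
escaped = path.replace("'", "&#39;")
print(escaped)

# html.escape (quote=True by default) also escapes &, <, > and ",
# and writes ' as the hexadecimal reference &#x27;
print(html.escape(path))
```

Both references decode to the same apostrophe, so which one to use depends on what the consumer of the string expects.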


What you have is, strictly speaking, not an HTML entity. If I remember right, there are three kinds of &... references; for example, &#160;, &#xa0; and &nbsp; all mean U+00A0 NO-BREAK SPACE.

  - &#160; (the type you have) is a "numeric character reference" (decimal).

  - &#xa0; is a "numeric character reference" (hexadecimal).

  - &nbsp; is an entity.

You could check out Fredrik Lundh's Unescape HTML script (for Python 2.x) and read more about HTML entities here
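
On Python 3 the standard library can do the unescaping itself. The sketch below assumes Python 3.4 or later, where html.unescape is available, and checks that the decimal, hexadecimal and named forms all decode to the same character.

```python
import html

# Decimal reference, hexadecimal reference, and named entity for U+00A0
forms = ["&#160;", "&#xa0;", "&nbsp;"]
decoded = [html.unescape(f) for f in forms]
print([hex(ord(c)) for c in decoded])

# The reference from the question decodes to a plain apostrophe
apostrophe = html.unescape("&#39;")
print(apostrophe)
```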


If I understood the question correctly:

>>> import re
>>> a = "Account(unit = 2, path='/real/os/win/today/axl.xls', realname = 'st')"
>>> re.sub("(?<=path=').*", lambda x: '&#39'+x.group(0), a)
"Account(unit = 2, path='&#39/real/os/win/today/axl.xls', realname = 'st')"


I prefer BeautifulSoup for all this stuff. Check out http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion for more.
