Python regex returns the value with parentheses

Posted on https://www.devze.com, 2023-03-20 20:48, source: web
I'm trying to run this code:

picture = re.search("#4F9EFF;\"><img src=\"(.+?)\" width=\"120\" height=\"90\"", data)

When I do print picture.groups(1), it returns the value, but wrapped in parentheses. Why?

Output:

('http://sample.com/img/file.jpg',)


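What groups() returns is a tuple containing one element. You can get the string itself (the first and only group) as picture.groups()[0]. The important part is the comma after the string. Note also that the 1 in groups(1) is not a group number: groups() only accepts a default value for groups that did not participate in the match, so here the 1 has no effect.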
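A minimal sketch of this answer's point, assuming Python 3 and a hypothetical sample string standing in for the page the asker downloaded into data (the pattern is the question's own):

```python
import re

# Hypothetical sample input; the real `data` in the question came from a web page.
data = '<td style="color:#4F9EFF;"><img src="http://sample.com/img/file.jpg" width="120" height="90"></td>'

picture = re.search(r'#4F9EFF;"><img src="(.+?)" width="120" height="90"', data)

all_groups = picture.groups()    # a tuple holding every group, here just one
url = all_groups[0]              # index into the tuple to get the plain string
(url_again,) = picture.groups()  # or unpack the one-element tuple directly

print(all_groups)  # ('http://sample.com/img/file.jpg',)
print(url)         # http://sample.com/img/file.jpg
```

Unpacking with (url_again,) has the side benefit of raising an error if the pattern ever gains a second group.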

BUT

DON'T PARSE HTML WITH REGEX

You should use a proper HTML parser. It will save you countless headaches later, when your regex fails to match or matches too much. Look into BeautifulSoup or lxml.
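The answer recommends BeautifulSoup or lxml; to keep this sketch dependency-free it swaps in Python 3's standard-library html.parser instead. The class name ImgSrcCollector, the sample markup, and the choice to select images by width and height only (dropping the #4F9EFF anchor) are all assumptions for illustration. The sample deliberately lists the img attributes in a different order than the question's regex expects, which would make that regex fail but does not matter to a parser:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects src of every <img> that is 120x90."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if attributes.get("width") == "120" and attributes.get("height") == "90":
            self.sources.append(attributes.get("src"))


# Same image as in the question, but with the attributes reordered.
data = ('<td style="color:#4F9EFF;">'
        '<img height="90" width="120" src="http://sample.com/img/file.jpg"></td>')

parser = ImgSrcCollector()
parser.feed(data)
parser.close()
print(parser.sources)  # ['http://sample.com/img/file.jpg']
```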


Notice the comma before the closing parenthesis? This is a tuple (albeit one with just one element in it).
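A quick sketch in Python 3 showing that the parentheses alone do not make a tuple; the trailing comma does:

```python
# Parentheses around a single value are just grouping; the comma creates the tuple.
not_a_tuple = ('http://sample.com/img/file.jpg')
one_tuple = ('http://sample.com/img/file.jpg',)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(len(one_tuple))              # 1
print(one_tuple[0])                # http://sample.com/img/file.jpg
```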

As the documentation for MatchObject.groups() says:

groups([default])

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
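To illustrate the default argument quoted above, here is a small sketch assuming Python 3 and a hypothetical pattern whose second group is optional. It also shows why the question's groups(1) still returned a tuple: the 1 is only a fill-in value, not a group number:

```python
import re

# Hypothetical pattern: the second group sits inside an optional part,
# so it may not participate in the match.
pattern = re.compile(r'(\w+)(?:@(\w+))?')

m = pattern.match('alice')
print(m.groups())    # ('alice', None)  group 2 did not participate
print(m.groups(''))  # ('alice', '')    the default replaces None
print(m.groups(1))   # ('alice', 1)     1 is a default value, not a group index
```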

As noted by other posters, you want to use MatchObject.group() instead.


You should be using

picture.group(1)

not groups() (plural) if you're only looking for one specific group. groups() always returns a tuple; group() is the one you want.
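A short sketch, assuming Python 3 and a hypothetical shortened pattern with two groups, of what group() returns for different arguments:

```python
import re

m = re.search(r'src="(.+?)" width="(\d+)"',
              '<img src="http://sample.com/img/file.jpg" width="120">')

print(m.group(1))     # one group number -> a plain string
print(m.group(0))     # group 0 -> the whole matched text
print(m.group(1, 2))  # several group numbers -> a tuple
print(m[1])           # indexing shorthand for m.group(1), Python 3.6+
```

Only when you ask for several groups at once does group() hand back a tuple, which is the same shape groups() always returns.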


groups() returns a tuple of all the groups. You want picture.group(1), which returns the string matched by group 1.


As the help for groups says, it returns "a tuple containing all the subgroups of the match". If you want a single group, use the group method.
