
What's the easiest way to extract the links on a web page using Python without BeautifulSoup?

Developer https://www.devze.com 2023-01-30 05:20 Source: Internet

I'm using cygwin and do not have BeautifulSoup installed.


Getting the value of href attributes in all <a> tags on a html file with Python

python, regex to find anchor link html

Regular expression to extract URL from an HTML link


If you don't care much about performance you can use regular expressions:

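import re
# Capture whatever sits between the quotes of each href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")
# your_html is a string holding the page's HTML source.
links = linkre.findall(your_html)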

If you only want absolute links that start with http://, change the expression to:

linkre = re.compile(r"""href=["'](http:[^"']+)["']""")

You can also make the surrounding quotes optional, in case the HTML has href values without them.
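
A minimal sketch of the optional-quote variant, assuming that an unquoted value ends at the first whitespace or ">" character. The names extract_links and sample_html are hypothetical and only used for illustration; the pattern is one possible way to write it, not the only one, and like any regex approach it can be fooled by unusual markup.

```python
import re

# Three alternatives: a double-quoted value, a single-quoted value,
# or an unquoted value that stops at whitespace, a quote, or '>'.
linkre = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)

def extract_links(html):
    """Return the href values found in html, whether quoted or not."""
    # findall yields one tuple per match; only one group is non-empty.
    return [a or b or c for a, b, c in linkre.findall(html)]

# Hypothetical sample input covering all three quoting styles.
sample_html = '<a href="http://a.example/">A</a> <a href=\'/b\'>B</a> <a href=/c>C</a>'
links = extract_links(sample_html)
print(links)
```

Using separate alternatives for each quoting style keeps a value such as href="it's" intact, which a single ["']? on both sides would cut short at the apostrophe.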
