Is there a way to use readability (text extraction algorithm) and a custom algorithm in python to extract links from text?

开发者 (Developer) https://www.devze.com 2023-02-02 21:37 Source: Internet

I'd like to figure out a way of extracting links that are in the body of text.

1.) I use readability in Python: https://github.com/gfxmonk/python-readability

2.) I'd like to somehow compare the extracted text to the original HTML in order to extract only the links that are in the actual body of an article.


Well, it looks like it returns a BeautifulSoup tree. So you should be able to do something like:

article = page.summary()       # Extract the article using readability
links = article.findAll("a")   # List of all <a> tags in the article
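
If summary() turns out to give you an HTML string rather than a BeautifulSoup tree (I haven't checked, and it may depend on which readability port you use), the same idea still works with just the standard library. Here is a rough sketch; LinkCollector, extract_links and the sample fragment are names I made up for illustration, not part of readability:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only keep anchors with an href.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(article_html):
    """Return the href values of all <a> tags in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(article_html)
    collector.close()
    return collector.links


# Hypothetical fragment standing in for the extracted article HTML.
sample = (
    '<div><p>See <a href="https://example.com/a">this</a> '
    'and <a href="/b">that</a>.</p></div>'
)
print(extract_links(sample))  # ['https://example.com/a', '/b']
```

Since only the extracted article is parsed, links from navigation bars, sidebars and footers in the original page never show up, so there is no need to diff against the full HTML.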