
HTML Agility Pack vs Regular Expressions

开发者 https://www.devze.com 2022-12-19 14:32 Source: Internet

If I am creating a simple web scraper (from a root URL, grab all links, then from those links grab all emails), would it be worthwhile to use HTML Agility Pack? I am not actually looking through HTML tags; I am simply looking to scan the entire document for emails.

Would it be more efficient to use HTML Agility Pack?

I am stripping them strictly because it is necessary I have these emails, and there are about 100 links. Only about 500 emails will be scraped. No worries, I'm keeping ethics in mind here.


There are many questions on SO about this - most of the ones I read say not to use regular expressions for web scraping.

On the other hand, if all you want is text parsing regardless of the HTML nature of the text (which you do, if I understand you correctly), it may be better to use regular expressions.
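To make that split concrete, here is a rough sketch of the two steps. It is written in Python only for illustration, since the thread contains no code and HTML Agility Pack itself is a .NET library; the names `LinkCollector`, `extract_links`, `extract_emails` and the sample markup are all hypothetical. Link extraction depends on HTML structure, so it uses a parser; email extraction ignores structure, so it uses a regular expression over the raw text. The pattern is an assumption and deliberately simple, not a complete address validator.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, deliberately simple pattern: good enough for a quick scan,
# not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects http(s) links from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Skip mailto:, javascript:, and other non-web schemes.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Structure-dependent step: let a parser find the anchors."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def extract_emails(text):
    """Structure-independent step: scan the whole text with a regex.

    Returns unique matches (case-insensitive) in order of first appearance.
    """
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), match)
    return list(seen.values())


sample = (
    '<a href="/team">Team</a> <p>Mail bob@example.com or '
    '<a href="mailto:Ann@Example.org">Ann</a>, bob@example.com</p>'
)
print(extract_links(sample, "https://example.com/"))
print(extract_emails(sample))
```

Because the regex runs over the raw page text, it also picks up addresses inside attributes such as `mailto:` links, which is usually what you want for this kind of scan. Fetching the pages is left out here; at roughly 100 links, either approach should be fast enough that efficiency is unlikely to decide the choice.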
