If I am creating a simple web scraper (from a root URL, grab all links, then from those links grab all emails), would it be worthwhile to use HTML Agility Pack? I am not actually looking through HTML tags; I am simply looking to scan for emails within the entire document.
Would it be more efficient to use HTML Agility Pack?
I am scraping them strictly because it is necessary that I have these emails, and there are about 100 links. Only about 500 emails will be scraped. No worries, I'm keeping ethics in mind here.
There are many questions on SO about this, and most of the ones I read say not to use regular expressions for web scraping.
On the other hand, if all you want is to scan the text regardless of its HTML structure (which you do, if I understand you correctly), regular expressions may be the better fit.
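To illustrate the "treat the page as plain text" idea, here is a minimal sketch in Python (chosen only for brevity; the same approach should carry over to .NET's `Regex` class). The pattern, the `extract_emails` helper, and the sample markup are my own assumptions for illustration, and the pattern is deliberately simple, so it will miss some valid addresses and may accept some invalid ones.

```python
import re

# Deliberately simple pattern (an assumption for this sketch):
# it does not try to cover every address form allowed by RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return unique email-like strings, lowercased, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.finditer(text):
        email = match.group(0).lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


# The HTML is scanned as a raw string; no tag parsing is involved.
sample = '<p>Contact <a href="mailto:Bob@Example.com">Bob</a> or alice@example.org.</p>'
print(extract_emails(sample))  # ['bob@example.com', 'alice@example.org']
```

Note that this only works because the tags do not matter for the task. Collecting the links from the root page is a different story, since that depends on the document structure, and an HTML parser is probably the safer choice there.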