Web Scraping (in R?)

Developer https://www.devze.com 2023-03-01 07:13 Source: Internet

I want to get the names of the companies in the middle column of this page (written in bold in blue), as well as the location indicator of the person who is registering the complaint (e.g. "India, Delhi", written in green). Basically, I want a table (or data frame) with two columns, one for company, and the other for location. Any ideas?

You can easily do this using the XML package in R. Here is the code:

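library(XML)

url = "http://www.consumercomplaints.in/bysubcategory/mobile-service-providers/page/1.html"
doc = htmlTreeParse(url, useInternalNodes = TRUE)

# Company names: text of links whose href contains "profile"; keep the even-numbered matches
profiles = xpathSApply(doc, "//a[contains(@href, 'profile')]", xmlValue)
profiles = profiles[seq_along(profiles) %% 2 == 0]

# Locations: text of links whose href contains "bystate"
states = xpathSApply(doc, "//a[contains(@href, 'bystate')]", xmlValue)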
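
The question asks for a two-column data frame, which the code above stops short of building. A minimal sketch of that last step follows, run on a small inline HTML snippet rather than the live page. The markup, the company names and the variable names `html`, `company`, `location` and `complaints` are all hypothetical; the real page may differ and may need the every-second-match filter used above.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical markup standing in for the real page; the live HTML may differ.
html = '
<html><body><table>
  <tr><td><h4><a href="/profile/acme.html">Acme</a></h4></td>
      <td><a href="/bystate/delhi.html">India, Delhi</a></td></tr>
  <tr><td><h4><a href="/profile/globex.html">Globex</a></h4></td>
      <td><a href="/bystate/goa.html">India, Goa</a></td></tr>
</table></body></html>'

# Parse the string itself (asText = TRUE) instead of fetching a URL.
doc = htmlParse(html, asText = TRUE)

company  = xpathSApply(doc, "//a[contains(@href, 'profile')]", xmlValue)
location = xpathSApply(doc, "//a[contains(@href, 'bystate')]", xmlValue)

# data.frame() needs columns of equal length, so check before combining.
stopifnot(length(company) == length(location))
complaints = data.frame(company, location, stringsAsFactors = FALSE)
print(complaints)
```

If the two vectors come back with different lengths on the real page, the XPath expressions are probably matching extra links and would need to be narrowed before the columns can be paired row by row.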

This matches the titles in bold blue. The trick is to open the page's source, look at the markup immediately before and after the text you are looking for, and then build a regex around it.

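// $str must already hold the page's HTML source
// preg_match() stops at the first hit, so use preg_match_all() with a capture group
preg_match_all('/>([a-zA-Z0-9]+)<\/a><\/h4><\/td>/', $str, $matches);
foreach ($matches[1] as $name)
    echo $name, "\n";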
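
For readers staying in R, the same before-and-after idea can be sketched with base R's `gregexpr` and `regmatches`. This is only an illustration: the HTML string is hypothetical, and the pattern is borrowed from the PHP answer above, so it shares that pattern's limits (for example, it would miss names containing spaces).

```r
# Hypothetical fragment of page source; the real markup may differ.
html = '<td><h4><a href="/profile/acme.html">Acme</a></h4></td><td><h4><a href="/profile/globex.html">Globex</a></h4></td>'

# Same pattern as the PHP answer: alphanumeric text followed by </a></h4></td>.
pattern = ">[a-zA-Z0-9]+</a></h4></td>"

# gregexpr() locates every match; regmatches() extracts the matched text.
hits = regmatches(html, gregexpr(pattern, html))[[1]]

# Strip the surrounding markup so only the names remain.
companies = sub("^>", "", sub("</a></h4></td>$", "", hits))
print(companies)
```

An XPath query as in the first answer is usually less fragile than a regex over raw HTML, since small changes in the markup can silently break the pattern.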

You may check this.
