开发者

Extracting html tables from website

开发者 https://www.devze.com 2023-03-04 16:40 出处:网络
I am trying to use XML, RCurl package to read some html tables of the following URL http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

I am trying to use XML, RCurl package to read some html tables of the following URL http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

Here is the code I am using

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
## Required tables 
tmp[[13]]
tmp[[14]]

If you look at t开发者_开发百科he tables it has not been able to parse the values from the webpage. I guess this due to some javascipt evaluation happening on the fly. Now if I use "save page as" option in google chrome(it does not work in mozilla) and save the page and then use the above code i am able to read in the values.

But is there a work around so that I can read the table of the fly ? It will be great if you can help.

Regards,


Looks like they're building the page using javascript by accessing http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing out some string. Maybe you could grab that data and parse it out instead of scraping the page itself.

Looks like you'll have to build a request with the proper referrer headers using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.

You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.

0

精彩评论

暂无评论...
验证码 换一张
取 消