RCurl or XML Challenge: Read Pastebin into R

Developer (https://www.devze.com), 2023-03-07 07:55. Source: web

Flex your RCurl/XML muscle. Shortest code wins. Parse into R: http://pastebin.com/CDzYXNbG

Data should be:

structure(list(Treatment = structure(c(2L, 2L, 1L, 1L), .Label = c("C", 
"T"), class = "factor"), Gender = c("M", "F", "M", "F"), Response = c(56L, 
58L, 6L, 63L)), .Names = c("Treatment", "Gender", "Response"), row.names = c(NA, 
-4L), class = "data.frame")

Good luck!

Note: dummy data kindly provided by this question: Adding space between bars in ggplot2
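
For anyone wondering why the answers all end in eval(parse(...)): the paste holds dput() output, which is ordinary R code, so evaluating that text rebuilds the data frame. A minimal offline sketch, with the text held in a string instead of fetched from Pastebin (the names txt and dat are only illustrative):

```r
# dput()-style text, copied from the expected result in the question
txt <- 'structure(list(Treatment = structure(c(2L, 2L, 1L, 1L), .Label = c("C",
"T"), class = "factor"), Gender = c("M", "F", "M", "F"), Response = c(56L,
58L, 6L, 63L)), .Names = c("Treatment", "Gender", "Response"), row.names = c(NA,
-4L), class = "data.frame")'

# parse() turns the text into an unevaluated expression; eval() runs it
dat <- eval(parse(text = txt))
str(dat)
```

Every solution in the thread is some way of getting that text out of the page and into parse().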

Same idea as kohske's, but slightly shorter and clearer, I think:

library(XML)
eval(parse(text=gsub('\r\n','\n',xpathApply(htmlTreeParse('http://pastebin.com/CDzYXNbG',useInternal=T),'//textarea',xmlValue))))
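
Spelled out step by step, the XPath approach might look like the sketch below. It runs against a small inline HTML string rather than the live page; that string and the variable names are made up for illustration, and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical stand-in for the Pastebin page: a textarea holding R code,
# with CRLF line endings and HTML-escaped quotes
html <- paste0(
  "<html><body><textarea id=\"paste_code\">",
  "data.frame(Gender = c(&quot;M&quot;, &quot;F&quot;),\r\n",
  "           Response = c(56L, 58L))",
  "</textarea></body></html>"
)

doc  <- htmlParse(html, asText = TRUE)              # parse into an internal document
code <- xpathSApply(doc, "//textarea", xmlValue)    # text content of the textarea
code <- gsub("\r\n", "\n", code, fixed = TRUE)      # normalise any CRLF line endings
dat  <- eval(parse(text = code))                    # run the recovered R code
```

The HTML parser takes care of the escaped quotes, which is why this route needs no manual replacement of &quot;.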

You guys are making this way too hard:

eval(parse(file("http://pastebin.com/raw.php?i=CDzYXNbG")))

OK, so I cheated. But starting from the same URL you could get the same end:

eval(parse(file(paste("http://pastebin.com/raw.php?i=", strsplit("http://pastebin.com/CDzYXNbG", "/")[[1]][4], sep=""))))

Which still puts me in the lead :)
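
The URL rewriting can be pulled out into a small helper. This is only a sketch: raw_url is a hypothetical name, and the raw.php?i= pattern is simply the one used in this answer, so it may not match what Pastebin serves today.

```r
# Hypothetical helper: turn a Pastebin page URL into the raw-text URL
# pattern used in this answer
raw_url <- function(page_url) {
  id <- basename(page_url)  # last path component, i.e. the paste id
  paste0("http://pastebin.com/raw.php?i=", id)
}

u <- raw_url("http://pastebin.com/CDzYXNbG")
print(u)
```

The result can then be passed to eval(parse(file(u))) as above.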

RCurl is not necessary for my code, since the XML package's parsers accept a URL as the file argument.

Please execute

library(XML)

before the examples below.

Code 1 is a one-liner:

eval(parse(text=htmlTreeParse("http://pastebin.com/CDzYXNbG",handlers=(function(){qt <- NULL;list(textarea=function(node,...){qt<<-gsub("[\r\n]", "", unclass(node$children$text)$value);node},.qt=function()qt)})())$.qt()))

Code 2 is shorter, but I don't think it is the shortest possible.

htmlTreeParse("http://pastebin.com/CDzYXNbG",h=list(textarea=function(n)z<<-gsub("[\r\n]","",unclass(n$c$t)$v)));eval(parse(text=z))

As this question is a kind of game, please decipher this code yourself.

UPDATED

After looking at @JD Long's excellent solution, here is the shortest code:

eval(parse(file(sub("m/","m/raw.php?i=","http://pastebin.com/CDzYXNbG"))))

Now the question is how to build the desired URL string in the shortest code ;-p

Updated again. This is shorter by some characters.

source(sub("m/","m/raw.php?i=","http://pastebin.com/CDzYXNbG"))$va
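
In case the $va looks cryptic: source() returns a list with the elements value and visible, and $ does partial name matching, so $va picks out value. A small offline sketch using a temporary file instead of the URL (the file contents here are made up):

```r
# Write a one-expression script to a temporary file
tmp <- tempfile(fileext = ".R")
writeLines("data.frame(x = 1:3)", tmp)

res <- source(tmp)   # list with the last expression's value and its visibility
print(names(res))

dat <- res$va        # partial matching: "va" resolves to "value"
print(dat)
```

Partial matching saves characters in a golf contest; in ordinary code res$value is the safer spelling.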

I'm not perfectly sure what you are trying to achieve here, but maybe this does what you ask for (not using any fancy packages, just regex):

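# Fetch the page and collapse it into a single string
fullText <- paste(readLines("http://pastebin.com/CDzYXNbG"), collapse="\n")
# Capture everything inside the paste_code textarea
regexp <- "<textarea[^>]*id=\"paste_code\"[^>]*>(.*)</textarea>"
txtarpos <- regexpr(regexp, fullText)
txtarstrt <- txtarpos[1]
txtarlen <- attr(txtarpos, "match.length")
# Last matched character; the -1 avoids picking up one extra character
txtarstp <- txtarstrt + txtarlen - 1
txtarpart <- substr(fullText, txtarstrt, txtarstp)
# Keep the captured group, unescape the quotes, then drop the newlines
retval <- gsub("\n", "", gsub("&quot;", "\"", gsub(regexp, "\\1", txtarpart), fixed=TRUE), fixed=TRUE)
cat(retval)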

I'm also pretty sure this can be improved upon somewhat, but it does the job I think you asked for. Even if it doesn't: thanks for making me want to refresh my regex basics!
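
One possible tightening, as a sketch only: a single sub() call with a capture group can replace the position arithmetic. It is shown on a made-up inline HTML string so it runs offline; the (?s) flag lets the dot match newlines when perl = TRUE is used.

```r
# Hypothetical stand-in for the page source
html <- paste0(
  "<div><textarea class=\"x\" id=\"paste_code\" rows=\"5\">",
  "c(&quot;M&quot;,\n&quot;F&quot;)",
  "</textarea></div>"
)

# Keep only what sits between the textarea tags
pattern <- "(?s).*<textarea[^>]*id=\"paste_code\"[^>]*>(.*?)</textarea>.*"
body <- sub(pattern, "\\1", html, perl = TRUE)

# Unescape the quotes, then evaluate the recovered R code
code <- gsub("&quot;", "\"", body, fixed = TRUE)
val  <- eval(parse(text = code))
print(val)
```

Only &quot; is unescaped here, as in the code above; a page containing other HTML entities would need them handled too.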