This is a small project in R that I am attempting to execute. I've scraped a few hundred HTML pages, and I am able to use the readHTMLTable function in the XML library to read the tables I'm interested in. However, I'm having trouble writing the for loop that loops through the directory, grabs the table from each file, and appends them all to a single CSV file.
I HAVE been successful in looping through the files and saving each table to its own txt file (which I feel is at least a start):
library(XML)  # provides readHTMLTable

parentpath <- "Z:/scraping"
setwd(parentpath)
filenames <- list.files()

for (targetfile in filenames) {
  data <- readHTMLTable(targetfile)
  # the sixth table on each page is the one of interest;
  # data[[6]] extracts the data frame itself rather than a one-element list
  outputfile <- paste(targetfile, '.txt', sep = "")
  write.table(data[[6]], file = outputfile, sep = "\t", quote = TRUE)
}
Shouldn't the append=TRUE option in write.table do the trick for you? You can read about it by looking up ?write.table.
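As a minimal sketch of that idea (assuming, as in your script, that the sixth table on each page is the one you want, and using a hypothetical combined.csv output name): write the column header only for the first file, then append the remaining files without headers.

library(XML)  # readHTMLTable

parentpath <- "Z:/scraping"
setwd(parentpath)
filenames <- list.files()
outputfile <- "combined.csv"  # hypothetical name for the merged output

for (i in seq_along(filenames)) {
  tables <- readHTMLTable(filenames[i])
  tbl <- tables[[6]]  # sixth table, as in your per-file loop
  # First file: create the CSV with a header row.
  # Later files: append rows and suppress the header.
  write.table(tbl, file = outputfile, sep = ",",
              row.names = FALSE,
              col.names = (i == 1),
              append = (i > 1),
              quote = TRUE)
}

Note that setting col.names = FALSE on the appended writes matters: otherwise each file would insert its own header row into the middle of the CSV, and write.table would warn about appending column names to an existing file.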