I've tried something like this:
file_in <- file("myfile.log","r")
x <- readLines(file_in, n=-100)
but I'm still waiting...
Any help would be greatly appreciated
I'd use scan for this, in case you know how many lines the log has:
scan("foo.txt", sep = "\n", what = character(), skip = 100)
If you have no clue how many lines you need to skip, you have no choice but to move towards one of the following (the first two are sketched after this list):
- reading in everything and taking the last n lines (in case that's feasible),
- using scan("foo.txt", sep = "\n", what = list(NULL)) to figure out how many records there are, or
- using some algorithm to go through the file, keeping only the last n lines each time.
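A minimal sketch of the first two options, assuming the file is called foo.txt and you want the last 100 lines:

# Option 1: read everything and keep the last 100 lines (only if the file fits in memory)
last_n <- tail(readLines("foo.txt"), 100)

# Option 2: count the records in chunks (cheap on memory), then skip all but the last 100
con <- file("foo.txt", "r")
n_records <- 0L
repeat {
  chunk <- readLines(con, n = 10000L)
  if (length(chunk) == 0L) break
  n_records <- n_records + length(chunk)
}
close(con)
last_n <- scan("foo.txt", sep = "\n", what = character(),
               skip = max(n_records - 100, 0), quiet = TRUE)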
The last option could look like:
ReadLastLines <- function(x, n, ...){
  con <- file(x)
  open(con)
  # read the first n lines as the initial window
  out <- scan(con, n, what = character(), sep = "\n", quiet = TRUE, ...)
  # slide the window one line at a time until the end of the file
  while(TRUE){
    tmp <- scan(con, 1, what = character(), sep = "\n", quiet = TRUE)
    if(length(tmp) == 0) { close(con); break }
    out <- c(out[-1], tmp)
  }
  out
}
allowing:
ReadLastLines("foo.txt", 100)
or
ReadLastLines("foo.txt", 100, skip = 1e+7)
in case you know you have more than 10 million lines. This can save reading time once your logs get extremely big.
EDIT: In fact, I'd not even use R for this, given the size of your file. On Unix, you can use the tail command. There is a Windows version of that as well, somewhere in a toolkit; I haven't tried it out yet, though.
You could do this with read.table by specifying the skip parameter. If your lines are not to be parsed into variables, specify the separator to be '\n' (as @Joris Meys pointed out below) and also set as.is=TRUE to get character vectors instead of factors.
Small example (skipping the first 2000 lines):
df <- read.table('foo.txt', sep='\n', as.is=TRUE, skip=2000)
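If you don't know the file length up front, one option (a sketch, assuming the R.utils package and its countLines() helper are available) is to count the lines first and derive skip from that:

# count the lines, then skip everything except the last 100
total <- R.utils::countLines("foo.txt")
last_100 <- read.table("foo.txt", sep = "\n", as.is = TRUE, skip = max(total - 100, 0))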
As @JorisMeys already mentioned, the Unix command tail would be the easiest way to solve this problem. However, I want to propose a seek-based R solution that starts reading the file from the end:
tailfile <- function(file, n) {
  bufferSize <- 1024L
  size <- file.info(file)$size
  if (size < bufferSize) {
    bufferSize <- size
  }
  pos <- size - bufferSize
  text <- character()
  k <- 0L
  f <- file(file, "rb")
  on.exit(close(f))
  while (TRUE) {
    seek(f, where = pos)
    chars <- readChar(f, nchars = bufferSize)
    # count the newlines in this chunk; gregexpr returns -1 when there is no match
    hits <- gregexpr(pattern = "\n", text = chars, fixed = TRUE)[[1L]]
    k <- k + sum(hits > 0L)
    # prepend, since the chunks are read back to front
    text <- paste0(chars, text)
    if (k > n || pos == 0L) {
      break
    }
    # shrink the buffer near the start of the file so no bytes are read twice
    bufferSize <- min(bufferSize, pos)
    pos <- pos - bufferSize
  }
  tail(strsplit(text, "\n", fixed = TRUE)[[1L]], n)
}
tailfile("myfile.log", n = 100)
You can read the last n lines with the following method.
Step 1 - Read in your file however you like:
df <- read.csv("hw1_data.csv")
Step 2 - Now use the tail function to get the last n lines:
tail(df, 2)
Some folks have said it already, but if you have a large log it is most efficient to read in only what you need, instead of reading everything into memory and then subsetting. For this, we use R's system() to run the Linux tail command.
Read the last 10 lines of the log:
system("tail path/to/my_file.log")
Read the last 2 lines of the log:
system("tail -n 2 path/to/my_file.log")
Read the last 2 lines of the log and capture the output in a character vector:
last_2_lines <- system("tail -n 2 path/to/my_file.log", intern = TRUE)
For seeing the last few lines (note that tail works on the lines once they are read in, not on the connection itself):
tail(readLines(file_in), 100)