I have an R program that combines 10 files, each about 296 MB, and I have increased the memory limit to 8 GB (the size of my RAM):
--max-mem-size=8192M
and when I ran the program I got an error saying:
In type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
Reached total allocation of 7646Mb: see help(memory.size)
Here is my R program
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt");
S2 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_401_800.txt");
S3 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_801_1200.txt");
S4 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1201_1600.txt");
S5 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1601_2000.txt");
S6 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2001_2400.txt");
S7 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2401_2800.txt");
S8 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2801_3200.txt");
S9 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3201_3600.txt");
S10 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3601_4000.txt");
options(max.print=154.8E10);
combine_result <- rbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10)
write.table(combine_result,file="C:/sim_omega3_1_4000.txt",sep=";",
row.names=FALSE,col.names=TRUE, quote = FALSE);
Can anyone help me with this?
Thanks,
Shruti.
I suggest incorporating the suggestions in ?read.csv2:
Memory usage:
These functions can use a surprising amount of memory when reading large files. There is extensive discussion in the ‘R Data Import/Export’ manual, supplementing the notes here. Less memory will be used if ‘colClasses’ is specified as one of the six atomic vector classes. This can be particularly so when reading a column that takes many distinct numeric values, as storing each distinct value as a character string can take up to 14 times as much memory as storing it as an integer. Using ‘nrows’, even as a mild over-estimate, will help memory usage. Using ‘comment.char = ""’ will be appreciably faster than the ‘read.table’ default. ‘read.table’ is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use ‘scan’ instead for matrices.
Memory allocation needs contiguous blocks, and the size a file takes on disk may not be a good indicator of how large the object is once loaded into R. Can you look at one of these S objects with the function:
?object.size
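For instance, one chunk's in-memory footprint can be checked like this (a small sketch, assuming S1 from your script is already loaded):

print(object.size(S1), units = "Mb")  # in-memory size of one chunk, in megabytes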
Here is a function I use to see what is taking up the most space in R:
getsizes <- function() {
  # sizes of every object in the global environment
  z <- sapply(ls(envir = globalenv()), function(x) object.size(get(x)))
  # the ten largest, as a one-column matrix
  (tmp <- as.matrix(rev(sort(z))[1:10]))
}
If you call remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10) and then gc() after calculating combine_result, you might free enough memory. I also find that running the script through Rscript seems to allow access to more memory than the GUI does, if you are on Windows.
If these files are in a standard format and you want to do this in R, why bother reading and writing CSV at all? Use readLines/writeLines:
files_in <- file.path("C:/Sim_Omega3_results",c(
"sim_omega3_1_400.txt",
"sim_omega3_401_800.txt",
"sim_omega3_801_1200.txt",
"sim_omega3_1201_1600.txt",
"sim_omega3_1601_2000.txt",
"sim_omega3_2001_2400.txt",
"sim_omega3_2401_2800.txt",
"sim_omega3_2801_3200.txt",
"sim_omega3_3201_3600.txt",
"sim_omega3_3601_4000.txt"))
# copy the first file (with its header) to the output, then append the
# remaining files, dropping the header line from each
file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")
file_out <- file(out_file_name, "at")  # open the output in append-text mode
for (file_in in files_in[-1]) {
  x <- readLines(file_in)
  writeLines(x[-1], file_out)          # x[-1] drops the header line
}
close(file_out)
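If even a single 296 MB file is too much to hold in memory at once, a variation on the loop above (untested sketch; the chunk size of 10,000 lines is arbitrary) appends each input in fixed-size chunks so no whole file is ever loaded:

# replaces the for loop above; files_in and file_out are as defined there
for (file_in in files_in[-1]) {
  con_in <- file(file_in, "rt")
  readLines(con_in, n = 1)             # skip the header line
  repeat {
    x <- readLines(con_in, n = 10000)  # read the next chunk of lines
    if (length(x) == 0) break
    writeLines(x, file_out)
  }
  close(con_in)
}
close(file_out)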