I have several txt files with 3 columns in each files like this: file 1:
ProbeID X_Signal_intensity X_P-Value
xxx 2.34 .89
xxx 6.45 .04
xxx 1.09 .91
xxx 5.87 .70
. . .
. . .
. . .
file 2:
ProbeID Y_Signal_intensity Y_P-Value
xxx 1.4 .92
xxx 2.55 .14
xxx 4.19 .16
xxx 3.47 .80
. . .
. . .
. . .
file 3:
ProbeID Z_Signal_intensity Z_P-Value
xxx 9.40 .82
xxx 1.55 .04
xxx 3.19 .56
xxx 2.47 .90
. . 开发者_如何学Python .
. . .
. . .
In all the above files the values of ProbeID column are identical but not the other columns.Now I want to combine the all the above files using a for-loop into a single file like this:
ProbeID X_intensity X_P-Value Y_intensity Y_P-Value Z_intensity Z_P-Value
xxx 2.34 .89 1.4 .92 9.40 .82
xxx 6.45 .04 2.55 .14 1.55 .04
xxx 1.09 .91 4.19 .16 3.19 .56
xxx 5.87 .70 3.47 .80 2.47 .90
Please do help me.
Read in the files as given by Richie Cotton, but make sure you add the appropriate extra arguments in the apply call. For one, header=TRUE
should probably be added.
file.names <- c("file X.txt", "file Y.txt", "file Z.txt")
file.list <- lapply(file.names, read.table, header=TRUE)
Then you'll probably need a merge_recurse
from the reshape package
:
require(reshape)
mynewframe <- merge_recurse(file.list,all.x=TRUE,all.y=TRUE,by="ProbeID")
This will work for any given number of dataframes, provided it's not a billion of them. For more information on the arguments used, see the help page of ?merge
.
CORRECTION : in merge_recurse, you have to use all.x
and all.y
as shown in the correction above. You can't just use the shortcut all
or you'll get errors.
Small demonstration :
X2 <- data.frame(ProbeID=(2:4),Z2=4:6)
X1 <- data.frame(ProbeID=1:3,Z1=1:3)
X3 <- data.frame(ProbeID=1:3,Z3=7:9)
file.list <- list(X1,X2,X3)
mynewframe <- merge_recurse(file.list,all.x=TRUE,all.y=TRUE,by="ProbeID")
> mynewframe
ProbeID Z1 Z2 Z3
1 1 1 NA 7
2 2 2 4 8
3 3 3 5 9
4 4 NA 6 NA
Read in your files
filenames <- c("file X.txt", "file Y.txt", "file Z.txt")
data_list <- lapply(filenames, read.table)
Combine them into one big data frame
all_data <- do.call(cbind, data_list)
all_data <- do.call(merge, data_list, by = "ProbeID")
This gives a good lesson to "always concentrate when providing an answer". cbind
isn't smart enough to do ID matching, and merge
isn't smart enough to handle more than two data frames. Take a look at Joris's answer and use merge_recurse
instead. Or forget what you thought you wanted and use my other answer below.
Actually, a better idea, rather than having many columns would be to have just 4 columns: ProbeID, Signal_intensity, P_value and Source_file.
data_list <- lapply(data_list, function(x) {
colnames(x) <- c("ProbeID", "Signal_intensity", "P_value")
x
})
all_data <- do.call(rbind, data_list)
all_data$Source_file <- rep(filenames, times = sapply(data_list, nrow))
My approach is to read the files into data.frames
see help(read.delim)
for reading modes.
After you have your three data.frames you can use
total <- merge(dataframeA,dataframeB,by="ProbeID")
look here http://www.statmethods.net/management/merging.html for documentation.
I am going to throw another approach into the mix which uses Reduce
Reduce(function(...) merge(..., all = T), file.list)
精彩评论