Manipulating multiple files in R

Developer https://www.devze.com 2023-03-14 02:35 Source: Internet

I am new to R and am looking for code to manipulate the hundreds of files that I have at hand. They are .txt files with a few rows of unwanted text, followed by columns of data, looking something like this:

XXXXX 
XXXXX
XXXXX
Col1 Col2 Col3 Col4 Col5
1 36 37 35 36 
2 34 34 36 37 
. 
. 
1500 34 35 36 35

I wrote the code below to extract selected rows of columns 1 and 5 from an individual .txt file, and would like to loop over all the files that I have.

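# Skip the first 264 lines, then read 932 rows (no header is read, so columns are named V1, V2, ...)
data <- read.table("/Users/tan/Desktop/test/01.txt", skip = 264, nrows = 932)
selcol <- c("V1", "V5")
# Write columns 1 and 5 as tab-separated text
write.table(data[selcol], file = "/Users/tan/Desktop/test/01ed.txt", sep = "\t")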

With the above code, the .txt file now looks like this:

Col1 Col5  
300 34  
. 
. 
700 34

If possible, I would like to combine Col5 from every .txt file with a single copy of Col1 (which is the same in all the .txt files), so that the result looks something like this:

Col1 Col5a Col5b Col5c Col5d ...
300 34 34 36 37
. 
. 
700 34 34 36 37

Thank you! Tan

Alright - I think I hit on all your questions here, but let me know if I missed something. The general process that we will go through here is:

1. Identify all of the files that we want to read in and process in our working directory
2. Use lapply to iterate over each of those file names to create a single list object that contains all of the data
3. Select your columns of interest
4. Merge them together by the common column
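
As a rough picture of the last step: merging a whole list by the common column amounts to merging the first two tables, then merging that result with the third, and so on, which is what Reduce does. Here is a minimal sketch with made-up data frames (a, b and c3 are hypothetical, not the files from the question):

```r
# Three toy tables sharing the key column x (made-up values)
a  <- data.frame(x = 1:3, y2 = c(10, 20, 30))
b  <- data.frame(x = 1:3, y2 = c(11, 21, 31))
c3 <- data.frame(x = 1:3, y2 = c(12, 22, 32))

# Reduce folds the list left to right: merge(merge(a, b), c3)
viaReduce <- Reduce(function(l, r) merge(l, r, by = "x"), list(a, b, c3))
byHand    <- merge(merge(a, b, by = "x"), c3, by = "x")

# Both routes give the same data frame
sameResult <- identical(viaReduce, byHand)

# The y2 columns get automatic suffixes, so rename them afterwards
colnames(viaReduce) <- c("x", "a", "b", "c3")
```

Because every table contributes a column called y2, merge has to disambiguate the names with suffixes, and with longer lists recent versions of R may warn about duplicated column names. That should be harmless as long as the names are overwritten afterwards, as done here.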

For the purposes of the example, suppose I have four files named file1.txt through file4.txt that all look like this:

    x           y          y2
1   1  2.44281173 -2.32777987
2   2 -0.32999022 -0.60991623
3   3  0.74954561  0.03761497
4   4 -0.44374491 -1.65062852
5   5  0.79140012  0.40717932
6   6 -0.38517329 -0.64859906
7   7  0.92959219 -1.27056731
8   8  0.47004041  2.52418636
9   9 -0.73437337  0.47071120
10 10  0.48385902  1.37193941
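
If you want to follow along, files of this shape can be generated with something like the sketch below. The directory name exampleDir and the seed are assumptions made for illustration, and the values are random, so they will not match the numbers shown in this answer.

```r
# Hypothetical setup: write four small example files into a temporary directory
set.seed(1)  # arbitrary seed so repeated runs give the same files
exampleDir <- file.path(tempdir(), "example_files")
dir.create(exampleDir, showWarnings = FALSE)

for (i in 1:4) {
  df <- data.frame(x = 1:10, y = rnorm(10), y2 = rnorm(10))
  # Row names are written too, so each data line has one more field than the header
  write.table(df, file = file.path(exampleDir, paste0("file", i, ".txt")),
              quote = FALSE)
}

# List what was written
written <- list.files(exampleDir, pattern = "file.*\\.txt$")
```

Calling setwd(exampleDir) afterwards lets the dir() call in step 1 below find these files.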

##1. identify files to read in
filesToProcess <- dir(pattern = "file.*\\.txt$")
> filesToProcess
[1] "file1.txt" "file2.txt" "file3.txt" "file4.txt"


##2. Iterate over each of those file names with lapply
listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE))

##3. Select columns x and y2 from each of the objects in our list
listOfFiles <- lapply(listOfFiles, function(z) z[c("x", "y2")])

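##NOTE: you can combine steps 2 and 3 by passing the colClasses parameter to read.table.
# A class of "NULL" drops that column. colClasses is given per column of the file
# (including a row-names column, if the file has one), so adjust the vector to your layout.
# That code would be:
listOfFiles <- lapply(filesToProcess, function(x) read.table(x, header = TRUE,
  colClasses = c("integer", "NULL", "numeric")))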

##4. Merge all of the objects in the list together with Reduce.
# x is the common column to join on
out <- Reduce(function(x, y) merge(x, y, by = "x"), listOfFiles)
# clean up the column names: one column per file, named after the file
colnames(out) <- c("x", sub("\\.txt$", "", filesToProcess))

Results in the following:

> out
    x       file1        file2       file3        file4
1   1 -2.32777987 -0.671934857 -2.32777987 -0.671934857
2   2 -0.60991623 -0.822505224 -0.60991623 -0.822505224
3   3  0.03761497  0.049694686  0.03761497  0.049694686
4   4 -1.65062852 -1.173863215 -1.65062852 -1.173863215
5   5  0.40717932  1.189763270  0.40717932  1.189763270
6   6 -0.64859906  0.610462808 -0.64859906  0.610462808
7   7 -1.27056731  0.928107752 -1.27056731  0.928107752
8   8  2.52418636 -0.856625895  2.52418636 -0.856625895
9   9  0.47071120 -1.290480033  0.47071120 -1.290480033
10 10  1.37193941 -0.235659079  1.37193941 -0.235659079
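
For the files described in the question (unwanted lines at the top, no usable header after skipping, columns 1 and 5 wanted), the same idea might look like the sketch below. Since column 1 is said to be identical in every file, it binds the fifth columns side by side instead of merging. The helper name combineCol5, the demo directory and the demo values are all assumptions made for illustration; with the real files you would pass skip = 264 and nrows = 932 as in the question.

```r
# Hypothetical helper: read each file, then put every file's column 5
# next to a single copy of column 1.
combineCol5 <- function(files, skip, nrows) {
  tables <- lapply(files, function(f) read.table(f, skip = skip, nrows = nrows))
  # Assumption from the question: column 1 is the same in every file
  key <- tables[[1]][[1]]
  sameKey <- vapply(tables, function(t) identical(t[[1]], key), logical(1))
  stopifnot(all(sameKey))
  out <- data.frame(key, do.call(cbind, lapply(tables, function(t) t[[5]])))
  # Name each data column after its file, e.g. "01" for 01.txt
  names(out) <- c("Col1", tools::file_path_sans_ext(basename(files)))
  out
}

# Demo with two made-up files: 3 junk lines, a header line, 10 data rows
demoDir <- file.path(tempdir(), "col5_demo")
dir.create(demoDir, showWarnings = FALSE)
for (id in c("01", "02")) {
  col5 <- if (id == "01") 101:110 else 201:210
  body <- data.frame(1:10, 36, 37, 35, col5)
  path <- file.path(demoDir, paste0(id, ".txt"))
  writeLines(c("XXXXX", "XXXXX", "XXXXX", "Col1 Col2 Col3 Col4 Col5"), path)
  write.table(body, path, append = TRUE, row.names = FALSE, col.names = FALSE)
}

files <- list.files(demoDir, pattern = "\\.txt$", full.names = TRUE)
# Skip 3 junk lines + header + the first 2 data rows, then keep 5 rows
combined <- combineCol5(files, skip = 6, nrows = 5)
```

The result can then be saved with write.table(combined, file = ..., sep = "\t", row.names = FALSE, quote = FALSE). If column 1 could differ between files, the Reduce and merge approach above is the safer choice, because it matches rows by value and not by position.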