I have a list of lists resulting from a bigsplit() operation (from package biganalytics, part of the bigmemory packages).
Each list represents a column in a matrix, and each list item is an index to a value of 1 in a binary matrix.
What is the best way to turn this list into a sparse binary (0/1) matrix? Is using lapply() within an lapply() the only solution? How do I keep the factors naming the lists as names for the columns?
You can do this without an lapply whatsoever if you need a matrix.
Say you have a list constructed like this :
Test <- list(
col1=list(2,4,7),
col2=list(3,2,6,8),
col3=list(1,4,5,3,7)
)
First you construct a matrix of zeros with the correct dimensions. If you know them beforehand, that's easy. Otherwise you can derive them easily:
n.cols <- length(Test)
n.ids <- sapply(Test,length)
n.rows <- max(unlist(Test))
out <- matrix(0,nrow=n.rows,ncol=n.cols)
Then you use the fact that matrices are filled columnwise to calculate the index of each cell that has to become one. The cell in row r of column c has linear index r + (c-1)*n.rows:
id <- unlist(Test)+rep(0:(n.cols-1),n.ids)*n.rows
out[id] <- 1
colnames(out) <- names(Test)
This gives :
> out
col1 col2 col3
[1,] 0 0 1
[2,] 1 1 0
[3,] 0 1 1
[4,] 1 0 1
[5,] 0 0 1
[6,] 0 1 0
[7,] 1 0 1
[8,] 0 1 0
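If the index arithmetic looks opaque, here is a small sketch that spells it out on the same toy list and checks it with arrayInd(), which maps linear indices back to (row, column) pairs. The names rows, cols and back are mine, introduced only for illustration.

```r
# Same toy list as above.
Test <- list(
  col1 = list(2, 4, 7),
  col2 = list(3, 2, 6, 8),
  col3 = list(1, 4, 5, 3, 7)
)
n.rows <- max(unlist(Test))
n.ids  <- lengths(Test)                  # number of 1s per column

rows <- unlist(Test)                     # row position of each 1
cols <- rep(seq_along(Test), n.ids)      # column position of each 1
id   <- rows + (cols - 1) * n.rows       # column-major linear index

# Round trip: recover (row, col) from the linear index.
back <- arrayInd(id, .dim = c(n.rows, length(Test)))
stopifnot(all(back[, 1] == rows), all(back[, 2] == cols))
```

The rep(0:(n.cols-1), n.ids) term in the answer is the same thing as cols - 1 here.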
You might also consider using the Matrix package which deals with large sparse matrices in a more efficient way than base R. You can build a sparse matrix of 0s and 1s by describing which rows and columns should be 1s.
library(Matrix)
Test <- list(
col1=list(2,4,7),
col2=list(3,2,6,8),
col3=list(1,4,5,3,7)
)
n.ids <- sapply(Test,length)
vals <- unlist(Test)
out <- sparseMatrix(vals, rep(seq_along(n.ids), n.ids))
The result is
> out
8 x 3 sparse Matrix of class "ngCMatrix"
[1,] . . |
[2,] | | .
[3,] . | |
[4,] | . |
[5,] . . |
[6,] . | .
[7,] | . |
[8,] . | .
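Note that the sparse result above has no column names. A minimal sketch of one way to carry the list names over, assuming the Matrix package is installed: sparseMatrix() accepts dims and dimnames arguments, so the names can be passed at construction time. Setting dims explicitly is my addition; it only matters if the largest row index should not determine the number of rows.

```r
library(Matrix)

Test <- list(
  col1 = list(2, 4, 7),
  col2 = list(3, 2, 6, 8),
  col3 = list(1, 4, 5, 3, 7)
)
n.ids <- lengths(Test)
vals  <- unname(unlist(Test))

out <- sparseMatrix(
  i = vals,                              # row of each 1
  j = rep(seq_along(Test), n.ids),       # column of each 1
  dims = c(max(vals), length(Test)),
  dimnames = list(NULL, names(Test))     # keep the list names as column names
)
colnames(out)
```

Columns can then be selected by name, e.g. out[, "col2"].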
Using Joris' example, here's a syntactically simple way using sapply/replace. I suspect Joris' approach is faster, because it fills in a pre-allocated matrix, whereas my approach implicitly involves cbinding a bunch of columns, and so would require repeated memory allocations for the columns (is that true?).
Test <- list(
col1=list(2,4,7),
col2=list(3,2,6,8),
col3=list(1,4,5,3,7)
)
> z <- rep(0, max(unlist(Test)))
> sapply( Test, function(x) replace(z,unlist(x),1))
col1 col2 col3
[1,] 0 0 1
[2,] 1 1 0
[3,] 0 1 1
[4,] 1 0 1
[5,] 0 0 1
[6,] 0 1 0
[7,] 1 0 1
[8,] 0 1 0
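On the allocation question: as far as I understand it, sapply() collects the per-column results with lapply() and simplifies them to a matrix in one step, rather than cbinding column by column, so each column is allocated once and then copied once more into the result. A possible variant, sketched here and not benchmarked, is vapply(), which additionally declares the expected type and length of each column up front:

```r
Test <- list(
  col1 = list(2, 4, 7),
  col2 = list(3, 2, 6, 8),
  col3 = list(1, 4, 5, 3, 7)
)
n.rows <- max(unlist(Test))
z <- numeric(n.rows)                     # template column of zeros

# FUN.VALUE states that every column must be a numeric vector of length n.rows;
# vapply() raises an error if any result does not match.
out <- vapply(Test, function(x) replace(z, unlist(x), 1),
              FUN.VALUE = numeric(n.rows))
```

The column names still come from names(Test), as with sapply().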
Here is some sample data that seems to fit your description.
a <- as.list(sample(20, 5))
b <- as.list(sample(20, 5))
c <- as.list(sample(20, 5))
abc <- list(a = a, b = b, c = c)
I do not see a way to do this with nested lapply() but here is another way. It would be nice to eliminate the unlist(), but maybe someone else can improve on this.
sp_to_bin <- function(splist) {
binlist <- numeric(100)
binlist[unlist(splist)] <- 1
return(binlist)
}
bindf <- data.frame(lapply(abc, sp_to_bin))
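The 100 in sp_to_bin() is an arbitrary upper bound on the row count. A small sketch of the same idea with the length derived from the data instead; the extra argument n and the set.seed() call are my additions, purely for illustration and reproducibility.

```r
set.seed(1)  # hypothetical seed, only so the example is reproducible
abc <- list(
  a = as.list(sample(20, 5)),
  b = as.list(sample(20, 5)),
  c = as.list(sample(20, 5))
)

# n is the number of rows in the result (an assumed extra argument).
sp_to_bin <- function(splist, n) {
  binlist <- numeric(n)
  binlist[unlist(splist)] <- 1
  binlist
}

n <- max(unlist(abc))                    # largest index present in the data
bindf <- data.frame(lapply(abc, sp_to_bin, n = n))
```

If the true number of rows is known (20 for this sample data), pass that as n instead of the observed maximum.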
To build on Joris's answer, which used a scalar index vector to fill in the output matrix, you can also use a matrix index vector to fill in the output matrix; this can sometimes be a little clearer to write or understand later.
Test <- list(
col1=list(2,4,7),
col2=list(3,2,6,8),
col3=list(1,4,5,3,7)
)
n.cols <- length(Test)
n.ids <- sapply(Test,length)
vals <- unlist(Test)
n.rows <- max(vals)
idx <- cbind(vals, rep(seq_along(n.ids), n.ids))
out <- matrix(0,nrow=n.rows,ncol=n.cols)
out[idx] <- 1
colnames(out) <- names(Test)
The result is the same.