I have a nested list, say lst (all the elements are of class int). I don't know the length of lst in advance; however, I do know that each element of lst is a list of length, say, k:
length(lst[[i]]) # this equals k and is known in advance,
# this is true for i = 1 ... length(lst)
How do I take the union of the 1st element, 2nd element, ..., kth element of all the elements of lst? Specifically, if the length of lst is n, I want (not R code):
# I know that union can only be taken for 2 elements,
# the following is for illustration purposes
listUnion1 <- union(lst[[1, 1]], lst[[2, 1]], ..., lst[[n, 1]])
listUnion2 <- union(lst[[1, 2]], lst[[2, 2]], ..., lst[[n, 2]])
.
.
.
listUnionk <- union(lst[[1, k]], lst[[2, k]], ..., lst[[n, k]])
Any help or pointers are greatly appreciated.
Here is a dataset that can be used, n = 3 and k = 2
list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12), .Names = c("a", "b")))
Here is a general solution, similar in spirit to that of @Ramnath, but avoiding the use of union(), which is a binary function. The trick is to note that union() is implemented as:
unique(c(as.vector(x), as.vector(y)))
and the bit inside unique() can be achieved by unlisting the nth component of each list.
The full solution then is:
unionFun <- function(n, obj) {
unique(unlist(lapply(obj, `[[`, n)))
}
lapply(seq_along(lst[[1]]), FUN = unionFun, obj = lst)
which gives:
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12
[[2]]
[1] 6 7 8 9 10 11 1 2 3 4 5 12
on the data you showed.
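As a quick sanity check of the claim above, here is a minimal sketch that compares the unlist-based approach with folding the binary union() over the same components using Reduce(). It assumes the example data from the question; the names via_unlist and via_reduce are only illustrative.

```r
# Example data from the question (n = 3, k = 2)
lst <- list(list(a = 1:5, b = 6:11),
            list(a = 6:11, b = 1:5),
            list(a = 12, b = 12))

# Union of the first components via unique(unlist(...))
via_unlist <- unique(unlist(lapply(lst, `[[`, 1)))

# The same union, folding the binary union() over the components
via_reduce <- Reduce(union, lapply(lst, `[[`, 1))

# Compare the two results
all.equal(via_unlist, via_reduce)
```

The unlist-based version makes a single pass over the components, whereas Reduce() calls union() once per additional sublist.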
A couple of useful features of this are:
- we use `[[` to subset obj in unionFun. This is similar to function(x) x$a in @Ramnath's Answer. However, we don't need an anonymous function (we use `[[` instead). The equivalent to @Ramnath's Answer is: lapply(lst, `[[`, 1)
- to generalise the above, we replace the 1 above with n in unionFun(), and allow our list to be passed in as argument obj.
Now that we have a function that will provide the union of the nth elements of a given list, we can lapply() over the indices 1, ..., k, applying our unionFun() to each sub-element of lst, using the fact that the length of lst[[1]] is the same as length(lst[[i]]) for all i.
If it helps to have the names of the nth elements in the returned object, we can do:
> unions <- lapply(seq_along(lst[[1]]), FUN = unionFun, obj = lst)
> names(unions) <- names(lst[[1]])
> unions
$a
[1] 1 2 3 4 5 6 7 8 9 10 11 12
$b
[1] 6 7 8 9 10 11 1 2 3 4 5 12
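A possible variation, sketched here under the assumption that every sublist carries the same names: iterate over the names instead of the positions, so the result is named in one step. The name unions_named is only illustrative; unionFun() is repeated so that the block is self-contained.

```r
# Example data from the question
lst <- list(list(a = 1:5, b = 6:11),
            list(a = 6:11, b = 1:5),
            list(a = 12, b = 12))

unionFun <- function(n, obj) {
  unique(unlist(lapply(obj, `[[`, n)))
}

# sapply() with simplify = FALSE returns a list, and because the input is a
# character vector its values are used as the names of the result
unions_named <- sapply(names(lst[[1]]), unionFun, obj = lst, simplify = FALSE)
unions_named
```

This works because `[[` accepts a name as well as a position, so unionFun() needs no changes.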
Here is one solution
# generate dummy data
x1 = sample(letters[1:5], 20, replace = TRUE)
x2 = sample(letters[1:5], 20, replace = TRUE)
df = data.frame(x1, x2, stringsAsFactors = FALSE)
# find unique elements in each column
union_df = apply(df, 2, unique)
Let me know if this works
EDIT: Here is a solution for lists using the data you provided
mylist = list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12), .Names = c("a", "b")))
list_a = lapply(mylist, function(x) x$a)
list_b = lapply(mylist, function(x) x$b)
union_a = Reduce(union, list_a)
union_b = Reduce(union, list_b)
If you have more than 2 elements in your list, we could generalize this code.
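One way this generalization might look, as a minimal sketch: loop over the component names and fold union() over each extracted component. It assumes that all sublists share the names of the first one; keys and unions are illustrative names, not part of the code above.

```r
mylist <- list(list(a = 1:5, b = 6:11),
               list(a = 6:11, b = 1:5),
               list(a = 12, b = 12))

# Component names, taken from the first sublist (assumed shared by all)
keys <- names(mylist[[1]])

# For each name, extract that component from every sublist and
# fold the binary union() over the extracted vectors
unions <- lapply(setNames(keys, keys), function(key) {
  Reduce(union, lapply(mylist, `[[`, key))
})
unions
```

Because lapply() keeps the names of its input, the result is a list named after the components.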
Here's another way: Use do.call/rbind to line up the lists by "name" into a data-frame, then apply unique/do.call to each column of this data-frame. (I modified your data slightly so the 'a' and 'b' unions are of different lengths, to make sure it works correctly.)
lst <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:10, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12), .Names = c("a", "b")))
> apply(do.call(rbind, lst),2, function( x ) unique( do.call( c, x)))
$a
[1] 1 2 3 4 5 6 7 8 9 10 12
$b
[1] 6 7 8 9 10 11 1 2 3 4 5 12
Your data
df <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12), .Names = c("a", "b")))
This gives you the unique values of the nested lists:
library(plyr)
df.l <- llply(df, function(x) unlist(unique(x)))
R> df.l
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[2]]
[1] 6 7 8 9 10 11 1 2 3 4 5
[[3]]
[1] 12
EDIT
Thanks to Ramnath, I changed the code a bit and hope this answer fits the needs of your question. For illustration, I keep the previous answer as well. The slightly changed data now has an additional list.
df <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12, c = 10:14), .Names = c("a", "b", "c")))
f.x <- function(x.list) {
x.names <- names(x.list)
i <- combn(x.names, 2)
l <- apply(i, 2, function(y) x.list[y])
llply(l, unlist)
}
Now you can apply the function to your data.
all.l <- llply(df, f.x)
llply(all.l, function(x) llply(x, unique))
R> [[1]]
[[1]][[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[2]]
[[2]][[1]]
[1] 6 7 8 9 10 11 1 2 3 4 5
[[3]]
[[3]][[1]]
[1] 12
[[3]][[2]]
[1] 12 10 11 13 14
[[3]][[3]]
[1] 12 10 11 13 14
However, the nested structure is not very user friendly. That could be changed a bit...
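As one possible way to flatten the result, here is a minimal base R sketch (not using plyr) that returns a single named list with one union per component name. It relies on `[[` returning NULL for a name that a sublist lacks, and on unlist() dropping those NULLs; all_names and flat_unions are illustrative names.

```r
df <- list(list(a = 1:5, b = 6:11),
           list(a = 6:11, b = 1:5),
           list(a = 12, b = 12, c = 10:14))

# Every component name that occurs in at least one sublist
all_names <- unique(unlist(lapply(df, names)))

# For each name, collect that component from all sublists; sublists without
# the name contribute NULL, which unlist() discards
flat_unions <- sapply(all_names, function(nm) {
  unique(unlist(lapply(df, `[[`, nm)))
}, simplify = FALSE)
flat_unions
```

Note that this takes the union across sublists for each name, which is a different grouping from the pairwise combinations produced by f.x above.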
According to the documentation, unlist() is a recursive function; hence, regardless of the nesting level of the lists supplied, you can get all elements by passing them to unlist(). You can get the union within each sublist as follows.
lst <- list(structure(list(a = 1:5, b = 6:11), .Names = c("a", "b")),
structure(list(a = 6:11, b = 1:5), .Names = c("a", "b")),
structure(list(a = 12, b = 12), .Names = c("a", "b")))
lapply(lst, function(sublst) unique(unlist(sublst)))
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11
[[2]]
[1] 6 7 8 9 10 11 1 2 3 4 5
[[3]]
[1] 12