Can I use lists in R as a proxy for a data frame with unequal column lengths?

Source: https://www.devze.com, 2023-02-12 10:59 (from the web)
My understanding of data frames in R is that they have to be rectangular: it is not possible to have a data frame with unequal column lengths. Can I use lists in R to achieve this? What are the pros and cons of such an approach?


You can use lists to store whatever you want, even data frames or other lists! You can indeed store vectors of different lengths, or even completely different objects. Like a data frame, a list lets you access its elements by name using the dollar sign:

> fooList <- list(a=1:12, b=1:11, c=1:10)
> fooList$a
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> fooDF <- data.frame(a=1:10, b=1:10, c=1:10)
> fooDF$a
 [1]  1  2  3  4  5  6  7  8  9 10

But numeric indexing is different. A list has no rows, so the matrix-style [row, column] form only works on the data frame:

> fooList[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> fooDF[,1]
 [1]  1  2  3  4  5  6  7  8  9 10

The structure and the print method differ as well:

> fooList
$a
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

$b
 [1]  1  2  3  4  5  6  7  8  9 10 11

$c
 [1]  1  2  3  4  5  6  7  8  9 10

> fooDF
    a  b  c
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7  7
8   8  8  8
9   9  9  9
10 10 10 10

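Simply put, a data frame behaves like a table (a matrix whose columns may have different types), while a list is more of a general container. Internally, a data frame is itself a list whose elements all have the same length.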
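
To make the table-versus-container distinction concrete, here is a minimal sketch that reuses the fooList and fooDF objects from above. The name `result` is only introduced for illustration. It shows what each kind of indexing returns and that data.frame() refuses columns of these lengths.

```r
fooList <- list(a = 1:12, b = 1:11, c = 1:10)
fooDF   <- data.frame(a = 1:10, b = 1:10, c = 1:10)

# A data frame is itself a list of equal-length columns
is.list(fooDF)

# Row indexing only makes sense for the rectangular data frame
secondRow <- fooDF[2, ]

# On a list, single brackets return a shorter list,
# double brackets return the element itself
firstAsList   <- fooList[1]
firstAsVector <- fooList[[1]]

# Building a data frame from columns of length 12, 11 and 10 fails;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  data.frame(a = 1:12, b = 1:11, c = 1:10),
  error = function(e) conditionMessage(e)
)
```

Note that data.frame() recycles atomic vectors whose length divides the number of rows evenly, so the error appears here only because 11 and 10 do not divide 12.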

A list is meant to keep all sorts of things together, whereas a data frame is the usual data format, with a subject/case in each row and a variable in each column. It is what many analyses expect, it lets you index the scores of a single subject, and it is easier to transform.

However, if you have columns of unequal length, then I doubt each row represents a subject/case in your data. In that case, I guess you don't need much of the functionality of data frames.
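
If the columns really are independent vectors, a list is still convenient to work with. The sketch below is one possible approach, not the only one: it summarises each element with sapply() and, as an optional extra, uses stack() to produce a long-format data frame. The names `fooLengths`, `fooMeans`, and `fooLong` are made up for this example.

```r
fooList <- list(a = 1:12, b = 1:11, c = 1:10)

# Apply a function to every element, whatever its length
fooLengths <- sapply(fooList, length)
fooMeans   <- sapply(fooList, mean)

# stack() turns a named list of vectors into a long data frame with
# one row per value: column 'values' holds the data, 'ind' the list name
fooLong <- stack(fooList)
head(fooLong)
```

The long format keeps every value without padding, at the cost of losing any notion of a shared row position across a, b, and c.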

If each row does represent a subject/case, then you should use NA for any missing values and use a data frame.
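
A minimal sketch of that NA approach, assuming the missing observations are the trailing ones in each shorter vector (if they belong to other subjects, the NAs have to be placed by hand instead). The names `maxLen`, `padded`, and `fooPadded` are hypothetical.

```r
fooList <- list(a = 1:12, b = 1:11, c = 1:10)

# Length of the longest element
maxLen <- max(lengths(fooList))

# Pad every shorter vector with NA at the end (assumption: the
# missing values are the last ones)
padded <- lapply(fooList, function(x) c(x, rep(NA, maxLen - length(x))))

# All elements now have equal length, so this is a valid data frame
fooPadded <- as.data.frame(padded)

dim(fooPadded)               # rows and columns
colSums(is.na(fooPadded))    # number of NAs added per column
```

Most summary functions then need na.rm = TRUE, for example mean(fooPadded$c, na.rm = TRUE).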
