Problems splitting data frame into a nested list

Developer https://www.devze.com 2023-04-01 00:42 Source: Internet
I am new to R and I am having trouble splitting a very large data frame into a nested list. I looked for help online without success.

I have a simplified example on how my data are organized:

The headers are:

1. "station" (number)
2. "date.str" (date string)
3. "member"
4. "forecast.time"
5. "data1"

I am not sure my data example will display correctly, but if it does, it looks like this:

station  date.str  member  forecast.time  data1
6019     20110805  mbr000  06             77
6031     20110805  mbr000  06             28
6071     20110805  mbr000  06             45
6019     20110805  mbr001  12             22
6019     20110806  mbr024  18             66

I want to split the large data frame into a nested list by "station", "member", "date.str" and "forecast.time", so that mylist[[c(s,m,d,t)]] contains a data frame with the data for station "s", member "m", date.str "d" and forecast time "t", preserving the values of s, m, d and t.

My code is:

data.st <- list()
data.st.member <- list()
data.st.member.dato <- list()

data.st <- split(mydata, mydata$station)
data.st.member <- lapply(data.st, FUN = fsplit.member)

(fsplit.member is a function I wrote to split by "member".)

# Loop over station number:
for (s in 1:S) {

  # Loop over members:
  for (m in 1:length(members)) {
    tmp <- split(data.st.member[[s]][[m]], data.st.member[[s]][[m]]$date.str)

    # Loop over number of different "date.str"s
    for (t in 1:length(no.date.str)) {
      data.st.member.dato[[s]][[m]][[t]] <- tmp
    }
  } # end m loop
} # end s loop

I would also like to split by the forecast time, forecast.time, but I didn't get that far.

I have tried a couple of different configurations within the loops, so at the moment I don't have a consistent error message. I can't figure out what I am doing or thinking wrong.

Any help is much appreciated!

Regards Sisse


It's easier than you think. You can pass a list into split in order to split on several factors.

Reproducible example

with(airquality, split(airquality, list(Month, Day)))

With your data

data.st <- with(mydata, 
  split(mydata, list(station, member, date.str, forecast.time))
)

Note: This doesn't give you a nested list like you asked for, but as Joran commented, you very probably don't want that. A flat list will be nicer to work with.
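To make the flat-list idea concrete, here is a minimal sketch using the five sample rows from the question. It assumes forecast.time is stored as a character column (so "06" keeps its leading zero) and adds drop = TRUE, which is not in the answer above, to discard combinations that never occur. The names flat and chunk are just illustrative.

```r
# Sample rows copied from the question. Keeping forecast.time as character
# is an assumption, made so that the leading zero in "06" survives.
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Split on all four columns at once; drop = TRUE removes empty combinations.
flat <- with(mydata,
  split(mydata, list(station, member, date.str, forecast.time), drop = TRUE)
)

# Each element is named by its factor levels joined with "."
names(flat)

# Look up one chunk by its composite name
chunk <- flat[["6019.mbr000.20110805.06"]]
chunk
```

Each chunk is still a data frame with all the original columns, so the values of station, member, date.str and forecast.time are preserved inside it.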

Speculating wildly: did you just want to calculate statistics on different chunks of data? If so, then see the many questions here on split-apply-combine problems.
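If per-group statistics are indeed the goal, a split-apply-combine step might look like the sketch below. It uses base R's aggregate on the sample rows from the question; the choice of mean as the statistic and station as the grouping column is only an assumption for illustration.

```r
# Sample rows copied from the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Mean of data1 per station: split, apply and combine in one call
station.means <- aggregate(data1 ~ station, data = mydata, FUN = mean)
station.means

# Grouping by more columns only needs more terms on the right-hand side
by.station.member <- aggregate(data1 ~ station + member, data = mydata, FUN = mean)
```

The result is an ordinary data frame with one row per group, which is usually easier to work with afterwards than a list of data frames.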


I also want to echo the others: this recursive data structure is going to be difficult to work with, and there are probably better ways. Do look at the split-apply-combine approach as Richie suggested. However, the constraints may be external, so here is an answer using the plyr package.

library(plyr)
mylist <- dlply(mydata, .(station), dlply, .(member), dlply, .(date.str), dlply, .(forecast.time), identity)

Using the snippet of data you gave for mydata,

> mylist[[c("6019","mbr000","20110805","6")]]
  station date.str member forecast.time data1
1    6019 20110805 mbr000             6    77
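If plyr is not available, the same kind of nested list can be sketched in base R with a small recursive helper. nested_split below is a hypothetical function written for this illustration, not part of base R or plyr, and the sample data again assumes forecast.time is a character column, so the last index is "06" rather than "6".

```r
# Hypothetical helper: split by the first column name in vars, then
# recurse into each piece with the remaining column names.
nested_split <- function(df, vars) {
  if (length(vars) == 0) return(df)
  pieces <- split(df, df[[vars[1]]], drop = TRUE)
  lapply(pieces, nested_split, vars = vars[-1])
}

# Sample rows copied from the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

mylist <- nested_split(mydata, c("station", "member", "date.str", "forecast.time"))

# Recursive indexing with a character vector walks down one level per name
one <- mylist[[c("6019", "mbr000", "20110805", "06")]]
one
```

The order of the names in vars sets the nesting order, so the index vector has to follow the same order.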