开发者

r combining results from tapply using do.call(rbind) or ddply

开发者 https://www.devze.com 2023-03-24 21:39 出处:网络
I have Date/Time information where I want to get the average, min, max, range of the dates across \"seasons\" grouped by years and the only way I have been slightly successful of doing this is with ta

I have Date/Time information where I want to get the average, min, max, range of the dates across "seasons" grouped by years and the only way I have been slightly successful of doing this is with tapply. The closest is the summary function in tapply. Results are what I only assume is a list by year of the min, max, 1st Qu, median etc. These are not exactly what i need but works fine.

Example of results I get from the tapply function from 1 season:

$`2003`
                     Min.                   1st Qu.                    Median                      Mean                   3rd Qu.                      Max. 
"2003-04-22 00:00:00 UTC" "2003-05-03 00:00:00 UTC" "2003-05-12 00:00:00 UTC" "2003-05-10 02:00:00 UTC" "2003-05-18 00:00:00 UTC" "2003-05-21 00:00:00 UTC" 

$`2004`
                     Min.                   1st Qu.                    Median                      Mean                   3rd Qu.                      Max. 
"2004-04-07 00:00:00 UTC" "2004-04-13 00:00:00 UTC" "2004-05-10 00:00:00 UTC" "2004-05-01 07:08:56 UTC" "2004-05-11 12:00:00 UTC" "2004-05-20 00:00:00 UTC" 

What I want to do is combine these lists I generate for different seasons into one big happy dataframe that I can export into a csv. I have searched and searched and am getting even more confused. Most people suggest the simplest is

test = do.call(rbind, sSM, eSM, sC) #note here sSM, eSM, sC are my "seasons"

However this gives me an error message or an empty "test" frame.

Ive read about the plyr package and assume that ddply instead of the tapply to force the result into a dataframe should work, but I can't figure out how to get that to even work because I can't get it to do the functions I would need such as the mean, min, max, & range...

The end result I would like is something like:

Season   Year   Min           Max      Mean        Median  
sSM     2003   2003-04-21   2003-5-1  2003-4-25   2003-4-23
eSM     2003...
sSM     2004...
eSM     2004...

THEN I would like to do the exact same, only determine within the Individual across years. I have an ID field in which in some years the same individual has been measured. I would like to get a mean start date for that individual across years within each season. Leaving out those individuals who were not measured over multiple years.

As I do not understand how to manipulate lists and arrays and even what tapply is doing I am at a loss as to how to fix all of this. Do I create an empty dataframe and put all this stuff into it in some loop or something? Below is some sample data from 4 "seasons" across a couple of years notice some dates have the date and time stamps and some just have the date.

    structure(list(Year = c(2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2005L, 2005L, 
2005L, 2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L), ID = structure(c(11L, 
12L, 13L, 14L, 15L, 16L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 14L, 15L, 1L, 2L, 3L, 6L), .Label = c("c_002", "c_102", 
"c_104", "c_105", "c_109", "c_401", "c_9814", "c_9815", "c_9816", 
"c_9819", "c_9901", "c_9902", "c_9905", "c_9908", "c_9911", "c_9912", 
"c_9916"), class = "factor"), sSM = structure(c(10L, 10L, 9L, 
17L, 8L, 8L, 7L, 18L, NA, 1L, NA, 15L, 13L, 12L, 6L, 3L, 5L, 
2L, 4L, 16L, 14L, 11L, 11L), .Label = c("04/07/2004 15:00", "04/23/2005 10:01", 
"04/25/2005 03:01", "04/27/2005 02:00", "04/27/2005 08:00", "04/29/2005 04:00", 
"05/01/2003", "05/03/2003", "05/04/2003", "05/05/2003", "05/05/2006", 
"05/07/2005 16:01", "05/11/2004 11:00", "05/11/2006", "05/13/2004 08:00", 
"05/14/2006", "05/17/2003", "05/17/2004 12:02"), class = "factor"), 
    eSM = structure(c(13L, 18L, 15L, 21L, 16L, 10L, 14L, 20L, 
    NA, 1L, NA, 11L, 7L, 5L, 6L, 17L, 3L, 2L, 4L, 19L, 9L, 12L, 
    8L), .Label = c("04/27/2004 05:00", "05/01/2005 05:00", "05/06/2005 05:00", 
    "05/08/2005 07:01", "05/12/2005 21:00", "05/15/2005 19:00", 
    "05/18/2004 13:00", "05/18/2006", "05/20/2006", "05/21/2003", 
    "05/21/2004 01:01", "05/23/2006", "05/24/2003", "05/25/2003", 
    "05/26/2003", "05/27/2003", "05/27/2005 01:00", "05/28/2003", 
    "05/28/2006", "06/01/2004 02:01", "06/03/2003"), class = "factor"), 
    sC = structure(c(9L, 12L, 16L, 19L, 18L, 12L, 7L, 13L, NA, 
    10L, NA, 20L, 4L, 14L, 11L, 5L, 1L, 2L, 3L, 8L, 17L, 6L, 
    15L), .Label = c("05/26/2005 00:00", "05/26/2005 10:00", 
    "05/27/2005 06:01", "05/28/2004 08:00", "05/29/2005 23:01", 
    "05/29/2006", "05/30/2003", "05/30/2006", "05/31/2003", "05/31/2004 06:01", 
    "05/31/2005 15:00", "06/01/2003", "06/01/2004 07:02", "06/01/2005 16:00", 
    "06/01/2006", "06/03/2003", "06/03/2006", "06/04/2003", "06/05/2003", 
    "06/05/2004 01:00"), class = "factor"), eC = structure(c(11L, 
    2L, 15L, 6L, 17L, 17L, 2L, 8L, NA, 7L, NA, 13L, 5L, 9L, 9L, 
    3L, 10L, 1L, 4L, 12L, 14L, 16L, 12L), .Label = c("06/03/2005 18:00", 
    "06/04/2003", "06/04/2005 04:01", "06/05/2005 05:01", "06/06/2004 22:00", 
    "06/07/2003", "06/07/2004 23:00", "06/08/2004 19:01", "06/08/2005 03:00", 
    "06/10/2005 20:00", "06/12/2003", "06/13/2006", "06/14/2004 00:00", 
    "06/14/2006", "06/16/2003", "06/18/2006", "06/19/2003"), class = "factor")), .Names = c("Year", 
"ID", "sSM", "eSM", "sC", "eC"), class = "data.frame", row.names = c(NA, 
-23L))

Here is some code Ive written thus far:

dates$StartSM =  as.POSIXct(strptime(dates$sSM,"%m/%d/%Y",tz="UTC"),tz="UTC")                 
dates$EndSM =  as.POSIXct(strptime(dates$eSM,"%m/%d/%Y",tz="UTC"),tz="UTC")
dates$EndC =  as.POSIXct(strptime(dates$eC,"%m/%d/%Y %H:%M",tz="UTC"),tz="UTC")
dates$StartC =  as.POSIXct(strptime(dates$sC,"%m/%d/%Y %H:%M",tz="UTC"),tz="UTC")
sSpringM = tapply(开发者_Python百科dates$StartSM, dates$Year, summary)
eSpringM  = tapply(dates$EndSM, dates$Year, summary)
sCalving = as.vector(tapply(dates$StartC, dates$Year, summary))
eCalving = tapply(dates$EndC, dates$Year, summary)
datadates = do.call(rbind, sSpringM, eSpringM, sCalving)


You have (at the top anyway) a list, call it "yourlist". Try this to print them:

lapply(yourlist, format, "%Y-%m-%d")

If you want them in a matrix:

do.call(rbind, sapply(yourlist, unlist))

I don't see any "winter", "spring", etc in your structure. I have in the past used the data.table package's year, quarter, month and week functions to return a vector of numeric values for years, quarters, months and weeks. It's easy to plot with the integer years plus appropriate fractional additions like year(.)+quarter/4. Easier than the corresponding cut and label hassles that would otherwise result.

0

精彩评论

暂无评论...
验证码 换一张
取 消