I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. To summarize, I really like plyr and have achieved basically everything I wanted with it. However, my code takes a while to run, and the prospect of speeding things up was enough for me to run some tests. Those tests ended quite soon, and here is the reason.
What I do quite often with plyr is to split my data by a column containing dates and do some calculations:
library(plyr)
DF <- data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
# Split up the data by Date and apply an arbitrary function to each piece
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})
However, using a date-time column (here POSIXct, as returned by Sys.time()) as a key does not seem to work in data.table:
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.
If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I think it wouldn't be good coding to constantly convert between Date and numeric. So am I missing something or is there just no easy way to achieve that with data.table?
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.3 zoo_1.7-2 lubridate_0.2.5 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4
[7] reshape2_1.1 xtable_1.5-6 plyr_1.5.2
loaded via a namespace (and not attached):
[1] digest_0.5.0 lattice_0.19-30 stringr_0.5 tools_2.13.1
This should work:
DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
The data.table package contains some date/time classes with integer storage mode.
See ?IDateTime
Date and time classes with integer storage for fast sorting and grouping. Still experimental!
IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer.
ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
IDateTime takes a date-time input and returns a data table with columns date
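Note that as.ITime() keeps only the time of day, so timestamps from different days would end up in the same group. Here is a minimal sketch of how IDateTime() might be used to keep both parts and still get an integer-backed key. The fixed timestamp t0, the seed, and the names stamps, y_vals, key_cols and res are my own choices for illustration, and the grouping function is the one from the question. The column names are read from the result instead of being hard-coded, since I am not sure they are the same in every data.table version.

```r
library(data.table)

set.seed(1)
# Two fixed timestamps one minute apart (illustrative stand-in for Sys.time())
t0 <- as.POSIXct("2011-07-08 12:00:00", tz = "UTC")
stamps <- rep(c(t0, t0 + 60), each = 6)
y_vals <- c(rnorm(6, 1), rnorm(6, -1))

# IDateTime() splits the date-time into an integer date and an integer time column
DT <- IDateTime(stamps)
key_cols <- names(DT)   # take the column names from the result

# Add the values by reference, then key on date and time together
DT[, y := y_vals]
setkeyv(DT, key_cols)

# Same calculation as the ddply() call: group mean minus the last value in the group
res <- DT[, list(V1 = mean(y) - y[length(y)]), by = key_cols]
print(res)
```

Keying on both columns means each distinct timestamp stays its own group, at the cost of carrying two columns instead of one.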