I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. To summarize: I really like plyr, and it has let me do basically everything I wanted. However, my code takes a while to run, and the prospect of speeding things up was enough for me to run some tests. Those tests ended quite soon, and here is why.
What I quite often do with plyr is split my data by a column containing dates and do some calculations:
library(plyr)
DF <- data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})
However, using a date-time column (here POSIXct, as returned by Sys.time()) as a key does not seem to work in data.table:
library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.
If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I don't think it would be good practice to constantly convert between date-times and numerics. So am I missing something, or is there just no easy way to achieve this with data.table?
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.3 zoo_1.7-2 lubridate_0.2.5 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4
[7] reshape2_1.1 xtable_1.5-6 plyr_1.5.2
loaded via a namespace (and not attached):
[1] digest_0.5.0 lattice_0.19-30 stringr_0.5 tools_2.13.1
This should work:
DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
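To connect this back to the ddply() call in the question, here is a minimal sketch of the same per-group calculation on the keyed table. It assumes fixed example timestamps and fixed y values (chosen here only for reproducibility, not taken from the question), and the result column name V1 is just an illustration.

```r
library(data.table)

# Hypothetical, fixed example data: two time stamps one minute apart,
# six observations each (the question used Sys.time() and rnorm()).
ts <- rep(as.POSIXct(c("2011-08-01 12:00:00", "2011-08-01 12:01:00"), tz = "UTC"),
          each = 6)
DT <- data.table(Date = as.ITime(ts),
                 y    = c(1, 2, 3, 4, 5, 6, -1, -2, -3, -4, -5, -6))
setkey(DT, Date)

# Per group: mean of y minus the last y value in that group,
# mirroring mean(df$y) - df[nrow(df), "y"] from the ddply() call.
res <- DT[, list(V1 = mean(y) - y[length(y)]), by = Date]
print(res)
```

Note that as.ITime() keeps only the time of day, so this grouping is only equivalent to the plyr version when all observations fall on the same date.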
The data.table package contains some date/time classes with integer storage mode.
See ?IDateTime:
Date and time classes with integer storage for fast sorting and grouping. Still experimental!
IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer. ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object. IDateTime takes a date-time input and returns a data table with columns date and time.
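If the data span more than one day, keying on the time of day alone would merge observations from different dates. A hedged sketch of keeping both parts with IDateTime() follows; it assumes the returned columns are named idate and itime (as in the data.table versions I am aware of, which may differ from the wording of the help text and from version 1.6.3), and it uses made-up timestamps and values.

```r
library(data.table)

# Hypothetical example: same clock time on two different days.
ts <- rep(as.POSIXct(c("2011-08-01 12:00:00", "2011-08-02 12:00:00"), tz = "UTC"),
          each = 3)
y  <- c(1, 2, 3, -1, -2, -3)

# IDateTime() splits the date-time into an integer date and an integer time;
# the column names idate/itime are an assumption about the installed version.
DT2 <- data.table(IDateTime(ts), y = y)
setkey(DT2, idate, itime)

# Grouping on both columns keeps the two days apart.
res2 <- DT2[, list(diff = mean(y) - y[length(y)]), by = list(idate, itime)]
print(res2)
```

Grouping DT2 by itime alone would instead treat all six rows as a single group, which is why the date column is carried along here.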