In a database I have data with associated timestamps. The timestamps are irregularly spaced and have a resolution of up to one minute. I want to make this data uniform with respect to timestamps (with seconds resolution) using R, with NA values replaced by the previous value. Also, every timestamp should contain data for all the symbols. I have tried some time series packages for making the data uniform but have not been successful.
This is the code I have run so far:
library("RPostgreSQL")
library(DBI)
library(sqldf)
drv <- dbDriver("PostgreSQL")
ch <- dbConnect(drv, dbname="derivativesData",
user="postgres", password="postgres")
companyFrame <- dbGetQuery(ch, "select * from derData")
companyFrame$trade_time
[1] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[3] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[5] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[7] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[9] "2011-06-01 09:00:00 IST" "2011-06-01 09:01:00 IST"
[11] "2011-06-01 09:01:00 IST" "2011-06-01 09:01:00 IST"
[13] "2011-06-01 09:02:00 IST" "2011-06-01 09:02:00 IST"
[15] "2011-06-01 09:02:00 IST" "2011-06-01 09:03:00 IST"
[17] "2011-06-01 09:04:00 IST" "2011-06-01 09:04:00 IST"
[19] "2011-06-01 09:05:00 IST" "2011-06-01 09:05:00 IST"
[21] "2011-06-01 09:06:00 IST" "2011-06-01 09:06:00 IST"
[23] "2011-06-01 09:06:00 IST" "2011-06-01 09:07:00 IST"
[25] "2011-06-01 09:08:00 IST" "2011-06-01 09:09:00 IST"
[27] "2011-06-01 09:10:00 IST" "2011-06-01 09:10:00 IST"
I want to convert this data into a uniform format with, say, a 10-second resolution.
Here I will use a 10-minute resolution, as your times don't have any seconds.
With the following sample data:
R> time <- c("2011-06-01 09:00:00 IST", "2011-06-01 09:00:00 IST", "2011-06-01 09:01:00 IST",
+ "2011-06-01 09:06:00 IST", "2011-06-01 09:10:00 IST", "2011-06-01 09:15:00 IST")
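You can first convert the strings to the POSIXlt date format (the time is parsed with an explicit %H:%M:%S here, because %X depends on the locale; the trailing " IST" is ignored by strptime):
R> time2 <- strptime(time, format="%Y-%m-%d %H:%M:%S")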
R> time2
[1] "2011-06-01 09:00:00" "2011-06-01 09:00:00" "2011-06-01 09:01:00"
[4] "2011-06-01 09:06:00" "2011-06-01 09:10:00" "2011-06-01 09:15:00"
Then you could use the minute function from the lubridate package to alter the minute component of your dates and round it down to a 10-minute resolution, for example:
R> library(lubridate)
R> minute(time2) <- minute(time2) %/% 10 * 10
R> time2
[1] "2011-06-01 09:00:00 CEST" "2011-06-01 09:00:00 CEST"
[3] "2011-06-01 09:00:00 CEST" "2011-06-01 09:00:00 CEST"
[5] "2011-06-01 09:10:00 CEST" "2011-06-01 09:10:00 CEST"
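The same flooring can be done for any step size, including the 10 seconds asked about, with plain arithmetic on POSIXct values. This is only a minimal sketch in base R: the variable names (times, step, binned) are made up for illustration, and the "UTC" time zone is an assumption chosen to keep the example reproducible. Bins are aligned to multiples of step seconds since the epoch.

```r
# Sample timestamps (same values as above; the UTC time zone is an assumption)
times <- as.POSIXct(c("2011-06-01 09:00:00", "2011-06-01 09:00:00",
                      "2011-06-01 09:01:00", "2011-06-01 09:06:00",
                      "2011-06-01 09:10:00", "2011-06-01 09:15:00"),
                    tz = "UTC")

# Bin width in seconds: 600 = 10 minutes; use 10 for a 10-second resolution
step <- 600

# Floor every timestamp to the start of its bin
binned <- as.POSIXct(floor(as.numeric(times) / step) * step,
                     origin = "1970-01-01", tz = "UTC")

format(binned, "%H:%M:%S")
# "09:00:00" "09:00:00" "09:00:00" "09:00:00" "09:10:00" "09:10:00"
```

Note that this only maps existing observations onto bins; it does not create rows for bins in which nothing was observed.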
Try the data.table package and its roll=TRUE feature. See ?data.table and the vignettes, which discuss fast last observation carried forward.
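A rough sketch of how a rolling join could produce a regular grid with every symbol at every timestamp is shown here. The table and its column names (symbol, trade_time, price) are assumptions, since the question does not show the layout of derData, and the values are invented for illustration.

```r
library(data.table)

# Hypothetical trades; column names and values are assumptions
trades <- data.table(
  symbol     = c("A", "A", "B", "B"),
  trade_time = as.POSIXct(c("2011-06-01 09:00:00", "2011-06-01 09:02:00",
                            "2011-06-01 09:00:00", "2011-06-01 09:01:00"),
                          tz = "UTC"),
  price      = c(100, 101, 50, 51)
)

# Regular grid: every symbol at every 10-second step of the observed range
grid <- CJ(
  symbol     = unique(trades$symbol),
  trade_time = seq(min(trades$trade_time), max(trades$trade_time), by = 10)
)

# Rolling join: each grid row takes the last observation at or before its
# time within the same symbol (last observation carried forward)
uniform <- trades[grid, on = c("symbol", "trade_time"), roll = TRUE]
```

With this sample, uniform has one row per symbol per 10-second step, and symbol B keeps its 09:01:00 price through 09:02:00. A grid time that falls before a symbol's first observation would still come out as NA, because there is nothing earlier to carry forward.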