开发者

Generate entries in time series data

开发者 https://www.devze.com 2023-04-12 17:06 出处:网络
I want to generate a row (with zero ammount) for each missing month (until the current) in the following dataframe. Can you please give me a hand in this? Thanks!

I want to generate a row (with zero ammount) for each missing month (until the current) in the following dataframe. Can you please give me a hand in this? Thanks!

   trans_date       ammount
1  2004-12-01        2968.91
2  2005-04-01         500.62
3  2005-05-01         434.30
4  2005-06-01         549.15
5  2005-07-01         276.77
6  2005-09-01         548.64
7  2005-10-01         761.69
8  2005-11-01         636.77
9  2005-12-01        1517.58
10 2006-03-01         719.09
11 2006-04-01        1231.88
12 2006-05-01         580.46
13 2006-07-01        1468.43
14 2006-10-01         692.22
15 2006-11-01         505.81
16 2006-12-01        1589.70
17 2007-03-01        1559.82
18 2007-06-01         764.98
19 2007-07-01         964.77
20 2007-09-01         405.18
21 2007-11-01         112.42
22 2007-12-01        1134.08
23 2008-02-01         269.72
24 2008-03-01         208.96
25 2008-04-01         353.58
26 2008-05-01         756.00
27 2008-06-01         747.85
28 2008-07-01         781.62
29 2008-09-01         195.36
30 2008-10-01         424.24
31 2008-12-01         166.23
32 2009-02-01         237.11
开发者_如何学编程33 2009-04-01         110.94
34 2009-07-01         191.29
35 2009-11-01         153.42
36 2009-12-01         222.87
37 2010-09-01        1259.97
38 2010-11-01         375.61
39 2010-12-01         496.48
40 2011-02-01         360.07
41 2011-03-01         324.95
42 2011-04-01         566.93
43 2011-06-01         281.19
44 2011-08-01         428.04

'data.frame':   44 obs. of  2 variables:
 $ trans_date     : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
 $ ammount: num  2969 501 434 549 277 ...


you can use seq.Date and merge:

> str(df)
'data.frame':   44 obs. of  2 variables:
 $ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
 $ ammount   : num  2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
  trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01    0.00
3 2005-02-01    0.00
4 2005-03-01    0.00
5 2005-04-01  500.62
6 2005-05-01  434.30

and if you need months until current, use this:

mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))

note that it is sufficient to call simply seq instead of seq.Date if the parameters are Date class.


If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:

# create xts object:
#   no comma on the first subset (Data['ammount']) keeps column name;
#   as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01.  The "m" denotes
# monthly time-steps.  By default this returns a yearmon class.  Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)
0

精彩评论

暂无评论...
验证码 换一张
取 消