Suppose I have a named vector, bar:
bar=c()
bar["1997-10-14"]=1
bar["2001-10-14"]=2
bar["2007-10-14"]=1
How can I select from bar all values for which the index is within a specific date range? So, if I look for all values between "1995-01-01" and "2000-06-01", I should get 1. And similarly, for the period between "2001-09-01" and "2007-11-04", I should get 2 and 1.
This problem has been solved for good by the xts package, which extends the functionality of the zoo package.
R> library(xts)
Loading required package: zoo
R> bar <- xts(1:3, order.by=as.Date("2001-01-01")+365*0:2)
R> bar
[,1]
2001-01-01 1
2002-01-01 2
2003-01-01 3
R> bar["2002::"] ## open range with a start year
[,1]
2002-01-01 2
2003-01-01 3
R> bar["::2002"] ## or end year
[,1]
2001-01-01 1
2002-01-01 2
R> bar["2002-01-01"] ## or hits a particular date
[,1]
2002-01-01 2
R>
There is a lot more here, but the basic point is: do not operate on strings masquerading as dates. Use a Date type, or preferably even an extension package built to efficiently index on millions of dates.
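Applied to the data from the question, the same idea might look like the sketch below. It assumes the xts package is installed and uses its "from/to" range strings; the variable names first_range and second_range are only illustrative.

```r
library(xts)  # also loads zoo

# The question's data as an xts series indexed by Date
bar <- xts(c(1, 2, 1),
           order.by = as.Date(c("1997-10-14", "2001-10-14", "2007-10-14")))

# A "from/to" string selects an inclusive date range
first_range  <- bar["1995-01-01/2000-06-01"]
second_range <- bar["2001-09-01/2007-11-04"]

# Print the matching values without the index
print(as.numeric(coredata(first_range)))
print(as.numeric(coredata(second_range)))
```

Both endpoints are included in the range, so a value dated exactly on a boundary would be returned as well.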
You need to convert your dates from characters into a Date type with as.Date() (or a POSIX type if you have more information, like the time of day). Then you can make comparisons with standard relational operators such as <= and >=.
You should consider using a time series package such as zoo for this.
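As a minimal base R sketch of that approach, the names of the vector can be parsed once and compared against both ends of the range. The helper name select_between is hypothetical, and treating both endpoints as inclusive is an assumption; the question does not say which is wanted.

```r
# Hypothetical helper: keep the elements whose names, parsed as dates,
# fall within [from, to] (both ends inclusive).
select_between <- function(x, from, to) {
  d <- as.Date(names(x))
  x[d >= as.Date(from) & d <= as.Date(to)]
}

# The named vector from the question
bar <- c("1997-10-14" = 1, "2001-10-14" = 2, "2007-10-14" = 1)

print(select_between(bar, "1995-01-01", "2000-06-01"))
print(select_between(bar, "2001-09-01", "2007-11-04"))
```

The result keeps its names, so it is still a named vector like the original bar.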
Edit:
Just to respond to your comment, here's an example of using dates with your existing vector:
> as.Date(names(bar)) < as.Date("2001-10-14")
[1] TRUE FALSE FALSE
> bar[as.Date(names(bar)) < as.Date("2001-10-14")]
1997-10-14
1
Although you really should just use a time series package. Here's how you could do this with zoo (or xts, timeSeries, fts, etc.):
library(zoo)
ts <- zoo(c(1, 2, 1), as.Date(c("1997-10-14", "2001-10-14", "2007-10-14")))
ts[index(ts) < as.Date("2001-10-14")]
Since the index is now a Date type, you can make as many comparisons as you want. Read the zoo vignette for more information.
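For the two ranges in the question, one option is zoo's window() method, which takes start and end arguments. This is only a sketch, assuming zoo is installed; the object is named z here (an illustrative choice) to avoid shadowing the base function ts().

```r
library(zoo)

# The question's data as a zoo series indexed by Date
z <- zoo(c(1, 2, 1), as.Date(c("1997-10-14", "2001-10-14", "2007-10-14")))

# window() keeps the observations between start and end (inclusive)
in_first  <- window(z, start = as.Date("1995-01-01"), end = as.Date("2000-06-01"))
in_second <- window(z, start = as.Date("2001-09-01"), end = as.Date("2007-11-04"))

print(coredata(in_first))
print(coredata(in_second))
```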
Using the fact that dates written as YYYY-MM-DD strings sort lexically in chronological order:
bar[names(bar) > "1995-01-01" & names(bar) < "2000-06-01"]
# 1997-10-14
# 1
bar[names(bar) > "2001-09-01" & names(bar) < "2007-11-04"]
# 2001-10-14 2007-10-14
# 2 1
The result is a named vector (like your original bar, it is a named vector, not a list).
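One caveat worth illustrating: the string comparison only works when every date is zero-padded in the same YYYY-MM-DD layout. The sketch below uses a made-up unpadded date, "2001-9-01", to show how character order and date order can disagree; the variable names are illustrative.

```r
# Zero-padded strings: character order matches date order
padded_ok <- "2001-09-01" < "2001-10-14"

# Unpadded month: compared as text, "9" sorts after "1",
# so September appears to come after October
unpadded_as_text <- "2001-9-01" < "2001-10-14"

# Parsing to Date first restores the chronological comparison
unpadded_as_date <- as.Date("2001-9-01") < as.Date("2001-10-14")

print(c(padded_ok, unpadded_as_text, unpadded_as_date))
```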
As Dirk states in his answer, it's better to use Date for efficiency reasons. Without external packages, you could rearrange your data and create two vectors (or a two-column data.frame), one for dates and one for values:
bar_dates <- as.Date(c("1997-10-14", "2001-10-14", "2007-10-14"))
bar_values <- c(1,2,1)
Then use simple indexing:
bar_values[bar_dates > as.Date("1995-01-01") & bar_dates < as.Date("2000-06-01")]
# [1] 1
bar_values[bar_dates > as.Date("2001-09-01") & bar_dates < as.Date("2007-11-04")]
# [1] 2 1
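The two-column data.frame variant mentioned above could be sketched as follows. The names bar_df, date, value and in_range are illustrative, not part of the original answer.

```r
# Dates and values side by side in one data.frame
bar_df <- data.frame(
  date  = as.Date(c("1997-10-14", "2001-10-14", "2007-10-14")),
  value = c(1, 2, 1)
)

# Logical mask for the second range in the question (exclusive bounds,
# matching the comparisons used above)
in_range <- bar_df$date > as.Date("2001-09-01") &
  bar_df$date < as.Date("2007-11-04")

print(bar_df[in_range, ])      # matching rows, dates included
print(bar_df$value[in_range])  # just the values
```

Keeping the dates in their own column means they are parsed once, rather than on every lookup.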