I have a data table in R:
name d开发者_如何学JAVAate
---- ----
John 1156649280
Adam 1255701960
...etc...
I want to get all of the rows that have a date within a range. In SQL, I might say SELECT * FROM mytable WHERE date > 5 AND date < 15
What is the equivalent in R, to select rows based on the range of values in a particular column?
Construct some data
df <- data.frame( name=c("John", "Adam"), date=c(3, 5) )
Extract exact matches:
subset(df, date==3)
name date
1 John 3
Extract matches in range:
subset(df, date>4 & date<6)
name date
2 Adam 5
The following syntax produces identical results:
df[df$date>4 & df$date<6, ]
name date
2 Adam 5
Lots of options here, but one of the easiest to follow is subset
. Consider:
> set.seed(43)
> df <- data.frame(name = sample(letters, 100, TRUE), date = sample(1:500, 100, TRUE))
>
> subset(df, date > 5 & date < 15)
name date
11 k 10
67 y 12
86 e 8
You can also insert logic directly into the index for your data.frame. The comma separates the rows from columns. We just have to remember that R indexes rows first, then columns. So here we are saying rows with date > 5 & < 15 and then all columns:
df[df$date > 5 & df$date < 15 ,]
I'd also recommend checking out the help pages for subset, ?subset
and the logical operators ?"&"
One should also consider another intuitive way to do this using filter()
from dplyr
. Here are some examples:
set.seed(123)
df <- data.frame(name = sample(letters, 100, TRUE),
date = sample(1:500, 100, TRUE))
library(dplyr)
filter(df, date < 50) # date less than 50
filter(df, date %in% 50:100) # date between 50 and 100
filter(df, date %in% 1:50 & name == "r") # date between 1 and 50 AND name is "r"
filter(df, date %in% 1:50 | name == "r") # date between 1 and 50 OR name is "r"
# You can also use the pipe (%>%) operator
df %>% filter(date %in% 1:50 | name == "r")
精彩评论