Let's say I have the following data frame.
dat <- data.frame(city=c("Chelsea","Brent","Bremen","Olathe","Lenexa","Shawnee"),
tag=c(rep("AlabamaCity",3), rep("KansasCity",3)))
I want to add a third column, tag2, holding the region of the state named in the tag column. So the first three cities should end up as 'South' and the last three as 'Midwest'. The data should look like this:
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest
I tried the following function, but it doesn't create the new column. Can anyone tell me what's wrong?
fixit <- function(dat) {
  for (i in 1:nrow(dat)) {
    Words = strsplit(as.character(dat[i, 'tag']), " ")[[1]]
    if(any(Words == 'Alabama')) {
      dat[i, 'tag2'] <- "South"
    }
    if(any(Words == 'Kansas')) {
      dat[i, 'tag2'] <- "Midwest"
    }
  }
  return(dat)
}
Thanks for the help.
It isn't working because the strsplit() call that creates Words is wrong: the tags contain no space to split on. (You do know how to debug R functions, don't you?)
debug: Words = strsplit(as.character(dat[i, "tag"]), " ")[[1]]
Browse[2]>
debug: if (any(Words == "Alabama")) {
    dat[i, "Tag2"] <- "South"
}
Browse[2]> Words
[1] "AlabamaCity"
At this point, Words is "AlabamaCity", which is certainly not equal to "Alabama" or "Kansas" and never will be, so the if() clauses never get executed. R is returning dat; it is your function that is not altering dat.
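If you want to see the problem without stepping through the debugger, here is a minimal sketch using one tag value taken from the example data. The variable names are only for illustration.

```r
# One tag value from the example data; note that it contains no space.
tag <- "AlabamaCity"

# Splitting on a space leaves the string in one piece.
words <- strsplit(tag, " ")[[1]]
print(words)

# The exact comparison therefore never matches ...
print(any(words == "Alabama"))

# ... whereas a substring match does (fixed = TRUE means a literal match, not a regex).
print(grepl("Alabama", tag, fixed = TRUE))
```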
This will do it for you, and is a bit more generic. First create a data frame that pairs each word to match with its region:
region <- data.frame(tag = c("Alabama","Kansas"), tag2 = c("South","Midwest"),
stringsAsFactors = FALSE)
Then loop over the rows of this data frame, matching the "tag"s and inserting the appropriate "tag2"s:
for(i in seq_len(nrow(region))) {
    want <- grepl(region[i, "tag"], dat[, "tag"])
    dat[want, "tag2"] <- region[i, "tag2"]
}
Which will result in this:
> dat
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest
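Since the question asked for a function, the same loop could be wrapped up roughly like this. This is only a sketch: the name add_region is made up for illustration, and fixed = TRUE is my addition so that the lookup words are matched literally rather than as regular expressions.

```r
# Hypothetical helper (the name is not from the question): adds a tag2 column
# by matching each lookup word against the tag column.
add_region <- function(dat, region) {
  for (i in seq_len(nrow(region))) {
    # Logical vector marking rows whose tag contains the i-th lookup word
    want <- grepl(region[i, "tag"], dat[, "tag"], fixed = TRUE)
    dat[want, "tag2"] <- region[i, "tag2"]
  }
  dat
}

dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)),
                  stringsAsFactors = FALSE)
region <- data.frame(tag = c("Alabama", "Kansas"), tag2 = c("South", "Midwest"),
                     stringsAsFactors = FALSE)

out <- add_region(dat, region)
print(out)
```

Rows that match none of the lookup words would be left as NA in tag2, which makes unmatched tags easy to spot.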
How does this work? The key bit is grepl(). If we do this for just one match, "Alabama", grepl() is used like this:
grepl("Alabama", dat[, "tag"])
and returns a logical vector indicating which of the "tag" elements matched the string "Alabama":
> grepl("Alabama", dat[, "tag"])
[1] TRUE TRUE TRUE FALSE FALSE FALSE
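That logical vector is then used as a row index for the assignment. A small sketch of a single pass of the loop, rebuilding dat from the question (stringsAsFactors = FALSE is my addition, to keep the tags as plain character strings):

```r
dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)),
                  stringsAsFactors = FALSE)

# TRUE for the rows whose tag contains "Alabama"
want <- grepl("Alabama", dat[, "tag"])

# Assign only to the matching rows; tag2 is created on the fly and the
# rows that did not match are left as NA until a later pass fills them.
dat[want, "tag2"] <- "South"
print(dat)
```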