I have a large dataframe with classification information. Here is an example:
> d <- data.frame(x = c(1,2,3,4), classification = c("cl1.scl1", "cl2", "cl3-bla", "cl4.subclass2"))
> d
  x classification
1 1       cl1.scl1
2 2            cl2
3 3        cl3-bla
4 4  cl4.subclass2
Before I do any further processing I need to aggregate the classification information, which means that I have to split the classification strings by "." and take the first token. This is the result I need:
> d
  x classification
1 1            cl1
2 2            cl2
3 3        cl3-bla
4 4            cl4
At the moment I am computing this as follows:
d$classification = unlist(lapply(d$classification, function (x) strsplit(as.character(x), ".", fixed=TRUE)[[1]][1]))
This works, but it took me quite a while to figure this out. I assume there is a more elegant solution, which I probably missed. Any suggestions? Thanks!
A slightly shorter solution is:
sapply(strsplit(as.character(d$class), "\\."), `[`, 1)
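For what it's worth, here is a minimal sketch of the same split-and-take-first idea using vapply instead of sapply, so the result is guaranteed to be a character vector. The names tokens and first_token are only illustrative, and fixed = TRUE is used so the period does not need escaping.

```r
# Rebuild the example data frame from the question
d <- data.frame(x = c(1, 2, 3, 4),
                classification = c("cl1.scl1", "cl2", "cl3-bla", "cl4.subclass2"))

# Split each string on a literal period (fixed = TRUE disables regex handling)
tokens <- strsplit(as.character(d$classification), ".", fixed = TRUE)

# Take the first piece of each split; vapply enforces one string per element
first_token <- vapply(tokens, function(parts) parts[1], character(1))

print(first_token)
# [1] "cl1"     "cl2"     "cl3-bla" "cl4"
```

Assigning first_token back to d$classification would give the data frame shown in the question.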
You can use regular expressions with back-references.
gsub("(.*)\\.(.*)","\\1",d$classification)
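There are two capture groups (the portions of the regular expression in parentheses), separated by a literal period. We replace whatever matches that pattern with the contents of the first group. Note that the first group is greedy, so if a string contains more than one period, everything up to the last period is kept rather than only the first token.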
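To see how the back-reference approach behaves when a string has more than one period, here is a small sketch. The extra value "a.b.c" is made up for illustration and is not in the original data, and the anchored pattern in the second call is just one possible variant that keeps only the first token.

```r
# Example values from the question plus one hypothetical multi-period string
x <- c("cl1.scl1", "cl2", "cl3-bla", "a.b.c")

# Greedy first group: keeps everything before the LAST period
greedy <- gsub("(.*)\\.(.*)", "\\1", x)
print(greedy)
# [1] "cl1"     "cl2"     "cl3-bla" "a.b"

# First group excludes periods: keeps only the text before the FIRST period
first_only <- sub("^([^.]*)\\..*$", "\\1", x)
print(first_only)
# [1] "cl1"     "cl2"     "cl3-bla" "a"
```

Strings without a period do not match either pattern and are returned unchanged.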
Just delete the first "." and everything that follows it:
> sub("\\..+$", "", d$class)
[1] "cl1" "cl2" "cl3-bla" "cl4"
d$classification <- sub("\\..+$", "", d$classification)
# I've never been very comfortable with partial name matching.
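One edge case that may be worth checking: the pattern "\\..+$" needs at least one character after the period, so a string that ends in a period is left untouched. The sketch below compares it with a ".*" variant; the inputs "cl1." and "a.b.c" are hypothetical and not part of the original data.

```r
# Hypothetical inputs: a trailing period and a multi-period string
x <- c("cl1.scl1", "cl1.", "a.b.c")

# ".+" requires at least one character after the period
plus_version <- sub("\\..+$", "", x)
print(plus_version)
# [1] "cl1"  "cl1." "a"

# ".*" also matches when nothing follows the period
star_version <- sub("\\..*$", "", x)
print(star_version)
# [1] "cl1" "cl1" "a"
```

Whether the trailing period should be dropped depends on the data, so either pattern can be the right choice.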