If I have the following vector in R (my levels obviously being A, B, and C)
c("A_1", "A_2", "B_1", "C_1", "C_2")
what is the most efficient way to transform it into a numeric class vector like
c(1, 1, 2, 3, 3)
I feel like this should be a one-liner (likely a combination of factor and grep) but was unable to come up with one.
Thanks!
A simple solution would be:
x <- c("A_1", "A_2", "B_1", "C_1", "C_2")
x.out <- as.numeric(factor(substr(x, 1, 1)))
If your data is more varied, let me know and we can work to make it a more robust solution.
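To illustrate what "more varied" could mean here, the sketch below uses a made-up vector `x2` (not from the question) whose prefixes are longer than one character. It is only an assumption about how such data might look; the names `x2`, `by_width`, and `by_split` are hypothetical.

```r
# Hypothetical data with multi-character prefixes (not from the question)
x2 <- c("AA_1", "AB_1", "B_1")

# Fixed-width extraction keeps only the first character,
# so "AA" and "AB" end up in the same class
by_width <- as.numeric(factor(substr(x2, 1, 1)))

# Splitting on the literal underscore keeps the whole prefix instead
prefix <- sapply(strsplit(x2, "_", fixed = TRUE), `[`, 1)
by_split <- as.numeric(factor(prefix))
```

With this input, `by_width` merges the first two elements into one class, while `by_split` keeps them apart.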
There's a (more general) regular expression approach that does not require specifying the width of the leading string:
Either delete the underscore and everything after it:
> as.numeric(factor(sub("_.+", "" , x)))
[1] 1 1 2 3 3
Or select the characters that precede the underscore (in R regular expressions, a part of the pattern enclosed in parentheses can be referred to in the replacement string by "\\" followed by a digit):
> as.numeric(factor(sub("(^.+)_.+$", "\\1" , x)))
[1] 1 1 2 3 3
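One thing worth keeping in mind with any of the `factor()` variants: the codes follow the sorted order of the levels, not the order in which the prefixes first appear. The sketch below shows the difference on a made-up vector `y` (an assumption, not data from the question) and uses `match()` with `unique()` as one possible way to number classes by first appearance instead.

```r
# Hypothetical input where the first prefix is not the alphabetically first one
y <- c("C_1", "A_1", "C_2")

# Strip the underscore and everything after it
prefix_y <- sub("_.*$", "", y)

# factor() sorts its levels, so "A" gets code 1 and "C" gets code 2
by_sorted <- as.integer(factor(prefix_y))

# match() against unique() numbers classes in order of first appearance
by_appearance <- match(prefix_y, unique(prefix_y))
```

For the vector in the question both variants give the same result, since its prefixes already appear in sorted order.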