Fastest way to get class vector from names in R

Developer https://www.devze.com 2023-01-30 14:32 Source: Internet
If I have the following vector in R (my levels obviously being A, B, and C)

c("A_1", "A_2", "B_1", "C_1", "C_2")

what is the most efficient way to transform it into a class vector of numbers like

c(1, 1, 2, 3, 3)

I feel like this should be a one-liner (likely a combination of factor and grep) but was unable to come up with one.

Thanks!


A simple solution would be:

x <- c("A_1", "A_2", "B_1", "C_1", "C_2")


x.out <- as.numeric(factor(substr(x, 1, 1)))

If your data is more varied, let me know and we can work to make it a more robust solution.
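
One detail worth knowing about the factor approach: factor() sorts its levels, so the numbers follow the alphabetical order of the prefixes rather than the order in which they appear. The sketch below compares it with match() against unique(), which numbers classes by first appearance. The vector y and the variable names are hypothetical, added only for illustration.

```r
x <- c("A_1", "A_2", "B_1", "C_1", "C_2")
prefix <- substr(x, 1, 1)

# factor() sorts its levels, so codes follow alphabetical order of the prefixes.
codes_sorted <- as.integer(factor(prefix))

# match() against unique() numbers the prefixes in order of first appearance.
codes_first_seen <- match(prefix, unique(prefix))

# Hypothetical unsorted input (not from the question), to show where they differ.
y <- c("C_1", "A_1", "A_2")
py <- substr(y, 1, 1)
y_sorted <- as.integer(factor(py))      # 2 1 1
y_first_seen <- match(py, unique(py))   # 1 2 2
```

For the vector in the question both variants give 1 1 2 3 3, because the input is already in alphabetical order.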


There's a (more general) regular expression approach that does not require specifying the width of the leading string:

Either delete everything from the underscore onward:

> as.numeric(factor(sub("_.+", "" , x)))
[1] 1 1 2 3 3

Or select the characters that precede the underscore (in R regexes, a part of the pattern enclosed in parentheses can be referred to in the replacement string by "\\" followed by a digit):

> as.numeric(factor(sub("(^.+)_.+$", "\\1" , x)))
[1] 1 1 2 3 3
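
The two patterns agree on the vector from the question, but they can give different classes once a label contains more than one underscore: "_.+" cuts at the first underscore, while the greedy "(^.+)" keeps everything up to the last one. A small sketch, using hypothetical labels that are not part of the original question:

```r
# Hypothetical labels with more than one underscore (illustration only).
y <- c("A_x_1", "A_y_2", "B_x_1")

# Cut at the first underscore: "A" "A" "B"
first_part <- sub("_.+", "", y)

# Keep everything before the last underscore: "A_x" "A_y" "B_x"
all_but_last <- sub("(^.+)_.+$", "\\1", y)

class_by_first <- as.integer(factor(first_part))
class_by_last <- as.integer(factor(all_but_last))
```

Which one is appropriate depends on whether the class is meant to be the first field or everything but the trailing suffix.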