How to rank values in a vector and give them corresponding values?

Developer https://www.devze.com 2023-01-29 23:43 Source: Internet
Right now I'm doing it by looping through a sorted vector, but maybe there is a faster way using built-in R functions, and maybe I don't even need to sort.

vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
vect = sort(vect)
print(vect)
# outvect[i] will hold the rank of vect[i]; equal values share a rank
outvect = mat.or.vec(length(vect),1)
outvect[1] = counter = 1
for(i in 2:length(vect)) {
    # bump the rank only when the value changes
    if (vect[i] != vect[i-1]) { counter = counter + 1 }
    outvect[i] = counter
}

print(cbind(vect,outvect))

      vect outvect
 [1,]    1       1
 [2,]    1       1
 [3,]    2       2
 [4,]    2       2
 [5,]    3       3
 [6,]    3       3
 [7,]    4       4
 [8,]    4       4
 [9,]    5       5
[10,]    6       6
[11,]   10       7
[12,]   12       8
[13,]   13       9
[14,]   15      10
[15,]   33      11
[16,]   41      12
[17,]   42      13

The code is used to make charts with integers on the X axis instead of the real data, because for me the distance between the X values is not important. So in my case the smallest X value is always 1, and the largest is always equal to the number of distinct X values.

-- edit: due to some misunderstanding about my question, I added self-sufficient code with output.


That's clearer. Hence:

> vect = c(41,42,5,6,3,12,10,15,2,3,4,13,2,33,4,1,1)
> cbind(vect,as.numeric(factor(vect)))
 [1,]   41 12
 [2,]   42 13
 [3,]    5  5
 [4,]    6  6
 [5,]    3  3
 [6,]   12  8
 [7,]   10  7
 [8,]   15 10
 [9,]    2  2
[10,]    3  3
[11,]    4  4
[12,]   13  9
[13,]    2  2
[14,]   33 11
[15,]    4  4
[16,]    1  1
[17,]    1  1

No sort needed. And as said, see also ?factor

And if you want to number the values in order of first appearance instead, then:

> cbind(vect,as.numeric(factor(vect,levels=unique(vect))))
      vect   
 [1,]   41  1
 [2,]   42  2
 [3,]    5  3
 [4,]    6  4
 [5,]    3  5
 [6,]   12  6
 [7,]   10  7
 [8,]   15  8
 [9,]    2  9
[10,]    3  5
[11,]    4 10
[12,]   13 11
[13,]    2  9
[14,]   33 12
[15,]    4 10
[16,]    1 13
[17,]    1 13


Joris's solution is right on, but if you have long vectors, it is about 3x faster to use match and unique:

> x=sample(1e5, 1e6, replace=TRUE)
> # preserve order:
> system.time( a<-cbind(x, match(x, unique(x))) )
   user  system elapsed 
   0.20    0.00    0.22 
> system.time( b<-cbind(x, as.numeric(factor(x,levels=unique(x)))) )
   user  system elapsed 
   0.70    0.00    0.72 
> all.equal(a,b)
[1] TRUE
> 
> # sorted solution:
> system.time( a<-cbind(x, match(x, sort(unique(x)))) )
   user  system elapsed 
   0.25    0.00    0.25 
> system.time( b<-cbind(x, as.numeric(factor(x))) )
   user  system elapsed 
   0.72    0.00    0.72 
> all.equal(a,b)
[1] TRUE
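
As a small sanity check, here is a sketch that applies the match/unique approach to the vector from the question and compares it with the factor-based version. The names by_value and by_appearance are only illustrative and do not appear in the original posts.

```r
vect <- c(41, 42, 5, 6, 3, 12, 10, 15, 2, 3, 4, 13, 2, 33, 4, 1, 1)

# Rank by value: position of each element among the sorted distinct values.
by_value <- match(vect, sort(unique(vect)))

# Rank by first appearance: position among the distinct values in input order.
by_appearance <- match(vect, unique(vect))

# Both should agree with the factor-based versions shown earlier.
stopifnot(identical(by_value, as.integer(factor(vect))))
stopifnot(identical(by_appearance,
                    as.integer(factor(vect, levels = unique(vect)))))

print(cbind(vect, by_value, by_appearance))
```

In both variants the largest rank equals the number of distinct values (13 here), which is what the question asks for.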


You can try this. (Note that you may want a different behaviour for repeated values: this gives each element its own rank, even when values are tied.)

> x <- sample(size=10, replace=T, x=1:100)
> x1 <- vector(length=length(x))
> x1[order(x)] <- 1:length(x)
> cbind(x, x1)
       x x1
 [1,] 40  1
 [2,] 46  4
 [3,] 43  3
 [4,] 41  2
 [5,] 47  5
 [6,] 84 10
 [7,] 75  8
 [8,] 60  7
 [9,] 59  6
[10,] 80  9
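
To make the note about repeated values concrete, here is a minimal sketch. It assumes the order() trick above is equivalent to base R's rank() with ties.method = "first"; the helper rank_by_order and the vector y are made up for illustration.

```r
# x holds the values from the output above; y is a small vector with ties.
x <- c(40, 46, 43, 41, 47, 84, 75, 60, 59, 80)
y <- c(3, 1, 3, 2)

# The order() trick: the i-th smallest element receives rank i.
rank_by_order <- function(v) {
  r <- integer(length(v))
  r[order(v)] <- seq_along(v)
  r
}

rank_by_order(x)                 # 1 4 3 2 5 10 8 7 6 9
rank(x, ties.method = "first")   # same ranks as the line above

# With ties, every element still gets a distinct rank ...
rank_by_order(y)                 # 3 1 4 2
# ... whereas match/unique gives tied values the same rank.
match(y, sort(unique(y)))        # 3 1 3 2
```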


It looks like you are counting runs in the data. If that is the case, look at the rle function.
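
A possible way to use rle here, as a sketch only: on the sorted vector each run is one distinct value, so numbering the runs reproduces the outvect column from the question. The names sorted, runs and run_id are illustrative.

```r
vect <- c(41, 42, 5, 6, 3, 12, 10, 15, 2, 3, 4, 13, 2, 33, 4, 1, 1)
sorted <- sort(vect)

# rle() returns the run lengths and the value of each run.
runs <- rle(sorted)

# Repeat each run's index as many times as the run is long.
run_id <- rep(seq_along(runs$lengths), runs$lengths)

print(cbind(sorted, run_id))
```

Unlike the factor and match approaches, this one does need the sort, because rle only groups adjacent equal values.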


You apparently want the results of something like table(), but lined up next to the values. Try using the ave() function:

csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN=length)

The trick here is that the syntax of ave is a bit different from that of tapply: you can pass an arbitrarily long set of factor arguments, so you need to write FUN= in front of the function. Arguments that come after the triple dots (...) are not matched by position; they need to be named.
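
A small self-contained sketch of that call follows. The data frame csvdata and its column X are not defined anywhere in the thread, so the values below are made up purely for illustration.

```r
# Hypothetical data; only the column name X comes from the answer above.
csvdata <- data.frame(X = c(2, 3, 2, 1, 3, 2))

# For each row, count how often that row's X value occurs in the column.
csvdata$counts <- ave(csvdata[, "X"], factor(csvdata[["X"]]), FUN = length)

# The same counts via table(), looked up by value.
via_table <- as.vector(table(csvdata$X)[as.character(csvdata$X)])

print(csvdata)
```

Note that this yields per-value counts rather than ranks, so it answers a slightly different question from the factor and match approaches.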
