I would like to cut a vector of values ranging from 0 to 70 into x categories, and I need the upper limit of each category. So far I have tried this using cut()
and am trying to extract the limits from the levels.
I have a character vector of levels, and I would like to extract the second number from each one. How can I extract the value between the comma and the ] (which is the number I'm interested in)?
I have:
> levels(bins)
[1] "(-0.07,6.94]" "(6.94,14]" "(14,21]" "(21,28]" "(28,35]"
[6] "(35,42]" "(42,49]" "(49,56]" "(56,63.1]" "(63.1,70.1]"
and would like to get:
[1] 6.94 14 21 28 35 42 49 56 63.1 70.1
Or is there a better way of calculating the upper bounds of categories?
One solution is to strip everything up to the comma and then drop the trailing bracket:
k <- sub("^.*\\,","", levels(bins))
as.numeric(substr(k,1,nchar(k)-1))
which gives:
[1] 6.94 14.00 21.00 28.00 35.00 42.00 49.00 56.00 63.10 70.10
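The same extraction can probably be done in a single step with a capture group. This is only a sketch: the vector lv is typed in by hand from the levels(bins) output in the question, and the pattern assumes every label ends in "]" (the default right = TRUE behaviour of cut()).

```r
# Labels copied by hand from the levels(bins) output in the question.
lv <- c("(-0.07,6.94]", "(6.94,14]", "(14,21]", "(21,28]", "(28,35]",
        "(35,42]", "(42,49]", "(49,56]", "(56,63.1]", "(63.1,70.1]")

# Keep only what sits between the last comma and the closing bracket,
# then convert the remaining text to numbers.
upper <- as.numeric(sub("^.*,(.*)\\]$", "\\1", lv))
upper
```

Keep in mind that these are the rounded label values, not the exact breaks that cut() used internally.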
If you want the exact break values, you should compute them yourself, because cut()
rounds the interval limits when it builds the labels:
x <- seq(0,1,by=.023)
levels(cut(x, 4))
# [1] "(-0.000989,0.247]" "(0.247,0.494]" "(0.494,0.742]" "(0.742,0.99]"
levels(cut(x, 4, dig.lab=10))
# [1] "(-0.000989,0.2467555]" "(0.2467555,0.4945]" "(0.4945,0.7422445]"
# [4] "(0.7422445,0.989989]"
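One way to avoid parsing labels at all, assuming equal-width bins over the data range are acceptable, is to choose the breaks yourself and pass them to cut(). The names brks, upper and upper_per_obs below are only illustrative.

```r
x <- seq(0, 1, by = 0.023)

# Assumed binning: four equal-width intervals from min(x) to max(x).
brks <- seq(min(x), max(x), length.out = 5)

# include.lowest = TRUE keeps min(x) inside the first interval.
bins <- cut(x, breaks = brks, include.lowest = TRUE)

# Exact upper bound of each bin: every break except the first one.
upper <- brks[-1]

# Upper bound of the bin that each observation falls into.
upper_per_obs <- upper[as.integer(bins)]
```

Because the breaks are kept in a numeric vector, nothing is lost to label rounding.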
You can look at the source of cut.default to see how the breaks
are computed:
if (length(breaks) == 1L) {
    if (is.na(breaks) | breaks < 2L)
        stop("invalid number of intervals")
    nb <- as.integer(breaks + 1)
    dx <- diff(rx <- range(x, na.rm = TRUE))
    if (dx == 0)
        dx <- abs(rx[1L])
    breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000,
        length.out = nb)
}
So an easy way is to take this code and wrap it in a function:
compute_breaks <- function(x, breaks) {
    # A single number means "number of intervals", as in cut()
    if (length(breaks) == 1L) {
        if (is.na(breaks) | breaks < 2L)
            stop("invalid number of intervals")
        nb <- as.integer(breaks + 1)
        dx <- diff(rx <- range(x, na.rm = TRUE))
        if (dx == 0)
            dx <- abs(rx[1L])
        breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000,
            length.out = nb)
    }
    # A vector of breaks is returned unchanged
    breaks
}
The result is:
compute_breaks(x,4)
# [1] -0.000989 0.246755 0.494500 0.742244 0.989989
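To get from these breaks back to what the question asked for, the upper bounds are every break except the first. The sketch below restates the helper in compact form (taking the number of intervals as n) so that it runs on its own. It assumes the formula quoted above, and the source of cut.default may differ between R versions, so the values only match cut(x, 4) if your installation computes breaks the same way.

```r
# Compact restatement of the helper, assuming n is a single number >= 2.
compute_breaks <- function(x, n) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)
  if (dx == 0) dx <- abs(rx[1L])
  # Extend the range by 0.1% on both sides, as in the quoted source.
  seq.int(rx[1L] - dx / 1000, rx[2L] + dx / 1000, length.out = n + 1L)
}

x <- seq(0, 1, by = 0.023)
brks <- compute_breaks(x, 4)

# Exact upper bound of each of the four categories.
upper <- brks[-1]

# Passing the same breaks to cut() keeps bins and bounds consistent.
bins <- cut(x, breaks = brks)
```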