I have a dataframe with a column of integers that I would like to use as a reference to make a new categorical variable. I want to divide the variable into three groups and set the ranges myself (i.e. 0-5, 6-10, etc.). I tried cut, but that divides the variable into groups based on a normal distribution and my data is right-skewed. I have also tried if/then statements, but these output a true/false value and I would like to keep my original variable. I am sure there is a simple way to do this, but I cannot seem to figure it out. Any advice on a simple way to do this quickly?
I had something in mind like this:
x     x.range
3     0-5
4     0-5
6     6-10
12    11-15
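# mock data: 100 draws from a normal distribution with mean 10 and sd 10
x <- rnorm(100,10,10)
# explicit break points; intervals are right-closed by default, e.g. (0,5], (5,6], (6,10]
cut(x,c(-Inf,0,5,6,10,Inf))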
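For the integer example in the question, a minimal sketch along the same lines might look like the following. When breaks is a single number, cut splits the range into equal-width intervals, but when it is a vector, cut uses exactly those cut points. The data frame name df and the label strings are assumptions made for illustration; the column names x and x.range follow the question.

```r
# Small data frame matching the example in the question (df is a made-up name)
df <- data.frame(x = c(3, 4, 6, 12))

# Bins are [0,5], (5,10], (10,15]; include.lowest = TRUE closes the first bin at 0
df$x.range <- cut(df$x,
                  breaks = c(0, 5, 10, 15),
                  labels = c("0-5", "6-10", "11-15"),
                  include.lowest = TRUE)

# The original column x is kept; x.range is a new factor column
df
```

Because the result is assigned to a new column, the original variable stays untouched, and values outside 0 to 15 would come back as NA rather than being silently binned.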
Ian's answer (cut) is the most common way to do this, as far as I know.
I prefer to use shingle, from the lattice package;
the argument that specifies the binning intervals seems a little more intuitive to me.
You use shingle like so:
library(lattice)  # provides shingle()
# mock some data
data = sample(0:40, 200, replace=TRUE)
# one (min, max) pair per bin
a = c(0, 5); b = c(5, 9); c = c(9, 19); d = c(19, 33); e = c(33, 41)
my_bins = matrix(rbind(a, b, c, d, e), ncol=2)
# returns: (the binning intervals i've set)
[,1] [,2]
[1,] 0 5
[2,] 5 9
[3,] 9 19
[4,] 19 33
[5,] 33 41
shx = shingle(data, intervals=my_bins)
#'shx' at the interactive prompt will give you a nice frequency table:
# Intervals:
min max count
1 0 5 23
2 5 9 17
3 9 19 56
4 19 33 76
5 33 41 46
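Note that the counts above add up to more than the 200 sampled values. As far as I can tell, this is because shingle intervals are closed at both ends, so a value that sits on a shared boundary (5, 9, 19 or 33 here) is counted in two intervals. The sketch below illustrates this in base R without calling shingle itself; the names lower, upper, vals, hits and bins are made up for the example.

```r
# Bin edges copied from my_bins above
lower <- c(0, 5, 9, 19, 33)
upper <- c(5, 9, 19, 33, 41)
vals <- c(3, 5, 9, 12)

# Number of closed intervals [lower, upper] that contain each value
hits <- sapply(vals, function(v) sum(v >= lower & v <= upper))
hits
# [1] 1 2 2 1

# cut() with the same edges puts every value in exactly one bin:
# 3 and 5 fall in [0,5], 9 in (5,9], 12 in (9,19]
bins <- cut(vals, breaks = c(0, 5, 9, 19, 33, 41), include.lowest = TRUE)
```

If you need each row to land in exactly one category, cut is probably the safer choice; shingle seems better suited to overlapping conditioning intervals in lattice plots.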
We can use smart_cut from the package cutr:
devtools::install_github("moodymudskipper/cutr")
library(cutr)
x <- c(3,4,6,12)
To cut with intervals of length 5 starting at 1:
smart_cut(x,list(5,1),"width" , simplify=FALSE)
# [1] [1,6) [1,6) [6,11) [11,16]
# Levels: [1,6) < [6,11) < [11,16]
To get exactly your requested output:
smart_cut(x,c(0,6,11,16), labels = ~paste0(.y[1],'-',.y[2]-1), simplify=FALSE, open_end = TRUE)
# [1] 0-5 0-5 6-10 11-15
# Levels: 0-5 < 6-10 < 11-15
more on cutr and smart_cut