Kernel density plot bandwidth in ggplot2 with `facet_wrap`

Developer (https://www.devze.com), 2023-03-01 05:11. Source: Internet
I would like to use stat_density() and facet_wrap() in the ggplot2 package to create kernel density plots for different groupings, but I want to make sure that I use the same bandwidth for every plot. Can I be sure that stat_density() uses the same bandwidth for every plot?

For example, using diamonds:

library(ggplot2)
ggplot(diamonds, aes(x = carat)) +
  stat_density() +
  facet_wrap(~ cut) +
  scale_x_log10()

The documentation shows that I can use adjust to scale the automatic bandwidth, but that only applies a multiplier and brings me back to the original question. stat_density() also accepts ..., but I haven't been able to pass through the density() argument bw, like this:

ggplot(diamonds, aes(x = carat)) +
  stat_density(bw = 1) +
  facet_wrap(~ cut) +
  scale_x_log10()

So, if stat_density() isn't using the same bandwidth across all facets, is there a way that I can force this? I tried a ddply() solution with transform() and density(), but this fails because density() doesn't necessarily return the same number of x and y values as the input. Any ideas? Thanks!
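
A quick way to check whether the default bandwidths differ by group is to compute them directly. This is a minimal sketch, assuming stat_density() falls back on density()'s default rule (bw.nrd0) within each facet. It ignores the log scale for simplicity; the name bw_by_cut is only illustrative.

```r
library(ggplot2)  # only needed here for the diamonds data set

# Default density() bandwidth (rule "nrd0") computed separately for each cut.
# Assumption: this mirrors what happens per facet when no bw is supplied.
bw_by_cut <- tapply(diamonds$carat, diamonds$cut, bw.nrd0)
print(bw_by_cut)

# TRUE if at least two cuts end up with different bandwidths
length(unique(bw_by_cut)) > 1
```

If the values differ, each facet is being smoothed with its own bandwidth.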

Edit: It looks like ggplot2 assigns an optimal bandwidth to each facet (it looks like @Ramnath and DiNardo, Fortin, and Lemieux, Econometrica 1996, agree with this), not the constant bandwidth I was seeking. But if I did want a constant bandwidth across all facets, my attempt below fails.

library(plyr)  # for ddply() and .()

# Estimate the density of carat within one cut, using a fixed bandwidth
my.density <- function(x) {
    temp <- density(x$carat, bw = 0.5)
    return(data.frame(carat = temp$x, density = temp$y))
}
temp <- ddply(diamonds, .(cut), my.density)
ggplot(temp, aes(x = carat, y = density)) +
    geom_point() +
    facet_wrap(~ cut) +
    scale_x_log10()
Warning messages:
1: In match.fun(get(".transform", .))(values) : NaNs produced
2: In match.fun(get(".transform", .))(values) : NaNs produced
3: In match.fun(get(".transform", .))(values) : NaNs produced
4: In match.fun(get(".transform", .))(values) : NaNs produced
5: In match.fun(get(".transform", .))(values) : NaNs produced
6: Removed 84 rows containing missing values (geom_point). 
7: Removed 113 rows containing missing values (geom_point). 
8: Removed 98 rows containing missing values (geom_point). 
9: Removed 98 rows containing missing values (geom_point). 
10: Removed 106 rows containing missing values (geom_point). 


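The warnings come from the negative carat values returned by my.density: density() evaluates the estimate on a grid that extends three bandwidths beyond the data (cut = 3 by default), so with bw = 0.5 the grid reaches below zero, where the log is undefined. A slight modification of your code would do the trick: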

  ggplot(temp, aes(x = carat, y = density)) +
    geom_line(data = subset(temp, carat > 0)) +
    facet_wrap(~ cut) +
    scale_x_log10()
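
To see where the negative values come from, it may help to compare the range of the data with the range of the grid that density() returns. The sketch below uses the diamonds data; the names d_default and d_clipped are only illustrative. It also shows one possible alternative to subsetting afterwards, which is to restrict the grid with the from and to arguments of density().

```r
library(ggplot2)  # only needed here for the diamonds data set

bw <- 0.5
x <- diamonds$carat

# Default grid: runs from min(x) - 3 * bw to max(x) + 3 * bw (cut = 3)
d_default <- density(x, bw = bw)
range(x)
range(d_default$x)   # lower end is below zero for this bandwidth

# Restricted grid: evaluate the estimate only over the observed range
d_clipped <- density(x, bw = bw, from = min(x), to = max(x))
range(d_clipped$x)   # stays within the data, so log() is defined
```

Note that restricting the grid only changes where the estimate is evaluated, not the estimate itself.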

Hope this is useful
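
A follow-up for readers on a recent ggplot2 release: stat_density() there documents a bw argument that is handed to density(), so a fixed bandwidth can be requested directly. This is a minimal sketch, assuming such a release is installed. The value 0.1 is an arbitrary illustration, and because scale transformations are applied before the statistic is computed, it is measured in log10(carat) units.

```r
library(ggplot2)

# Same fixed bandwidth in every facet.
# Assumption: a ggplot2 release whose stat_density() has a bw argument.
p <- ggplot(diamonds, aes(x = carat)) +
  stat_density(bw = 0.1) +   # 0.1 is an arbitrary value, in log10 units
  facet_wrap(~ cut) +
  scale_x_log10()
print(p)
```

With this approach there is no need to precompute the densities with ddply(), and no negative carat values arise because the density is estimated on the transformed scale.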
