I would like to use stat_density() and facet_wrap() in the ggplot2 package to create kernel density plots for different groupings, but I want to make sure that I use the same bandwidth for every plot. Can I be sure that stat_density() uses the same bandwidth for every plot?
For example, using diamonds:
library(ggplot2)
ggplot(diamonds, aes(x = carat)) +
stat_density() +
facet_wrap(~ cut) +
scale_x_log()
The documentation shows that I can use adjust to scale the automatic bandwidth, but this just applies a multiplier and returns me to the original question. stat_density() also has a ... option, but I haven't been able to pass through the density() option bw, like this:
ggplot(diamonds, aes(x = carat)) +
stat_density(bw = 1) +
facet_wrap(~ cut) +
scale_x_log()
So, if stat_density() isn't using the same bandwidth across all facets, is there a way that I can force this? I tried a ddply() solution with transform() and density(), but this fails because density() doesn't necessarily return the same number of x and y values as the input. Any ideas? Thanks!
Edit
It looks like ggplot2 assigns an optimal bandwidth to each facet (@Ramnath and DiNardo, Fortin, and Lemieux, Econometrica 1996, seem to agree with this), not the constant bandwidth I was seeking. But if I did want a constant bandwidth across all facets, my attempt below fails.
library(plyr)  # provides ddply() and .()
my.density <- function(x) {
  # fixed bandwidth of 0.5 for every group
  temp <- density(x$carat, bw = 0.5)
  return(data.frame(carat = temp$x, density = temp$y))
}
temp <- ddply(diamonds, .(cut), my.density)
ggplot(temp, aes(x = carat, y = density)) +
geom_point() +
facet_wrap(~ cut) +
scale_x_log()
Warning messages:
1: In match.fun(get(".transform", .))(values) : NaNs produced
2: In match.fun(get(".transform", .))(values) : NaNs produced
3: In match.fun(get(".transform", .))(values) : NaNs produced
4: In match.fun(get(".transform", .))(values) : NaNs produced
5: In match.fun(get(".transform", .))(values) : NaNs produced
6: Removed 84 rows containing missing values (geom_point).
7: Removed 113 rows containing missing values (geom_point).
8: Removed 98 rows containing missing values (geom_point).
9: Removed 98 rows containing missing values (geom_point).
10: Removed 106 rows containing missing values (geom_point).
The warnings are on account of the negative values for carat in the output of my.density: the log scale cannot transform them, so they become NaN and are dropped. A slight modification of your code would do the trick:
ggplot(temp, aes(x = carat, y = density)) +
geom_line(subset = .(carat > 0)) +
facet_wrap(~ cut) + scale_x_log()
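The negative carat values come from density() itself: by default it evaluates the estimate on a grid that extends cut * bw (with cut = 3) beyond the data, which reaches below zero for bw = 0.5. As a rough sketch of an alternative to subsetting afterwards, the grid can be clamped with the from and to arguments of density(). The names bw, d_default, my.density.pos and temp_pos are made up for this illustration, and base R split()/lapply() stands in for ddply() so that the snippet only needs ggplot2 for the data.

```r
library(ggplot2)  # provides the diamonds data set

bw <- 0.5  # the constant bandwidth from the question

# By default density() evaluates on a grid that extends cut * bw (cut = 3)
# beyond the data, so the grid starts below zero for this bandwidth.
d_default <- density(diamonds$carat, bw = bw)
range(d_default$x)

# Hypothetical variant of my.density(): clamp the grid to the observed range
# of each group so every carat value stays positive and can be log-transformed.
my.density.pos <- function(x) {
  temp <- density(x$carat, bw = bw, from = min(x$carat), to = max(x$carat))
  data.frame(cut = x$cut[1], carat = temp$x, density = temp$y)
}

# One density per cut, all computed with the same bandwidth.
temp_pos <- do.call(rbind, lapply(split(diamonds, diamonds$cut), my.density.pos))
```

temp_pos can then be plotted with geom_line() and facet_wrap(~ cut) as above, without the subset. Note that clamping only truncates the curve at the data range; it does not renormalize the estimate.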
Hope this is useful
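For readers on a more recent ggplot2 release (an assumption about the installed version; the question predates it), stat_density() has its own bw argument, so a constant bandwidth can be requested directly. The sketch below uses scale_x_log10() rather than the scale_x_log() from the question, and the value 0.1 is an arbitrary choice for illustration.

```r
library(ggplot2)

# A numeric bw is used as-is for every facet instead of being
# re-estimated from each group's data.
p <- ggplot(diamonds, aes(x = carat)) +
  stat_density(bw = 0.1) +
  facet_wrap(~ cut) +
  scale_x_log10()

# Inspect the computed densities; there is one set of rows per panel.
d <- layer_data(p)
head(d)
```

Because scale transformations are applied before the statistic is computed, the bandwidth here is expressed in log10(carat) units, not in carats.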