I have a data frame with 3 variables, which are all wind speeds. I want to check how well the hardware was calibrated by plotting all the variables against each other. Although there are three in this instance, it may be that there are up to 6.
This would result in 3 different graphs, where the x and y parameters keep changing. I'd really like to plot these using facets, or something with the same appearance.
Here is some sample data, in a data frame called wind:
wind <- structure(list(speed_60e = c(3.029, 3.158, 2.881, 2.305, 2.45,
2.358, 2.325, 2.723, 2.567, 1.972, 2.044, 1.745, 2.1, 2.08, 1.914,
2.44, 2.356, 1.564, 1.942, 1.413, 1.756, 1.513, 1.263, 1.301,
1.403, 1.496, 1.828, 1.8, 1.841, 2.014), speed_60w = c(2.981,
3.089, 2.848, 2.265, 2.406, 2.304, 2.286, 2.686, 2.511, 1.946,
2.004, 1.724, 2.079, 2.058, 1.877, 2.434, 2.375, 1.562, 1.963,
1.436, 1.743, 1.541, 1.256, 1.312, 1.402, 1.522, 1.867, 1.837,
1.873, 2.055), speed_40 = c(2.726, 2.724, 2.429, 2.028, 1.799,
1.863, 1.987, 2.445, 2.282, 1.938, 1.721, 1.466, 1.841, 1.919,
1.63, 2.373, 2.22, 1.576, 1.693, 1.185, 1.274, 1.421, 1.071,
1.163, 1.166, 1.504, 1.77, 1.778, 1.632, 1.545)), .Names = c("speed_60e",
"speed_60w", "speed_40"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30"))
R> head(wind)
speed_60e speed_60w speed_40
1 3.029 2.981 2.726
2 3.158 3.089 2.724
3 2.881 2.848 2.429
4 2.305 2.265 2.028
5 2.450 2.406 1.799
6 2.358 2.304 1.863
I wish to plot three square graphs. An individual one can be plotted by calling
ggplot() + geom_point(data=wind, aes(wind[,1],wind[,3]), alpha=I(1/30),
shape=I(20), size=I(1))
Any idea how I can do this?
Will something like this do?
plotmatrix(data = wind) + geom_smooth(method="lm")
Which gives:
Hadley calls this a "Crude experimental scatterplot matrix", but it might suffice for your needs?
Edit: Currently, plotmatrix() isn't quite flexible enough to handle all of @Chris' requirements regarding specification of the geom_point() layer. However, we can cut the guts out of plotmatrix() and use Hadley's nice code to create the data structure needed for plotting, but plot it however we like using standard ggplot() calls. This function also drops the densities, but you can look into the code for plotmatrix() to see how to get them.
First, a function that expands the data from the wide format to the repeated format required for a pairs plot, where we plot each variable against every other variable, but not against itself.
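Expand <- function(data) {
  # All ordered pairs of column indices, excluding a column paired with itself
  grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
  grid <- subset(grid, x != y)
  # Stack one block of rows per pair, labelled with the two variable names
  all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
    xcol <- grid[i, "x"]
    ycol <- grid[i, "y"]
    data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
               x = data[, xcol], y = data[, ycol], data)
  }))
  # Fix the factor levels so the facets follow the original column order
  all$xvar <- factor(all$xvar, levels = names(data))
  all$yvar <- factor(all$yvar, levels = names(data))
  all
}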
Note: all this does is steal Hadley's code from plotmatrix(); I have done nothing fancy here.
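To get a feel for how much the expansion grows with the number of sensors, the pairing step can be run on its own. This is only a small sketch on a made-up three-column data frame (toy and its column names are illustrative, not part of the original data). It counts the ordered pairs that expand.grid() produces and, for comparison, the unordered pairs from combn().

```r
# Hypothetical toy data standing in for the wind data frame
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Ordered pairs of column indices, dropping the diagonal (as in Expand())
grid <- expand.grid(x = seq_along(toy), y = seq_along(toy))
grid <- subset(grid, x != y)
n_ordered <- nrow(grid)            # p * (p - 1) off-diagonal panels

# Unordered pairs, if each comparison should appear only once
pairs <- t(combn(names(toy), 2))
n_unordered <- nrow(pairs)         # choose(p, 2) distinct comparisons

# Rows in the expanded data: one copy of the data per ordered pair
n_rows <- n_ordered * nrow(toy)
```

With three columns this gives 6 off-diagonal panels (3 distinct comparisons); with six columns it would be 30 panels (15 distinct comparisons).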
Expand the data:
wind2 <- Expand(wind)
Now we can plot this as any other long-format data object required by ggplot():
ggplot(wind2, aes(x = x, y = y)) +
geom_point(alpha = I(1/10), shape = I(20), size = I(1)) +
facet_grid(xvar ~ yvar, scales = "free")
If you want the densities, then we can pull out that bit of code too, into a helper function:
makeDensities <- function(data) {
  densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
    data.frame(xvar = names(data)[i], yvar = names(data)[i],
               x = data[, i])
  }))
  densities
}
Then compute the densities for the original data:
dens <- makeDensities(wind)
and then add them using the same bit of code from plotmatrix():
ggplot(wind2, aes(x = x, y = y)) +
geom_point(alpha = I(1/10), shape = I(20), size = I(1)) +
facet_grid(xvar ~ yvar, scales = "free")+
stat_density(aes(x = x, y = ..scaled.. * diff(range(x)) + min(x)),
data = dens, position = "identity", colour = "grey20",
geom = "line")
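The y aesthetic in that stat_density() call rescales the density so that it fills the diagonal panel, whose axes are in data units. A rough base R sketch of the same arithmetic follows. It assumes that density() restricted to the data range is a fair stand-in for what stat_density() evaluates, and it uses only the first six speed_40 values from the sample data.

```r
# First six speed_40 values from the sample data
x <- c(2.726, 2.724, 2.429, 2.028, 1.799, 1.863)

# Kernel density evaluated only over the range of the data (assumption:
# this mimics the grid that stat_density() evaluates on by default)
d <- density(x, from = min(x), to = max(x))

# Equivalent of ..scaled..: density divided by its maximum, so it peaks at 1
scaled <- d$y / max(d$y)

# Stretch to the span of the data and shift so the baseline sits at min(x)
rescaled <- scaled * diff(range(d$x)) + min(d$x)

range(rescaled)  # lies within the range of x, peaking at max(x)
```

The rescaled curve therefore shares the y axis of the scatterplot panels in the same row instead of needing a separate density scale.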
A complete version of the original figure I showed above but using the extracted code would be:
ggplot(wind2, aes(x = x, y = y)) +
geom_point(alpha = I(1/10), shape = I(20), size = I(1)) +
facet_grid(xvar ~ yvar, scales = "free")+
stat_density(aes(x = x, y = ..scaled.. * diff(range(x)) + min(x)),
data = dens, position = "identity", colour = "grey20",
geom = "line") +
geom_smooth(method="lm")
giving:
Melt the data first (convert it to long form).
library(reshape2)  # provides melt()
library(ggplot2)
mwind <- melt(wind)
ggplot(mwind, aes(value)) + geom_histogram() + facet_wrap(~ variable)
If you want to plot points, you need to add an index variable for the x axis.
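One way to add such an index is sketched below. It uses base R's stack() in place of melt() so that the reshaping step needs no extra packages, and a made-up data frame (toy, with hypothetical column names) in place of wind. The plotting step only runs if ggplot2 is installed.

```r
# Hypothetical toy data standing in for the wind data frame
toy <- data.frame(speed_a = c(3.0, 2.5, 2.1),
                  speed_b = c(2.9, 2.4, 2.0),
                  speed_c = c(2.7, 2.0, 1.8))

# stack() is a base R counterpart of melt(): it returns the columns
# `values` and `ind`; `obs` is the added index, repeated per variable
long <- data.frame(obs = rep(seq_len(nrow(toy)), times = ncol(toy)),
                   stack(toy))

# Plot each variable against the observation index, one facet per variable
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(obs, values)) + geom_point() + facet_wrap(~ ind)
  print(p)
}
```

With melt(), the equivalent would be to add the index column to the data frame first and pass it as the id variable.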
ggpairs from the GGally package is quite nice for a quick comparison of each variable in a data frame:
library(GGally)
ggpairs(wind)
It will also handle comparisons of numeric and factor data.