Many R objects have S3 methods to plot associated with them. For instance, every R regression tutorial contains something like this:
dat <- data.frame(x=runif(10))
dat$y <- dat$x+runif(10)
my.lm <- lm( y~x, dat )
plot(my.lm)
This displays regression diagnostics.
Similarly, I have an S3 object for a package, which consists of a list that basically holds a few time series. I have a plot.myobject method for it which reaches into the list, yanks out the time series, and plots them on the same graph. I would like to rewrite this as a ggplot2 function so that it will be prettier and perhaps more extensible as well.
Because this package is intended to get people without much R experience up and running quickly, I'd like this to be a one-liner with one argument, as in plot(myobject), ggplot(myobject), or whatever the appropriate version might be. Then once they get hooked, they can learn more about ggplot2 and customize the graph to their heart's content.
My initial temptation was to simply replace the internals of the plot.myobject method to use ggplot2. This, however, seems like it might lose me major style points.
Is this a bad idea, and if so why and what alternative should I use?
There is an existing idiom in ggplot2 to do exactly what you propose. It is called fortify. It takes an object and produces a version of the object in a form that ggplot can work with, i.e. a data.frame. Section 9.3 in Hadley's ggplot2 book describes how to do this, using the S3 object class lm as an example. To see this in action, type fortify.lm into your console (or getAnywhere("fortify.lm") if the method is not exported in your version of ggplot2) to get the following code:
function (model, data = model$model, ...)
{
    infl <- influence(model, do.coef = FALSE)
    data$.hat <- infl$hat
    data$.sigma <- infl$sigma
    data$.cooksd <- cooks.distance(model, infl)
    data$.fitted <- predict(model)
    data$.resid <- resid(model)
    data$.stdresid <- rstandard(model, infl)
    data
}
<environment: namespace:ggplot2>
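The same idea could be applied to the list-of-time-series object from the question. The following is only a minimal sketch: the "myobject" class layout (a `series` element holding named, equal-length numeric vectors) and the output column names `time`, `value` and `series` are assumptions made for illustration, not anything taken from the asker's package.

```r
library(ggplot2)

# Hypothetical "myobject": a list holding a few named, equal-length series.
obj <- structure(
  list(series = list(a = cumsum(rnorm(20)), b = cumsum(rnorm(20)))),
  class = "myobject"
)

# fortify method: return one row per observation, with a column that
# records which series the observation came from.
# The `series` element and the column names are assumptions.
fortify.myobject <- function(model, data, ...) {
  s <- model$series
  pieces <- lapply(names(s), function(nm) {
    data.frame(time = seq_along(s[[nm]]),
               value = as.numeric(s[[nm]]),
               series = nm)
  })
  do.call(rbind, pieces)
}

# Convert explicitly, then hand the data frame to ggplot as usual.
long <- fortify(obj)
p <- ggplot(long, aes(x = time, y = value, colour = series)) + geom_line()
```

Calling fortify() explicitly keeps the conversion visible; once the method exists, users can still add their own layers and scales on top of `p`.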
Here is my own example of writing a fortify method for tree, originally published on the ggplot2 mailing list:
fortify.tree <- function(model, data, ...){
  require(tree)
  # Uses tree:::treeco to extract data frame of plot locations
  xy <- tree:::treeco(model)
  n <- model$frame$n

  # Lines copied from tree:::treepl
  x <- xy$x
  y <- xy$y
  node <- as.numeric(row.names(model$frame))
  parent <- match((node %/% 2), node)
  sibling <- match(ifelse(node %% 2, node - 1L, node + 1L), node)

  # Vertical and horizontal segments joining each node to its parent
  linev <- data.frame(x=x, y=y, xend=x, yend=y[parent], n=n)
  lineh <- data.frame(x=x[parent], y=y[parent], xend=x,
                      yend=y[parent], n=n)

  # Drop the root node, which has no parent
  rbind(linev[-1,], lineh[-1,])
}
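# opts() and theme_blank() have been removed from ggplot2;
# theme() and element_blank() are the current equivalents.
theme_null <- theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  axis.text.x = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks = element_blank(),
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  legend.position = "none"
)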
And the plot code. Notice that the data passed to ggplot is not a data.frame but a tree object.
library(ggplot2)
library(tree)
data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)
# ggplot() calls fortify.tree() on the tree object behind the scenes
p <- ggplot(data=cpus.ltr) +
  geom_segment(aes(x=x, y=y, xend=xend, yend=yend, size=n),
               colour="blue", alpha=0.5) +
  # the old `to` argument of scale_size() is now called `range`
  scale_size("n", range=c(0, 3)) +
  theme_null
print(p)
As per Hadley's suggestion in the comments, I have submitted a generic S3 autoplot() to the ggplot2 GitHub repository. So if it's accepted and checks out, there should be an autoplot available for this use in the future.
Update
autoplot is now available in ggplot2.
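With the autoplot generic in place, the one-liner asked for in the question might look roughly like the sketch below. The "myobject" class, its `series` element, and the choice to have plot.myobject simply print the autoplot result are all assumptions for illustration, not the asker's actual code.

```r
library(ggplot2)

# autoplot method for a hypothetical "myobject" class whose `series`
# element is a named list of numeric vectors (an assumed layout).
autoplot.myobject <- function(object, ...) {
  s <- object$series
  long <- stack(s)  # base R: gives columns `values` and `ind`
  long$time <- unlist(lapply(s, seq_along), use.names = FALSE)
  ggplot(long, aes(x = time, y = values, colour = ind)) +
    geom_line() +
    labs(y = "value", colour = "series")
}

# Optional convenience: plot(myobject) draws the same ggplot2 graph.
plot.myobject <- function(x, ...) {
  print(autoplot(x, ...))
}

obj <- structure(
  list(series = list(a = cumsum(rnorm(20)), b = cumsum(rnorm(20)))),
  class = "myobject"
)

p <- autoplot(obj)        # the beginner-friendly one-liner
p2 <- p + theme_minimal() # later customization with ordinary ggplot2 code
```

Because autoplot returns the ggplot object rather than only drawing it, beginners get the one-argument call while more experienced users can keep adding layers, scales and themes.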
Using plot.myobject is easy to remember and execute. However, if you're talking about myobjects that already have plot.myobject functions, you may have to worry about the different versions in the different namespaces. But if it's just for your own myobjects, you don't lose any style points with me. The nlme package, for one, does this extensively, though with lattice graphs instead of ggplot.
Using ggplot.myobject is an alternative; you shouldn't have to worry about other versions, unless other people start doing the same thing. However, as you note, it does break the ggplot usage paradigm.
Another alternative is to use a new name, say, gsk3plot; you never have to worry about other versions, it's not too hard to remember, and you can make alternatives to plot to your heart's content without having to worry about conflicts. This is probably what I'd choose. It makes it clear to the audience that these plots are customizable, that this is a function that makes the plot the way you prefer, and that if they are so inclined, they could dig in and do the same thing.
ggplot and ggplot2 methods generally expect the data to come to them in melt()-ed (long) form. So your methods may need to do a melt (from the reshape2 package, not plyr) and then "map" the resulting column names to arguments in the ggplot methods.
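As a rough illustration of the melt-then-map step, here is a small sketch with made-up data. The data frame and its column names are hypothetical, and it assumes the reshape2 package is installed.

```r
library(reshape2)
library(ggplot2)

# Hypothetical wide data: one column per series plus a shared time index.
wide <- data.frame(time = 1:5,
                   a = c(1, 3, 2, 5, 4),
                   b = c(2, 2, 3, 3, 4))

# melt() stacks the series columns into long form:
# `variable` names the series and `value` holds the observation.
long <- melt(wide, id.vars = "time")

# Map the melted column names onto ggplot aesthetics.
p <- ggplot(long, aes(x = time, y = value, colour = variable)) +
  geom_line()
```

Inside a plotting method, the same two steps would run on the data pulled out of the object, so the user never has to see the reshaping.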