Bad idea? ggplotting an S3 class object

Many R objects have S3 plot methods associated with them. For instance, every R regression tutorial contains something like this:

dat <- data.frame(x=runif(10))
dat$y <- dat$x+runif(10)
my.lm <- lm( y~x, dat )
plot(my.lm)

This displays regression diagnostic plots.

Similarly, I have an S3 object for a package which consists of a list which basically holds a few time series. I have a plot.myobject method for it which reaches into the list, yanks out the time series, and plots them on the same graph. I would like to rewrite this as a ggplot2 function so that it will be prettier and perhaps more extensible as well.

Because this package is intended to get people without much R experience up and running quickly, I'd like this to be a one-liner with one argument, as in plot(myobject), ggplot(myobject), or whatever the appropriate version might be. Then once they get hooked, they can learn more about ggplot2 and customize the graph to their heart's content.

My initial temptation was to simply replace the internals of the plot.myobject method to use ggplot2. This, however, seems like it might lose me major style points.

Is this a bad idea, and if so why and what alternative should I use?

There is an existing idiom in ggplot2 to do exactly what you propose. It is called fortify. It takes an object and produces a version of the object in a form that ggplot can work with, i.e. a data.frame. Section 9.3 in Hadley's ggplot2 book describes how to do this, using the S3 object class lm as an example. To see this in action, type fortify.lm into your console (or ggplot2:::fortify.lm, if your version of ggplot2 does not export the method) to get the following code:

function (model, data = model$model, ...) 
{
    infl <- influence(model, do.coef = FALSE)
    data$.hat <- infl$hat
    data$.sigma <- infl$sigma
    data$.cooksd <- cooks.distance(model, infl)
    data$.fitted <- predict(model)
    data$.resid <- resid(model)
    data$.stdresid <- rstandard(model, infl)
    data
}
<environment: namespace:ggplot2>
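
To make the effect of that method concrete, here is a minimal sketch that reuses the regression from the question. It assumes your installed ggplot2 still ships a fortify method for lm objects; the names lm_df and p are just illustrative.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = runif(10))
dat$y <- dat$x + runif(10)
my.lm <- lm(y ~ x, dat)

# fortify() returns the model frame plus the diagnostic columns
# added by fortify.lm (.hat, .sigma, .cooksd, .fitted, .resid, .stdresid)
lm_df <- fortify(my.lm)

# Because a fortify method exists, the model itself can be the data argument
p <- ggplot(my.lm, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
```

Printing p gives a residuals-versus-fitted plot, roughly the first panel of plot(my.lm).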

Here is my own example of writing a fortify method for tree objects (from the tree package), originally published on the ggplot2 mailing list:

fortify.tree <- function(model, data, ...){
  require(tree)
  # Uses tree:::treeco to extract data frame of plot locations
  xy <- tree:::treeco(model)
  n <- model$frame$n

  # Lines copied from tree:::treepl
  x <- xy$x
  y <- xy$y
  node <- as.numeric(row.names(model$frame))
  parent <- match((node %/% 2), node)
  sibling <- match(ifelse(node %% 2, node - 1L, node + 1L), node)

  # Vertical segment from each node up to its parent's height
  linev <- data.frame(x=x, y=y, xend=x, yend=y[parent], n=n)
  # Horizontal segment from the parent across to each node
  lineh <- data.frame(x=x[parent], y=y[parent], xend=x,
      yend=y[parent], n=n)

  # Drop the root (row 1), which has no parent
  rbind(linev[-1,], lineh[-1,])
}

# Current ggplot2 uses theme() and element_blank();
# the older opts() and theme_blank() have been removed.
theme_null <- theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    legend.position = "none"
)

And the plot code. Notice that the data passed to ggplot is not a data.frame but a tree object.

library(ggplot2)
library(tree)

data(cpus, package="MASS")
cpus.ltr <- tree(log10(perf) ~ syct+mmin+mmax+cach+chmin+chmax, cpus)

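# fortify.tree() is called behind the scenes to turn the tree into a data.frame.
# Current ggplot2 names the scale_size() argument `range`; older releases called it `to`.
# (ggplot2 >= 3.4.0 prefers the linewidth aesthetic and scale_linewidth() for line thickness.)
p <- ggplot(data=cpus.ltr) + 
    geom_segment(aes(x=x,y=y,xend=xend,yend=yend,size=n),
      colour="blue", alpha=0.5) + 
    scale_size("n", range=c(0, 3)) + 
    theme_null
print(p)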
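
The same idiom could be applied to the object from the question. What follows is only a sketch: the layout of myobject (a shared time index plus a named list of equal-length series) and the helper new_myobject are assumptions made for illustration, since the question does not show the real structure.

```r
# Hypothetical stand-in for the question's object: a list holding a shared
# time index and a few equal-length time series (assumed layout).
new_myobject <- function(time, series) {
  structure(list(time = time, series = series), class = "myobject")
}

# fortify method: flatten the list into one long data frame with
# columns time, series (the series name) and value.
fortify.myobject <- function(model, data = NULL, ...) {
  pieces <- lapply(names(model$series), function(nm) {
    data.frame(time = model$time, series = nm, value = model$series[[nm]],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

set.seed(1)
obj <- new_myobject(1:10, list(a = cumsum(rnorm(10)), b = cumsum(rnorm(10))))
long <- fortify.myobject(obj)

# With ggplot2 installed, the object can be handed to ggplot() directly.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Explicit registration; inside a package this would be an
  # S3method(fortify, myobject) entry in NAMESPACE instead.
  registerS3method("fortify", "myobject", fortify.myobject,
                   envir = asNamespace("ggplot2"))
  p <- ggplot(obj, aes(x = time, y = value, colour = series)) + geom_line()
}
```

Users still have to supply the aesthetics and a geom themselves, so this is not quite the one-argument call the question asks for, but it keeps the object fully inside the usual ggplot workflow.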

As per Hadley's suggestion in the comments, I have submitted a generic S3 autoplot() to the ggplot2 GitHub repository. So if it's accepted and checks out, there should be an autoplot available for this use in the future.

Update

autoplot is now available in ggplot2.
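
A minimal sketch of what an autoplot method for the question's object might look like. The structure of myobject (a time vector plus a named list of series) is an assumption, and the plot.myobject wrapper is optional; it is included only to show that plot(myobject) can delegate to the same code.

```r
library(ggplot2)

# Hypothetical stand-in for the question's object (layout is an assumption)
obj <- structure(
  list(time = 1:10,
       series = list(a = cumsum(rep(1, 10)), b = cumsum(rep(2, 10)))),
  class = "myobject"
)

# autoplot method: build a long data frame, then return a complete ggplot
autoplot.myobject <- function(object, ...) {
  n_series <- length(object$series)
  long <- data.frame(
    time   = rep(object$time, times = n_series),
    series = rep(names(object$series), each = length(object$time)),
    value  = unlist(object$series, use.names = FALSE)
  )
  ggplot(long, aes(x = time, y = value, colour = series)) + geom_line()
}

# Optional thin wrapper so that plot(obj) draws the same graph
plot.myobject <- function(x, ...) {
  p <- autoplot(x, ...)
  print(p)
  invisible(p)
}

p  <- autoplot(obj)         # the one-liner for newcomers
p2 <- p + theme_minimal()   # still an ordinary ggplot, so it can be extended
```

Because autoplot() returns the plot rather than drawing it, beginners get a sensible default while more experienced users can keep adding layers, scales and themes.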

Using plot.myobject is easy to remember and execute. However, if you're talking about myobjects that already have plot.myobject functions, you may have to worry about the different versions in the different namespaces. But if it's just for your own myobjects, you don't lose any style points with me. The nlme package, for one, does this extensively, though with lattice graphs instead of ggplot.

Using ggplot.myobject is an alternative; you shouldn't have to worry about other versions, unless other people start doing the same thing. However, as you note, it does break the ggplot usage paradigm.

Another alternative is to use a new name, say, gsk3plot; you never have to worry about other versions, it's not too hard to remember, and you can make alternatives to plot to your heart's content without having to worry about conflicts. This is probably what I'd choose as it makes it clear to the audience that these plots are customizable and this is a function that makes the plot the way that you prefer, and that if they are so inclined, they could dig in and do the same thing.

ggplot and ggplot2 methods generally expect the data to come to them in melt()-ed (long) form. So your methods may need to do a melt (from the reshape2 package) and then "map" the resulting column names to arguments in the ggplot methods.
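
As a rough illustration of the melt-then-map step, assuming the series have first been collected into a wide data frame (the data frame wide and its columns are made up for this example):

```r
library(reshape2)  # provides melt()
library(ggplot2)

# Made-up wide data: one column per time series
wide <- data.frame(time = 1:5,
                   a = c(1, 3, 2, 5, 4),
                   b = c(2, 2, 4, 3, 6))

# melt() stacks the series columns; by default the new columns are
# named "variable" (the series name) and "value"
long <- melt(wide, id.vars = "time")

# Map the melted column names onto ggplot aesthetics
p <- ggplot(long, aes(x = time, y = value, colour = variable)) +
  geom_line()
```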