Practically getting started with Sweave

My question(s) might be less general than the title suggests. I am running R on Mac OS X with a MySQL database to store the data. I have been working with Komodo / SciViews-R for some time. Recently I needed auto-generated reports and looked into Sweave. StatET / Eclipse appears to be the "standard" solution for Sweavers.

1) Is it reasonable to switch from Komodo to StatET / Eclipse? I tried StatET before but chose Komodo over it because I liked Komodo's calltips / autosuggest and its more convenient configuration so much.

2) What's a reasonable workflow to generate Sweave files? Usually I develop my R code first and care about the report later. I just learned today that a single Sweave file contains both R code and LaTeX code, and that the .tex document is created from this file. The example files look handy, but I can't really imagine how to put my 250+ lines of R code into one file and mix them with LaTeX.

Is it possible to put just the qplot() and ggplot() statements into such a document and source the rest of the functionality, like the database connection and intermediate results, from elsewhere?

Or is it just a matter of getting used to the mix of LaTeX and R code?

Thx for any suggestions, hints, links and back-to-the-roots-shout-outs…

You've asked several questions, so here are several answers:

Is StatET/Eclipse the right way to do Sweave?

Not necessarily (note: I'm an avid StatET/Eclipse user; I use it for both pure R and Sweave/R and love it, but I haven't used Komodo / SciViews-R). You should be able to run the Sweave() function from any R command line, which will generate a .tex file. You can then turn the .tex file into something readable (like a PDF) from any TeX environment.
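
As a minimal sketch of that command-line route, the snippet below writes a tiny .Rnw file to a temporary directory and runs Sweave() on it. The file name, folder name and chunk contents are made up for illustration, and the PDF step is left commented out because it assumes a working LaTeX installation.

```r
# Hypothetical example file; any .Rnw file is processed the same way.
work_dir <- file.path(tempdir(), "sweave-demo")
dir.create(work_dir, showWarnings = FALSE)
old_wd <- setwd(work_dir)

rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<label=summaryChunk, echo=FALSE>>=",
  "summary(1:10)",
  "@",
  "\\end{document}"
)
writeLines(rnw_lines, "report.Rnw")

# Sweave() is part of the utils package; it writes report.tex
# into the current working directory.
utils::Sweave("report.Rnw")
tex_lines <- readLines("report.tex")

# With LaTeX installed, either of these should then produce report.pdf:
# tools::texi2pdf("report.tex")
# system("pdflatex report.tex")

setwd(old_wd)
```

Nothing here depends on an editor, which is the point: Komodo, StatET or a plain terminal can all drive the same two steps.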

What's a good Sweave workflow?

When I want to turn an R script into a Sweave report, I generally start with an empty Sweave template and copy/paste my entire R script into a single Sweave R block just after the title, i.e.:

<<label=myEntireRScript, echo=FALSE, include=FALSE>>=
# Insert code here
myTable <- data.frame(...)
myPlot <- qplot(....)
@

Then I go through and find the parts I want to report. For instance, if I want to put a table into the report, I'll cut the relevant code out of the big R block and put it into its own xtable block, and do the same for variables and plots.

<<label=myEntireRScript, echo=FALSE, include=FALSE>>=
# Insert code here
@
Put any text I want before my table here, maybe with a \Sexpr{variable} to insert the value of a variable inline

<<label=myTable, results=tex>>=
myTable <- data.frame(...)
print(xtable(myTable,...),...)
@
Any text I want before my figure
<<label=myPlot, fig=TRUE>>=
myPlot <- qplot(....)
print(myPlot)
@

You may want to look at these related SO posts. The rest of my post relates to your question 2.

When creating reports with Sweave, I usually keep most of the R code and the report text separate. If the R code is fast to run, I include something like the following at the start of the .Rnw file:

<<>>=
source('/path/to/script.r')
@

On the other hand, if the R code takes a long time, I will often include something like the following at the end of the R script:

Sweave('/path/to/report.Rnw'); system('pdflatex report.tex')

That way, I can re-generate the report quickly, without needing to run all the R code again. Then, the only work R has to do in the Sweave file is print tables, make graphs and maybe extract a few figures.

Like nullglob, I prefer to keep the R and Sweave files separate, but I prefer to save the workspace with save.image() rather than to source() the file. This avoids re-running the R calculations every time the .Rnw file is compiled (and I always end up tinkering with the typesetting more than I'd like).

My general workflow is to do each paper/project in its own folder with its own R file(s). When the calculation side is "done", I call save.image() to store all the workspace variables as-is.

Then, in the .Rnw file in the same directory, I set the working directory with setwd() and load all variables with load(".RData"). Of course, you can change the name you use for your workspace, but I do one workspace per folder and keep the default name. Oh, and if you tinker with the R file, be sure to save the workspace image again, and watch out for variables that linger in the workspace and the .Rnw file but are no longer part of the R file... this is where the save.image() approach can cause some headaches.
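
A minimal sketch of that save-then-load pattern is below. It uses an explicitly named file in a temporary directory rather than the default .RData so that it can be run anywhere; the folder, file and variable names are placeholders, not part of the workflow described above.

```r
# Analysis script side: do the slow work once, then store the workspace.
project_dir <- file.path(tempdir(), "my-paper")   # hypothetical project folder
dir.create(project_dir, showWarnings = FALSE)
image_file <- file.path(project_dir, "analysis.RData")

slow_result <- sapply(1:5, function(i) i^2)  # stand-in for a long calculation
save.image(file = image_file)                # without 'file', this writes ".RData"

# Report side: roughly what the first chunk of the .Rnw file would do.
# Loading into a separate environment here only keeps the demo from
# overwriting objects in the current session; in a real .Rnw chunk a
# plain load(image_file) is the usual form.
report_env <- new.env()
loaded_names <- load(image_file, envir = report_env)
print(loaded_names)
```

Because load() returns the names of the restored objects, printing them is a cheap way to spot stale variables that are still in the image but no longer created by the R file.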

I am on a Mac and I suggest TextMate if you're mildly geeky and Emacs/ESS if you're really geeky. I use Vim and command-line R, but Emacs/ESS works best for most people. If you're in this for the long haul, I doubt you'll regret learning Emacs/ESS for R, Sweave, and LaTeX.