
Using org-mode to structure an analysis

Developer (https://www.devze.com), 2023-01-27 05:44, source: web

I am trying to make better use of org-mode for my projects. I think literate programming is especially applicable to the realm of data analysis and org-mode lets us do some pretty awesome literate programming.

I think most of you will agree with me that the workflow for writing an analysis is different from most other types of programming. I don't just write a program; I explore the data. And while many of these explorations are dead ends, I don't want to delete or ignore them completely. I just don't want to re-run them every time I execute the org file. I also tend to find or develop chunks of useful code that I would like to put into an analytic template, but some of these chunks won't be relevant to every project, and I'd like to know how to make org-mode ignore them when I am executing the entire buffer. Here's a simplified example.

* Import
  - I want org-mode to ignore import-sql.
#+name: import-data
#+begin_src R :exports none :noweb yes
<<import-csv>>
#+end_src

#+name: import-csv
#+begin_src R :exports none
data <- read.csv("foo-clean.csv")
#+end_src

#+name: import-sql
#+begin_src R :exports none
library(RSQLite)
# ... SQL import code omitted ...
#+end_src

* Clean
  - This is run on foo.csv, producing foo-clean.csv
  - Converts the mess of -9 and -13 codes to NA for my sanity.
  - This only needs to be run once; after that, it is just for reference.
  - How can I tell org-mode to skip this?
#+name: clean-csv
#+begin_src sh :exports none
sed .....
#+end_src

* Explore

** Explore by a factor (1)
   - Dead end. Did not pan out. Ignore.
   - Produces a couple of charts showing there is no interaction.
#+name: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes
#+end_src

** Explore by a factor (2)
   - A useful exploration that I will reference later in a report.
   - Produces a couple of charts showing the interaction of my variables.
#+name: explore-by-a-factor-2
#+begin_src R :exports none :noweb yes
#+end_src

I would like to be able to use org-babel-execute-buffer and have org-mode somehow know to skip over the code blocks import-sql, clean-csv and explore-by-a-factor-1. I want them in the org file, because they are relevant to the project. After all, tomorrow someone might want to know why I was so sure explore-by-a-factor-1 was not useful. I want to keep that code around so I can bang out the plot or the analysis or whatever and move on, but not have it run every time I rerun everything, because there's no reason to run it. Ditto with the clean-csv stuff. I want it around to document what I did to the data (and why), but I don't want to re-run it every time. I'll just import foo-clean.csv.

I Googled all over and read a bunch of org-mode mailing list archives, and I was able to find a couple of ideas, but not what I want. EXPORT_SELECT_TAGS and EXPORT_EXCLUDE_TAGS are great when exporting the file, and the :tangle header works well when creating the actual source files. I don't want to do either of these; I just want to execute the buffer. I would like to be able to mark code blocks in a similar fashion to be executed or ignored. I guess I would like to find a way to have an org variable such as:

EXECUTE_SELECT_TAGS

This way I could simply tag my various code blocks and be done with it. It would be even nicer if I could then run the file using only source blocks with specific tags. I can't find a way to do this, and I thought I would ask before asking/begging for a new feature in org-mode.


I figured it out. From the Org manual (since updated):

The :eval header argument can be used to limit the evaluation of specific code blocks. :eval accepts two arguments “never” and “query”. :eval never will ensure that a code block is never evaluated, this can be useful for protecting against the evaluation of dangerous code blocks. :eval query will require a query for every execution of a code block regardless of the value of the org-confirm-babel-evaluate variable.

So you just have to add :eval never to the header of the blocks that you don’t want to execute, and voilà!
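Applied to the example in the question, a sketch might look like the fragment below. It uses the #+name: keyword that current Org releases use for naming blocks, and the sed body stays elided as in the question. The property drawer on the dead-end heading is an addition not covered by the manual excerpt above: it assumes a recent Org release that supports the header-args property, which applies :eval never to every block under that heading.

```org
# Never evaluated, even by org-babel-execute-buffer
#+name: import-sql
#+begin_src R :exports none :eval never
library(RSQLite)
# ... SQL import code omitted ...
#+end_src

# Kept to document the cleaning step; never re-run
#+name: clean-csv
#+begin_src sh :exports none :eval never
sed .....
#+end_src

** Explore by a factor (1)
   :PROPERTIES:
   :header-args: :eval never
   :END:
   - Dead end. Did not pan out. Ignore.
#+name: explore-by-a-factor-1
#+begin_src R :exports none :noweb yes
#+end_src
```

With these headers in place, org-babel-execute-buffer should skip all three blocks, and C-c C-c on any of them reports that evaluation is disabled instead of running it. If you would rather be asked each time (for example on clean-csv), :eval query is the alternative the manual describes.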


While I never did get an answer to my question, the discussion was interesting, and apparently an org-mode-based template for R strikes a few people as an interesting idea. I downloaded the org-mode source code and looked at org-babel-execute-buffer. It is, as I feared, a naive function that does precisely what it says it does and nothing more. It is not (currently) possible to pass it any additional parameters to affect its behavior. (Unless I am badly misreading the Lisp, which is entirely possible.)

Eventually, I decided org-babel-execute-buffer is not necessary for a useful R template system. Babel's noweb functionality is really flexible and I think it is possible to build a workable solution using noweb, rather than trying to develop a complex tagging schema to define how/when to run things.
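A minimal sketch of the noweb approach, reusing the block names from the question: a single driver block (the name run-analysis is hypothetical) pulls in only the chunks that belong in the current run, in the same way import-data already pulls in import-csv. Evaluating it with C-c C-c runs import-csv and explore-by-a-factor-2 and nothing else; dead ends and one-off cleaning steps stay in the file but are simply never referenced here.

```org
#+name: run-analysis
#+begin_src R :exports none :noweb yes
# Each noweb reference below is replaced by the body of the named
# block before R sees the code, so only the chunks listed here run.
<<import-csv>>
<<explore-by-a-factor-2>>
#+end_src
```

Adding or dropping a chunk from the run is then a one-line edit in the driver block, which covers roughly the selection that the EXECUTE_SELECT_TAGS idea was meant to provide.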

For tangling/export it should still be possible to use tags to create usable/sane output.

For anyone who is interested: LiterateR

It's probably a little rude to use this thread to put this out there, but this is why I asked the question in the first place. TemplateR is my attempt to make R a little easier to use. Right now it is just a template with two simplistic functions, and I consider it a proof of concept at this point. Eventually, I want to develop something that does more to help people develop R projects quickly. TemplateR will accomplish this by: 1. providing a strong structure to develop around; 2. providing built-in functions that support common tasks, especially in the realm of reproducible research; 3. providing snippets of tested code that can be rapidly repurposed for the current project.

Right now, all it provides is a basic structure/framework and two simple functions: 1. identifying which R packages are missing (based on what is manually entered into a table) and 2. creating the project directories (plots, data, reports).
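For illustration only, the two functions described above might be sketched inside an org template like this. The table name packages, the block names, and the package list are hypothetical and not taken from TemplateR; the R code uses only base functions (installed.packages, setdiff, dir.create), and the table is passed into R through the :var header argument.

```org
#+name: packages
| package |
|---------|
| RSQLite |
| ggplot2 |

#+name: missing-packages
#+begin_src R :var pkgs=packages :exports none
# Package names come from the first column of the packages table;
# report the ones that are not installed on this machine.
needed <- as.character(pkgs[[1]])
setdiff(needed, rownames(installed.packages()))
#+end_src

#+name: make-project-dirs
#+begin_src R :exports none :results silent
# Create the standard project directories; existing ones are left alone.
for (d in c("plots", "data", "reports")) dir.create(d, showWarnings = FALSE)
#+end_src
```

Keeping the package list in an org table means it is visible in the exported document while still driving the check.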

More will come in future versions. The README.org and TODO.org go into further detail.
