I am trying to package some of my Python code that calls R code using rpy2. That R code currently sits in a separate file which I source from the Python script. For example, if the Python script is myscript.py, then the R code is stored in myscript_support.R, and I have something like the following in myscript.py:
import os
from rpy2.robjects import r
# Load the R code from the file that sits next to this script
r.source(os.path.join(os.path.dirname(__file__), "myscript_support.R"))
# Look up the R function by name and call it
r["myscript_R_function"]()
I now want to package this Python script using setuptools, and I have a few questions:
How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?
The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?
This question might be dated, but I ran into the same issue today and wanted to provide more detail for the question 1 solution suggested by @ivan_pozdeev and a new solution for question 2.
1) Edit your setup.py file to:
from setuptools import setup, find_packages
setup(
    ...
    # If any package contains *.r or *.R files, include them:
    package_data={'': ['*.r', '*.R']},
    include_package_data=True,
)
2) Conda is quickly becoming a good option for dealing with package dependencies across both Python and R. You can create an environment (http://conda.pydata.org/docs/using/envs), install all the R and Python packages that you might need, and then generate an environment.yml file so that anyone can replicate your environment. Check out this blog for more info: https://www.continuum.io/content/conda-data-science
Well, imagine yourself as the setuptools packager and think of what you would expect the programmer to do.
- Setuptools knows nothing about R, its files' structure or that your code uses them somehow.
- Your R interpreter knows nothing about importing files from Python .egg archives.
For the first problem, you have two choices:
- Tell setuptools to just include some additional files without caring what they are
- Teach setuptools about R, how to determine what R files your program uses and how to track and include their dependencies
The first option can be implemented by passing include_package_data = True to setup() and providing patterns for the files to include in package_data (setuptools docs, "Including Data Files" section). Paths relative to the packages' directories can be used. The files will be accessible at run time at the same relative paths through the "Resource Management API" ("Accessing Data Files at Runtime" section).
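As a minimal sketch of finding a packaged R file at run time: the code below uses the standard library's importlib.resources (Python 3.9+) as a stand-in for the setuptools Resource Management API, so it is a swapped-in technique and not the one from the setuptools docs. The helper name source_packaged_r_file and the package name mypackage in the usage note are hypothetical.

```python
from importlib.resources import as_file, files


def source_packaged_r_file(package, filename, source):
    """Find `filename` inside the installed `package` and hand its path to `source`.

    `source` is any callable that takes a filesystem path, for example
    rpy2's `r.source`. `as_file` yields a real path even when the package
    lives in a zip archive, extracting to a temporary file if needed.
    """
    resource = files(package).joinpath(filename)
    with as_file(resource) as path:
        # The path is only guaranteed to exist inside this block.
        return source(str(path))
```

With rpy2 this could be called as source_packaged_r_file("mypackage", "myscript_support.R", r.source), assuming the R file was installed next to mypackage/__init__.py.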
The second option would require you to add your own code to setup.py before invoking setup(). For example, you may add a file finder to add relevant .R files to the results of find_packages(). Or just generate the list of files for the previous paragraph by arbitrary means.
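One way to generate that list "by arbitrary means" is to walk the package directory before calling setup(). This is only a sketch: the helper name find_r_files is hypothetical, and it assumes the R files live somewhere under the package directory.

```python
import os


def find_r_files(package_dir):
    """Return the .R/.r files under `package_dir`, as paths relative to it."""
    matches = []
    for root, _dirs, filenames in os.walk(package_dir):
        for name in filenames:
            if name.lower().endswith(".r"):
                full_path = os.path.join(root, name)
                # package_data entries are relative to the package directory
                matches.append(os.path.relpath(full_path, package_dir))
    return sorted(matches)
```

The result could then be passed along the lines of package_data={"mypackage": find_r_files("mypackage")}, where mypackage is a placeholder for your own package name.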
For the second problem, the easiest way is to force setuptools to install the package as a directory rather than an .egg by specifying zip_safe = False.
Alternatively, you might use the eager_resources option, which extracts a whole group of resources together on demand ("Automatic Resource Extraction" section).
As for installing third-party R packages, an automatable technique is described in R Installation and Administration, "Installing packages" section.
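A rough sketch of automating this from Python, assuming Rscript is on the PATH and that installing from the CRAN cloud mirror is acceptable. The function names are hypothetical, and this is not taken from the R manual itself.

```python
import subprocess


def build_r_install_command(packages, repos="https://cloud.r-project.org"):
    """Build an Rscript command that calls install.packages() for `packages`."""
    quoted = ", ".join('"%s"' % name for name in packages)
    expression = 'install.packages(c(%s), repos="%s")' % (quoted, repos)
    return ["Rscript", "-e", expression]


def install_r_packages(packages):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_r_install_command(packages), check=True)
```

Note that install.packages() may only emit a warning when a package cannot be installed, so it is worth checking afterwards that the packages actually load.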
How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?
For the source files to be installed, you need to specify them in some way in package_data. You can find their path in the exact same way as you do now.
The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?
Either make setup.py check if they exist (a kind of "configtools approach") or just raise some kind of exception when you cannot load them. Or do both, so that if the packages you depend on disappear for some reason, at least you will know it.
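The "raise an exception" route could look roughly like the sketch below. It assumes rpy2's rpy2.robjects.packages.isinstalled is available for the real check; the names require_r_packages and _rpy2_is_installed are hypothetical, and the checker is passed in as a parameter so the logic can be exercised without R.

```python
def _rpy2_is_installed(name):
    # Imported lazily so this module can be imported without rpy2 present.
    from rpy2.robjects.packages import isinstalled
    return bool(isinstalled(name))


def require_r_packages(names, is_installed=_rpy2_is_installed):
    """Raise an informative ImportError listing any R packages that are missing."""
    missing = [name for name in names if not is_installed(name)]
    if missing:
        quoted = ", ".join('"%s"' % name for name in missing)
        raise ImportError(
            "Required R packages are not installed: %s. "
            "Install them from R with install.packages(c(%s))."
            % (", ".join(missing), quoted)
        )
```

Calling require_r_packages([...]) once at import time, before sourcing the R file, means users see which packages to install instead of an opaque R error.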