I'm using the mclapply function in the multicore package to do parallel processing. It seems that all child processes produce the same names for temporary files given by the tempfile function, i.e. if I have four processors,
library(multicore)
mclapply(1:4, function(x) tempfile())
will give four identical filenames. Obviously I need the temporary files to be different so that the child processes don't overwrite each other's files. When tempfile is used indirectly, i.e. by calling some function that calls tempfile, I have no control over the filename.
Is there a way around this? Do other parallel processing packages for R (e.g. foreach) have the same problem?
Update: This is no longer an issue since R 2.14.1.
CHANGES IN R VERSION 2.14.0 patched:
[...]
o tempfile() on a Unix-alike now takes the process ID into account.
This is needed with multicore (and as part of parallel) because
the parent and all the children share a session temporary
directory, and they can share the C random number stream used to
produce the unique part. Further, two children can call
tempfile() simultaneously.
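To check whether a given installation has the fix described in the changelog entry, it may help to rerun the example from the question and look for duplicates. This is a minimal sketch, assuming the parallel package (bundled with R since 2.14.0) in place of multicore; the names n_cores and tmp_names are hypothetical and used only for illustration.

```r
library(parallel)

# Assumption: forking is unavailable on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Ask each task for a temporary file name, as in the question.
tmp_names <- unlist(mclapply(1:4, function(x) tempfile(), mc.cores = n_cores))

# anyDuplicated() returns 0 when every name is distinct.
print(tmp_names)
print(anyDuplicated(tmp_names) == 0)
```

If the last line prints FALSE, the installation predates the fix and one of the workarounds in the answers applies.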
I believe multicore spins off a separate process for each subtask. If that assumption is correct, then you should be able to use Sys.getpid() to "seed" tempfile:
tempfile(pattern=paste("foo", Sys.getpid(), sep=""))
Use the x in your function:
mclapply(1:4, function(x) tempfile(pattern=paste("file",x,"-",sep="")))
Because the parallel jobs all start at the same time, and because the random seed comes from the system time, running four instances of tempfile in parallel will typically produce the same results. (That assumes four cores; with only two cores, you'll get two pairs of identical temp file names.)
Better to generate the tempfile names first and give them to your function as an argument:
filenames <- tempfile( rep("file",4) )
mclapply( filenames, function(x){})
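The approach of generating the names up front can be sketched a little more fully as follows. This assumes the parallel package rather than multicore, and the names n_cores and results are hypothetical; the body of the worker function is only a placeholder for real work.

```r
library(parallel)

# Assumption: forking is unavailable on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Generate one name per task in the parent process, before any forking.
filenames <- tempfile(pattern = rep("file", 4))

# Each child writes only to the name it was handed, so no two tasks collide.
results <- mclapply(seq_along(filenames), function(i) {
  writeLines(paste("output of task", i), filenames[i])
  readLines(filenames[i])
}, mc.cores = n_cores)

# Clean up the temporary files once the results have been collected.
unlink(filenames)
```

Because the names are created once in the parent, this does not depend on how the children seed their random number streams.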
If you're using someone else's function that has a tempfile call in it, then working the PID into the tempfile name by modifying the tempfile function, as previously suggested, is probably the simplest plan:
tempfile <- function(pattern = "file", tmpdir = tempdir(), fileext = "") {
  .Internal(tempfile(paste("pid", Sys.getpid(), pattern, sep=""), tmpdir, fileext))
}
mclapply( 1:4, function(x) tempfile() )
At least for now, I chose to monkey-patch my way around this by using the following code in my .Rprofile, following Daniel's advice to use PID values.
assignInNamespace("tempfile.orig", tempfile, ns="base")
.tempfile = function(pattern="file", tmpdir=tempdir())
tempfile.orig(paste(pattern, Sys.getpid(), sep=""), tmpdir)
assignInNamespace("tempfile", .tempfile, ns="base")
Obviously it's not a good option for any package you'd distribute, but for a single user's need it's the best option thus far since it works in all cases.