Alternatives to system() in R for calling sed, rsync, ssh etc.: Do functions exist, should I write my own, or am I missing the point?

Developer https://www.devze.com 2023-03-30 20:20 Source: the web
Recently, I found the file-manipulation functions documented under ?files in base R. Along with other functions like getwd, writeLines, file.show, dir, etc., there seem to be a number of R equivalents of shell commands.

I have also written some functions in R that streamline calls to ssh and rsync through system.

For example:

rsync <- function(from, to){
  system(paste('rsync -outi', from, to, sep = ' '), intern=TRUE)
}

But before I go too much further with this, I have a few questions:

  • does R already have built in commands for common shell programs, if so, where can I find them?
  • if not, are there reasons to avoid writing my own functions?
  • is there a better alternative to the approach outlined in the rsync example above?
  • would a collection of such functions warrant a package?


does R already have built in commands for common shell programs, if so, where can I find them?

There are some functions, like grep, that mimic shell programs. Search for them as you would any other function; the names are often the same.
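As a rough illustration of that point, here are a few base R functions next to the shell tools they resemble. The file names and sample data are made up for the example, and the shell commands in the comments are only loose analogies, not exact equivalents.

```r
# Base R counterparts to a few common shell tools (no system() call needed).
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha 1", "beta 2", "alpha 3"), tmp)     # roughly: printf ... > file

lines <- readLines(tmp)                                 # roughly: cat file
hits  <- grep("^alpha", lines, value = TRUE)            # roughly: grep '^alpha' file
fixed <- sub("alpha", "gamma", lines)                   # roughly: sed 's/alpha/gamma/' file

# Hypothetical target name, used only for this example.
copy   <- file.path(tempdir(), "copy-of-example.txt")
copied <- file.copy(tmp, copy, overwrite = TRUE)        # roughly: cp file copy
listed <- list.files(tempdir(), pattern = "^copy-of-example")  # roughly: ls | grep
```

Nothing comparable ships with base R for rsync or ssh, which is why those still end up going through system().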

if not, are there reasons to avoid writing my own functions?

No obvious problems.

is there a better alternative to the approach outlined in the rsync example above?

Looks good, but you need to be very careful about checking user input if things are passed to the shell.
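One possible way to handle the input-checking concern is to quote every argument with shQuote() and call the program through system2(). This is only a sketch: rsync_args and rsync_safe are hypothetical names, not existing functions, and the -outi flags are simply carried over from the question.

```r
# Hypothetical helper: build the rsync argument vector without running anything.
rsync_args <- function(from, to, flags = "-outi") {
  # Reject anything that is not a single string.
  stopifnot(is.character(from), length(from) == 1L,
            is.character(to), length(to) == 1L)
  # shQuote() wraps each path so that spaces, ';', '$' and similar
  # characters reach rsync literally instead of being interpreted by the shell.
  c(flags, shQuote(from), shQuote(to))
}

# Hypothetical wrapper: run rsync via system2().
# stdout = TRUE captures the output, much like intern = TRUE does for system().
rsync_safe <- function(from, to) {
  if (!nzchar(Sys.which("rsync"))) stop("rsync not found on PATH")
  system2("rsync", rsync_args(from, to), stdout = TRUE)
}
```

Note the trade-off: quoting also stops the shell from expanding wildcards and a leading ~, so a call like rsync_safe('~/dir/files*', 'serverhost:') would no longer match multiple files. Expanding the paths in R first (for example with path.expand() and Sys.glob()) is one way around that.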

would a collection of such functions warrant a package?

Absolutely. Go for it.


I started to go down that route with wrapping git functions for devtools, but eventually realised what I needed was:

bash <- function() system("bash")

with a bit of wrapping to make sure I ended up in the right directory.
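The "right directory" wrapping might look something like the sketch below. in_dir is a hypothetical name chosen for this example, not the code devtools actually uses, and the usage line assumes a Unix-like system where pwd is available.

```r
# Hypothetical helper: run a shell command from a given directory,
# then restore the previous working directory even if the command fails.
in_dir <- function(dir, command, intern = FALSE) {
  old <- setwd(dir)                 # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)   # always switch back on exit
  system(command, intern = intern)
}

# Usage: capture the output of pwd when run from the session's temp directory.
where <- in_dir(tempdir(), "pwd", intern = TRUE)
```

Calling in_dir(some_path, "bash") would then open an interactive shell in that directory and return to the original one when the shell exits.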


There's not much out there, apparently ...

> library(sos)
> findFn("rsync")
found 0 matches
x has zero rows;  nothing to display.
Warning message:
In findFn("rsync") : HIT not found in HTML;  processing one page only.
> findFn("ssh")
found 27 matches;  retrieving 2 pages
2 

The ssh hits are either false positives or part of parallel-processing packages (GridR, nws, biopara). RCurl has an scp function (based on libcurl, not a system call).


UPDATED

Thanks to @hadley for pointing this out: the time penalty was due to using the intern = TRUE argument; see the update below.

Rather than deleting the answer, I am going to leave it up here for reference, unless it gets lots of downvotes.


After creating a few such commands, I have realized one potentially significant disadvantage:

Wrapping a system call in a function increases the elapsed time of the call, roughly 7-fold in this example:

Using system:

system.time(system(paste('rsync -outi', '~/dir/files* ', 'serverhost:')))
   user  system elapsed 
  0.060   0.020   0.552 

Wrapping system in a new function, rsync:

rsync <-  function (from, to, pattern = "") {
    system(paste("rsync -outi", from, to, sep = " "), intern = TRUE)
  }
system.time(rsync(from = '~/dir/files*', to = 'serverhost:'))
   user  system elapsed 
  0.040   0.030   3.825 

Update

The speed penalty resulted from the unnecessary use of intern = TRUE, not from the wrapper function itself:

rsync <-  function (from, to, pattern = "") {
    system(paste("rsync -outi", from, to, sep = " "))
  }
system.time(rsync(from = '~/dir/files*', to = 'serverhost:'))
   user  system elapsed 
  0.070   0.020   0.504 
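For reference, the snippet below shows what intern changes about the return value of system(). It is a minimal sketch that assumes a Unix-like shell where echo is available; it does not try to reproduce the timings above.

```r
# Without intern: the command's output goes to the console and
# system() returns the exit status (0 on success) invisibly.
status <- system("echo hello")

# With intern = TRUE: the output is captured and returned
# as a character vector, one element per line.
output <- system("echo hello", intern = TRUE)
```

So intern = TRUE is only worth using when the function's caller actually needs the command's output as an R object.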