In the quantstrat package, I have located one of the main culprits for the slowness of the applyRules function, and I wonder whether there is a more efficient way to write the while loop. Any feedback would be helpful. Does anyone have experience wrapping this part in parallel R?
As an option, would apply work instead of while? Or should I rewrite this part as new functions, such as ruleProc and nextIndex? I am also dwelling on Rcpp, but that may be a stretch. Any help and constructive advice is much appreciated.
while (curIndex) {
    timestamp = Dates[curIndex]
    if (isTRUE(hold) & holdtill < timestamp) {
        hold = FALSE
        holdtill = NULL
    }
    types <- sort(factor(names(strategy$rules), levels = c("pre",
        "risk", "order", "rebalance", "exit", "enter", "entry",
        "post")))
    for (type in types) {
        switch(type, pre = {
            if (length(strategy$rules[[type]]) >= 1) {
                ruleProc(strategy$rules$pre, timestamp = timestamp,
                    path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                    symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, risk = {
            if (length(strategy$rules$risk) >= 1) {
                ruleProc(strategy$rules$risk, timestamp = timestamp,
                    path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                    symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        }, order = {
            if (length(strategy$rules[[type]]) >= 1) {
                ruleProc(strategy$rules[[type]], timestamp = timestamp,
                    path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                    symbol = symbol, ruletype = type, mktinstr = mktinstr,)
            } else {
                if (isTRUE(path.dep)) {
                    timespan <- paste("::", timestamp, sep = "")
                } else timespan = NULL
                ruleOrderProc(portfolio = portfolio, symbol = symbol,
                    mktdata = mktdata, timespan = timespan)
            }
        }, rebalance = , exit = , enter = , entry = {
            if (isTRUE(hold)) next()
            if (type == "exit") {
                if (getPosQty(Portfolio = portfolio, Symbol = symbol,
                    Date = timestamp) == 0) next()
            }
            if (length(strategy$rules[[type]]) >= 1) {
                ruleProc(strategy$rules[[type]], timestamp = timestamp,
                    path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                    symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
            if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
                symbol = symbol, status = "open", timespan = timestamp,
                which.i = TRUE))) {
            }
        }, post = {
            if (length(strategy$rules$post) >= 1) {
                ruleProc(strategy$rules$post, timestamp = timestamp,
                    path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
                    symbol = symbol, ruletype = type, mktinstr = mktinstr)
            }
        })
    }
    if (isTRUE(path.dep))
        curIndex <- nextIndex(curIndex)
    else curIndex = FALSE
}
Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.
The applyRules function in quantstrat is absolutely where most of the time is spent.
The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review for SO posterity.
We construct a dimension reduction index inside applyRules so that we don't have to observe every timestamp and check it. We only take note of specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.
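To make the idea of a reduced index concrete, here is a minimal base-R sketch. It is not quantstrat's internal code: the toy `signal` vector, the `dindex` name, and the `next_index` helper are assumptions made up for illustration. The loop visits only the timestamps where a signal says something might happen, and stops when the helper returns FALSE, in the same spirit as the `while (curIndex)` loop in the question.

```r
# Toy data: `signal` is TRUE where a rule might need to act (made-up values).
dates  <- as.Date("2024-01-01") + 0:9
signal <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

# Reduced index: only the rows that need to be examined.
dindex <- which(signal)

# Hypothetical helper: the next index to visit after `cur`, or FALSE when done.
next_index <- function(cur, dindex) {
  later <- dindex[dindex > cur]
  if (length(later)) later[1] else FALSE
}

visited <- integer(0)
cur <- next_index(0, dindex)
while (cur) {                       # a non-zero index counts as TRUE
  visited <- c(visited, cur)        # stand-in for "evaluate rules at dates[cur]"
  cur <- next_index(cur, dindex)
}

visited                             # 3 of the 10 timestamps were examined
```

Here the loop body runs three times instead of ten, which is the whole point of the reduction: the fewer indices survive, the less work the path-dependent loop has to do.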
This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make any sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.
From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.
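As a rough illustration of what "state machine in a while loop" means here, the following toy sketch carries a single piece of state (`position`) from one step to the next. The signal encoding and variable names are invented for this example; the real loop also tracks the order book, holds, and market data.

```r
# Toy signals: +1 = entry signal, -1 = exit signal, 0 = nothing (made-up values).
signal <- c(1, 1, 0, -1, -1, 1, 0, -1)

position <- 0                          # state carried across iterations
pos_log  <- numeric(length(signal))    # position after each step

i <- 1
while (i <= length(signal)) {
  if (signal[i] == 1 && position == 0) {
    position <- 1                      # enter only when flat
  } else if (signal[i] == -1 && position > 0) {
    position <- 0                      # exit only when holding
  }
  pos_log[i] <- position
  i <- i + 1
}

pos_log
```

The second entry signal and the second exit signal are ignored because of the state left behind by the previous step. Evaluating each timestamp in isolation, without that state, would give a different answer.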
The question asks if use of apply would help. apply statements in R are implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help, because this is inside a state-dependent part of the code. Evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.
The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call to call the rule function. The rest of the for loop is simply locating and matching arguments for these functions, and from code profiling, doesn't take much time at all. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.
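The shape of that rule loop can be sketched as follows. This is a simplified stand-in, not the actual ruleProc: the two rule functions, the `rules` list layout, and `type_order` are assumptions for illustration. It shows the two pieces described above: assembling arguments for each rule, and calling it with do.call in a fixed type order.

```r
# Hypothetical rule functions; real quantstrat rules take many more arguments.
rule_cancel <- function(timestamp, qty = 0)   paste("cancel at", timestamp)
rule_enter  <- function(timestamp, qty = 100) paste("enter", qty, "at", timestamp)

# Rules grouped by type; each entry holds a function and its stored arguments.
rules <- list(
  enter = list(list(fun = rule_enter,  arguments = list(qty = 50))),
  risk  = list(list(fun = rule_cancel, arguments = list()))
)

# Fixed evaluation order for this toy: risk rules run before entry rules.
type_order <- c("risk", "enter")

run_rules <- function(rules, timestamp) {
  out <- character(0)
  for (type in type_order) {
    for (rule in rules[[type]]) {
      # Cheap part: assemble the argument list for this rule.
      args <- c(list(timestamp = timestamp), rule$arguments)
      # Potentially expensive part: the rule function itself.
      out <- c(out, do.call(rule$fun, args))
    }
  }
  out
}

res <- run_rules(rules, "2024-01-02")
res
```

Even though `enter` is listed first in `rules`, the cancel runs first because the loop follows `type_order`. Running the rules concurrently would lose that guarantee.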
To speed up execution, there are four main things that can be done:
- write a non-path-dependent strategy: this is supported by the code, and simple strategies may be modeled this way. In this model you would write a custom rule function that calls addTxn directly when you think you should get your fills. It could be a vectorized function operating on your indicators/signals, and should be very fast.
- preprocess your signals: if there are fewer places where the state machine needs to evaluate the state of the order book/rules/portfolio to see if it needs to do something, the speed increase is nearly linear with the reduction in signals. This is the area most users neglect, writing signal functions that fire far more often than action that would modify positions or the order book may actually be required.
- explicitly parallelize parts of your analysis problem: I commonly write explicitly parallel wrappers to separate out different parameter evaluations or symbol evaluations; see applyParameter for an example using foreach.
- rewrite the state machine inside applyRules in C/C++: Patches welcome, but do see the link Garrett posted for additional details.
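Two of the points above, signal preprocessing and explicit parallelism over symbols, can be sketched in base R. Everything here is a toy: the `above` vector, the `run_one` function, and the symbol names are made up, and the sketch uses `parallel::mclapply` rather than the foreach approach that applyParameter is described as using.

```r
library(parallel)  # ships with R

# Toy indicator state: TRUE while some condition holds (made-up values).
above <- c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)

# A "level" signal fires on every bar where the condition holds ...
level_signal <- above
# ... while a "cross" signal fires only on the bar where it first becomes TRUE.
cross_signal <- above & !c(FALSE, head(above, -1))

sum(level_signal)  # number of bars the state machine would have to visit
sum(cross_signal)  # far fewer bars after preprocessing

# Hypothetical per-symbol backtest; stands in for a full strategy run.
run_one <- function(sym, signals) sum(signals[[sym]])

signals <- list(AAA = cross_signal, BBB = level_signal)

# Symbols are independent of each other, so they can run in separate processes.
# mclapply forks only on Unix-alikes; fall back to one core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(names(signals), run_one, signals = signals, mc.cores = cores)
names(results) <- names(signals)
```

The parallelism sits outside the state machine: each symbol still runs its own sequential loop, but separate symbols (or parameter sets) do not share state and can be evaluated side by side.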
I can assure you that most strategies can run in a fraction of a core-minute per symbol per day on tick data, if a little care is taken with the signal generation functions. Running large backtests on a laptop is not recommended.
Ref: quantstrat - applyRules