Suppose I have 10 heavy different functions (parallel or not) deciding about the same problem. Is there a good way to implement a voting scheme that lazy-automatically realizes when a majority is reached and no more calculation is needed?
obs.: This is more a question about scope/limits of lazy ev. Of course a simple "if" could detect a majority.
Thanks
[EDIT 1]
... simple "if" could detect a majority.
Sorry, I meant by "single if" -> "single wait for all process to finish".
... par开发者_JS百科allel or not ...
I just don't know parallelism matters in this case. (problems with my ambiguous English)
Short answer. Yes, it is possible to implement such a system, but no, built-in laziness won't help you.
Long answer. I believe you need a bit different laziness. Haskell's lazy evaluation is a kind of normal evaluation order, which works as follows:
- When the function is called, evaluator tries to compute it first, without computing its arguments.
- If the control flow comes to the point, where some argument needs to be computed, it evaluates it. Then the evaluation of the function continues.
So, arguments are evaluated as needed, "on demand". Moreover, they are evaluated one by one. This is a good idea for the language itself, and even imperative languages with applicative order of evaluation cannot work without such lazy functions - operators like or
and and
are lazy by the nature in most programming languages. But in your situation, is it something you actually need? Nope. You need to evaluate all arguments in parallel and finish evaluation of function itself when some of args are computed.
How to implement. You need to completely re-implement evaluation system, and I believe pure functional programming without side-effects and lazy evaluation will only hinder you. Here's one way to do it. Create function, say, paplly :: [ArgumentType] -> TriggerFunction -> ResultType
, where papply
stands for "parallel apply", ArgumentType
is a type of actual arguments to compute (in your case it may be closure of function + problem to solve), TriggerFunction
is a function, which is called when one of args is computed, and ResultType
is Boolean in your case. This function must work as follows:
- Run evaluation of all arguments in parallel.
- When one of arguments is computed, it must call the TriggerFunction with the result of evaluation.
- The trigger function must have a "memory" to remember all previous results. If, when called, it founds out, that there are enough arguments to finish evaluation of main function, it does it, interrupting computing the rest of args.
This is only one of the ways to do it, not the most functional (it uses mutable "memory"). You can also run the trigger function in parallel with other arguments and use some kind of synchronization to pass control between all of them. Or you can use some kind of messages like in Erlang or Scala. Unfortunately I haven't enough experience with Haskell to write actual code, but @Dietrich Epp's post seems to represent similar idea, so you can use it as a base.
You want a function like this:
majority :: [Bool] -> Bool
And you want it to work in parallel. No sweat! Unfortunately, I don't know of a way to do this without bypassing the type system. Here is an example implementation:
import Control.Concurrent
import Control.Concurrent.MVar
import System.IO.Unsafe
majority :: [Bool] -> Bool
majority votes = unsafePerformIO $
do v <- newEmptyMVar
nfalse <- newMVar 0
ntrue <- newMVar 0
let n = length votes
m = (n `div` 2) + 1
count x =
let (var, min) = if x then (ntrue, m) else (nfalse, n-m+1)
in do i <- modifyMVar var $ \i -> return (i+1, i+1)
if i == min then putMVar v x else return ()
threads <- mapM (forkIO . count) votes
r <- takeMVar v
mapM_ killThread threads
return r
Note: I'm not positive this is correct.
You can do this without parallel evaluation trivially, using lazy naturals. In this case I chose to use the peano-inf package on hackage: http://hackage.haskell.org/package/peano-inf
import Number.Peano.Inf
import Debug.Trace
import Data.List
myList = [trace "1" True, trace "2" True, trace "3" False, trace "4" True, trace "5" True]
btoNat True = 1 :: Nat
btoNat False = 0 :: Nat
ans = sum $ map btoNat myList
{-
*Main> ans > 2
1
2
3
4
True
-}
Note that 5 is not printed in the trace, because evaluation is cut short before then.
To do this with parallelism requires manually spawning and killing threads, etc, which is fine but certainly less pleasant.
Note that the above code uses standard sum. This sort of uncommon use-case is why, though many feel it isn't worth it, sum is not made as strict as possible.
I have tried combining the sclv's solution with luqui's comment about unamb
and would like to share my results. I'll start with the test cases:
list1 = [True, True, undefined, True, undefined]
list2 = [undefined, False, False]
list3 = concat $ replicate 500 list1
list4 = concat $ replicate 500 list2
main = mapM (print . vote) [list1, list2, list3, list4]
vote :: [Bool] -> Bool
This should print
True
False
True
False
I'll start with the list1
example first. The voting function to pass it can look like this:
voteByTrue list = sum (map bToNat list) >= threshold
where
threshold = (genericLength list + 1) `quot` 2
This is the same as in sclv's answer. Now we need to make sum
lazier so that the computation does not abort upon encountering an undefined
summand. My first take on this was:
Zero |+ y = y
Succ x |+ y = Succ (x + y)
instance Num Nat where
x + y = (x |+ y) `lub` (y |+ x)
Here, |+
is the addition operator strict in its first argument and +
is non-strict in both of its arguments. It worked for toy examples, like list1
, but the performance of this solution deteriorates very quickly because of the exponential blow-up of the number of threads (see how each +
spawns 2 threads, each of which calls +
again, usually with the same arguments). With such performance, vote list3
does not terminate quickly enough. To fight this, I have tried violating the unamb
's contract and implemented the following function:
-- | The same as unamb, but does not have the
-- 'agree unless bottom' precondition.
brokenUnamb = unamb
infoMinMax a b = (x, y)
where
~(x, y) = (a `seq` (b, a)) `brokenUnamb` (b `seq` (a, b))
This function sorts its two arguments by the amount of information they hold. It always returns less evaluated value as the x
and more evaluated value as the y
. This breaks purity by violating the condition of unamb
arguments to be equal. However, it allows us to implement +
more efficiently:
instance Num Nat where
x + y = x' |+ y' where (y', x') = infoMinMax x y
This allows us to pass the large test (list3
)! Now, to the false-tests... It turned out that infoMinMax
function is useful here as well!
vote list = voteByTrue list `maxInfo` voteByFalse list
where
voteByFalse = not . voteByTrue . map not
maxInfo x y = snd (infoMinMax x y)
Now, this allows the program to pass all four tests, albeit the large ones take several seconds to complete. The CPU usage skyrockets to 200% if I replace undefined
with odd (sum [1..])
too, so some parallelism is indeed happening.
However, the problem of broken purity remains. Can someone suggest a solution where simple unamb
is enough?
if we consider that those functions yield booleans, the question becomes if it is possible to write a function that takes 10 booleans and returns true if 6 of them are true that always requires the values of fewer than 10 of its inputs.
the simple way but one that doesn't fit the stated requirements is to test each input in turn counting up the number of trues and falses if trues >= 6 stop return true else if falses >= 6 stop return false and if we reach the last input without triggering either of those conditions, return false. as this will test all the inputs in some cases, so i think the answer to this question is no, lazy evaluation doesn't help in this example.
精彩评论