开发者

Seq.zip in F# may require seemingly extra sequence element to complete

开发者 https://www.devze.com 2023-03-31 22:40 出处:网络
Let\'s Seq.zip two F# sequences, one represented by a list and the other - by Seq.filter applied to an infinite sequence:

Let's Seq.zip two F# sequences, one represented by a list and the other - by Seq.filter applied to an infinite sequence:

Seq.initInfinite (fun i -> i)
|> Seq.filter ((>) 3)
|> Seq.zip ["A";"B"]

returns as expected

val it : seq<string * int> = seq [("A", 0); ("B", 1)]

However,

Seq.initInfinite (fun i -> i)
|> Seq.filter ((>) 2)
|> Seq.zip ["A";"B"]

hangs on trying to get non-existing 3rd member that can pass Seq.filter and eventually blows up fsi:

 Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue.

although the other argument r开发者_JAVA百科epresented by list of letters hints that just two filtered elements would be enough to perform zip to the function specification.

If we turn to Haskell for implementation comparison the equivalent

 zip ["A","B"] (filter (<2) [0..])

completes without any problem yielding

[("A",0),("B",1)]

As Haskell implementation behavior seem intuitively right what is the justification for the observed behavior of F# Seq.zip implementation?

UPDATE:

I did not notice that Haskell

zip (filter (<2) [0..]) ["A","B"]

does not complete similarly to F#.

Bottom line: It is impossible to implement Zip function capable of zipping sequences of defined and undefined lengths in argument order-agnostic manner. F# Zip implementation simply prefers invariant to argument order behavior over Haskell argument order dependent one.


The reason why it doesn't hang in Haskell is because the implementation of zip just so happens to be stricter in it's first argument than in it's second.

zip :: [a] -> [b] -> [(a,b)]
zip (a:as) (b:bs) = (a,b) : zip as bs
zip _      _      = []

Since the patterns are checked from left to right, this gives the following behavior.

*Main> zip [] undefined
[]
*Main> zip undefined []
*** Exception: Prelude.undefined

Since filter (<2) [0..] is semantically equivalent to 0 : 1 : ⊥, your example is after two iterations

("A", 0) : ("B", 1) : zip [] undefined =
("A", 0) : ("B", 1) : []

If we change the order of the arguments to zip (filter (<2) [0..]) ["A", "B"], we get

(0, "A") : (1, "B") : zip undefined [] =
(0, "A") : (1, "B") : undefined

I don't know much about F#, but I suspect something similar is happening there.

Note that there is no way to define zip such that zip [] undefined and zip undefined [] both return [], since you have to check one of the arguments first, and it's impossible to check against ⊥ due to monotonicity.


I don't know how Haskell does it, and I do agree that it seems intuitively correct (except I wonder what would happen in Haskell if you switched the fixed length list and the indeterminate length list), but I can show you why it works this way in F#. You can see in the F# source code file seq.fs that the significant implementation detail is in IEnumerable.map2:

  let map2 f (e1 : IEnumerator<_>) (e2 : IEnumerator<_>) : IEnumerator<_>=
      upcast 
          {  new MapEnumerator<_>() with
                 member this.DoMoveNext curr = 
                    let n1 = e1.MoveNext()
                    let n2 = e2.MoveNext()
                    if n1 && n2 then
                       curr <- f e1.Current e2.Current
                       true
                    else 
                       false
                 member this.Dispose() = e1.Dispose(); e2.Dispose()
          }

So Seq.zip will try to move both sequences to their third element before deciding whether the zip is complete, thus Seq.initInfinite (fun i -> i) |> Seq.filter ((>) 2) gets stuck trying to find the third element "forever" (until Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue).


Stephen Swensen has already answered the question.

The solution as of now seems to use Seq.take as you know the length of one of the sequences.

Seq.initInfinite (fun i -> i)
|> Seq.filter ((>) 2)
|> Seq.zip ["A";"B"]
|> Seq.take 2


Based on my reading of the source (https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/seq.fs at about line 900) this is what happens:

The relevant function is in Seq.fs and called revamp2 (this is what is called by Seq.zip)

let revamp2 f (ie1 : seq<_>) (source2 : seq<_>) =
        mkSeq (fun () -> f (ie1.GetEnumerator()) (source2.GetEnumerator()))

Now when we call .MoveNext() on the sequence returned by this it calls MoveNext() on BOTH of the input sequences.

Doing it in this way has made much of the other code simpler but has caused your problem - .MoveNext() will not return for the filtered infinite sequence, but will return for the finite sequence, leading to an infinite loop.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号