Higher order iterators in Ruby?_问答_开发者_运维开发者技术经验分享

I was reading a Ruby q开发者_如何学Cuestion about the .each iterator, and someone stated that using .each can be a code smell if higher order iterators are better suited for the task. What are higher order iterators in Ruby?

edit: Jörg W Mittag, the author of the StackOverflow answer that I was referring to mentioned that he meant to write higher level iterators, but he also explained what they are very well below.

Oops. I meant higher-level iterators, not higher-order. Every iterator is of course by definition higher-order.

Basically, iteration is a very low-level concept. The purpose of programming is to communicate intent to the other stakeholders on the team. "Initializing an empty array, then iterating over another array and adding the current element of this array to the first array if it is divisible by two without a remainder" is not communicating intent. "Selecting all even numbers" is.

In general, you almost never iterate over a collection just for iteration's sake. You either want to

transform each element in some way (that's usually called map, in Ruby and Smalltalk it's collect and in .NET and SQL it's Select),
reduce the whole collection down to some single value, e.g. computing the sum or the average or the standard deviation of a list of football scores (in category theory, that's called a catamorphism, in functional programming it is fold or reduce, in Smalltalk it's inject:into:, in Ruby it's inject and in .NET Aggregate),
filter out all elements that satisfy a certain condition (filter in most functional languages, select in Smalltalk and Ruby, also find_all in Ruby, Where in .NET and SQL),
filter out all elements that do not satisfy a condition (reject in Smalltalk and Ruby)
find the first element that satisfies a condition (find in Ruby)
count the elements thats satisfy a condition (count in Ruby)
check if all elements (all?), at least one element (any?) or no elements (none?) satisfy a condition
group the elements into buckets based on some discriminator (group_by in Ruby, .NET and SQL)
partition the collection into two collections based on some predicate (partition)
sort the collection (sort, sort_by)
combine multiple collections into one (zip)
and so on and so forth …

Almost never is your goal to just iterate over a collection.

In particular, reduce aka. inject aka. fold aka. inject:into: aka. Aggregate aka. catamorphism is your friend. There's a reason why it has such a fancy-sounding mathematical name: it is extremely powerful. In fact, most of what I mentioned above, can be implemented in terms of reduce.

Basically, what reduce does, is it "reduces" the entire collection down to a single value, using some function. You have some sort of accumulator value, and then you take the accumulator value and the first element and feed it into the function. The result of that function then becomes the new accumulator, which you pair up with the second element and feed to the function and so on.

The most obvious example of this is summing a list of numbers:

[4, 8, 15, 16, 23, 42].reduce(0) {|acc, elem|
  acc + elem
}

So, the accumulator starts out as 0, and we pass the first element 4 into the + function. The result is 4, which becomes the new accumulator. Now we pass the next element 8 in and the result is 12. And this continues till the last element and the result is that they were dead the whole time. No, wait, the result is 108.

Ruby actually allows us to take a couple of shortcuts: If the element type is the same as the accumulator type, you can leave out the accumulator and Ruby will simply pass the first element as the first value for the accumulator:

[4, 8, 15, 16, 23, 42].reduce {|acc, elem|
  acc + elem
}

Also, we can use Symbol#to_proc here:

[4, 8, 15, 16, 23, 42].reduce(&:+)

And actually, if you pass reduce a Symbol argument it will treat as the name of the function to use for the reduction operation:

[4, 8, 15, 16, 23, 42].reduce(:+)

However, summing is not all that reduce can do. In fact, I find this example a little dangerous. Everybody I showed this to, immediately understood, "Aah, so that's what a reduce is", but unortunately some also thought that summing numbers is all reduce is, and that's definitely not the case. In fact, reduce is a general method of iteration, by which I mean that reduce can do anything that each can do. In particular, you can store arbitrary state in the accumulator.

For example, I wrote above that reduce reduces the collection down to a single value. But of course that "single value" can be arbitrarily complex. It could, for example, be itself a collection. Or a string:

class Array
  def mystery_method(foo)
    drop(1).reduce("#{first}") {|s, el| s << foo.to_str << el.to_s }
  end
end

This is an example how far you can go with playing tricks with the accumulator. If you try it out, you'll of course recognize it as Array#join:

class Array
  def join(sep=$,)
    drop(1).reduce("#{first}") {|s, el| s << sep.to_str << el.to_s }
  end
end

Note that nowhere in this "loop" do I have to keep track of whether I'm at the last or second-to-last element. Nor is there any conditional in the code. There is no potential for fencepost errors here. If you think about how to implement this with each, you would have to somehow keep track of the index and check whether you are at the last element and then have an if in there, to prevent emitting the separator at the end.

Since I wrote above that all iteration can be done with reduce, I might just as well prove it. Here's Ruby's Enumerable methods, implemented in terms of reduce instead of each as they normally are. (Note that I only just started and have only arrived at g yet.)

module Enumerable
  def all?
    reduce(true) {|res, el| res && yield(el) }
  end

  def any?
    reduce(false) {|res, el| res || yield(el) }
  end

  alias_method :map, def collect
    reduce([]) {|res, el| res << yield(el) }
  end

  def count
    reduce(0) {|res, el| res + 1 if yield el }
  end

  alias_method :find, def detect
    reduce(nil) {|res, el| if yield el then el end unless res }
  end

  def drop(n=1)
    reduce([]) {|res, el| res.tap {|res| res << el unless n -= 1 >= 0 }}
  end

  def drop_while
    reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
  end

  def each
    reduce(nil) {|_, el| yield el }
  end

  def each_with_index
    tap { reduce(-1) {|i, el| (i+1).tap {|i| yield el, i }}}
  end

  alias_method :select, def find_all
    reduce([]) {|res, el| res.tap {|res| res << el if yield el }}
  end

  def grep(pattern)
    reduce([]) {|res, el| res.tap {|res| res << yield(el) if pattern === el }}
  end

  def group_by
    reduce(Hash.new {|hsh, key| hsh[key] = [] }) {|res, el| res.tap {|res|
        res[yield el] = el
    }}
  end

  def include?(obj)
    reduce(false) {|res, el| break true if res || el == obj }
  end

  def reject
    reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
  end
end

[Note: I made some simplifications for the purpose of this post. For example, according to the standard Ruby Enumerable protocol, each is supposed to return self, so you'd have to slap an extra line in there; other methods behave slightly differently, depending on what kind and how many arguments you pass in and so on. I left those out because they distract from the point I am trying to make.]

They're talking about more specialized methods such as map, filter or inject. For example, instead of this:

even_numbers = []
numbers.each {|num| even_numbers << num if num.even?}

You should do this:

even_numbers = numbers.select {|num| num.even?}

It says what you want to do but encapsulates all the irrelevant technical details in the select method. (And incidentally, in Ruby 1.8.7 or later, you can just write even_numbers = numbers.select(&:even?), so even more concise if slightly Perl-like.)

These aren't normally called "higher-order iterators," but whoever wrote that probably just had a minor mental mixup. It's a good principle whatever terminology you use.

From the usual definition of "higer-order" I would say a higher-order iterator is an iterator which takes an iterator as an argument or returns an iterator. So something like enum_for maybe. However I don't think this is what the person meant.

I think the person meant iterators like map or select which are higher-order functions, but did not realize that each is of course also a higher-order function. So basically this is just a case of terminological confusion.

The point of the poster presumably was that you should not use each in cases where map, select or inject could naturally be used instead. And to make that point he used a term, that didn't really make sense in that context.

I get this question a fair bit, so I blogged about the most commonly used iterators: select and reject. In the post there are examples of where 'each' is used incorrectly and how to correct the code to use either 'select' or 'reject'. Anyway, I hope it helps.

http://www.natontesting.com/2011/01/01/rubys-each-select-and-reject-methods/

I just wrote a blog that is very relavent to this question - The reason you want to use higher order functions is that doing so elevates the programmer to a higher level of abstraction, to a point that a problem can be expressed declaratively, pushing the implementation down either to the Ruby standard library or lower level code.

http://www.railstutors.com/blog/declarative-thinking-with-higher-order-functions-and-blocks#.UG5x6fl26jJ