Replace in a list in Erlang_问答_开发者_运维开发者技术经验分享

I'm trying to learn some Erlang and having trouble figuring out the best approach for a particular problem. Here's what I'm doing:

I have a big 2-D array of data (list of lists) whose values are calculated by independent processes. The processes send a message back to an aggregator with the result when they're done calculatin开发者_如何学Pythong. My problem is how to get the aggregated data into a new big array. I could do something like this:

aggregator(Array) ->
    receive
        {Value1_1, {1, 1}} ->
            Value1_1 = Value1_1
    end,
    receive
        {Value1_2, {1, 2}} ->
            Value1_2 = Value1_2
    end,
    % ...
    {{Value1_1, Value2_1},
     {Value2_1, Value2_2}}

but that's terribly lame and scales poorly. What I think I want to do is more like this:

aggregator(Array) when is_full(Array) -> %% I think I know how to write is_full.
    Array;
aggregator(Array) ->
    receive
        {Value, {X, Y}} ->
            aggregator(replace_2d(Array, X, Y)) %% Is this the way to go?
    end.

but I don't see any built-in facilities for replacing values based on an array index in the built-ins. This leads me to suspect that I'm missing some idiomatic way of achieving my goals that would be more obvious to someone experienced with functional programming. Am I? Is my approach all wrong? Should I just write myself a replace_2d function and call it good?

At end is an explanation of how I am wrong.

Arrays are implemented so that when functional semantics allow, they are updated in O(1) time. That is, if it only is referenced once and the reference is dropped as soon as the new array is computed, the new array will be computed by overwriting the changed entry. As this is all handled by the compiler/interpreter, you still have the safety of functional programming.

Because Erlang arrays are only one-dimensional, you need to do the index computation yourself.. Perhaps having one aggregator per row would be reasonable and help when running on many processors?

Example code assumes 0 is Origo for Row and Column and that every position will be received exactly once.

aggregator(Rows, Cols)->
  Size = Rows*Cols,
  aggregator(Cols, array:new(Size), Size).

aggregator(Cols, Array, 0)->Array;
aggregator(Cols, Array, N)->
  receive
    {Value, {Col, Row}}->
      aggregator(Cols, array:set(Col+Row*Cols, Value, Array), N-1)
  end.

I have tested the fixed array structure by running:

test_array(Size, Times)->
  test_array(Size, array:new(Size, {default, 0}), Times).
test_array(_Size, Array, 0)->
  Array;
test_array(Size, Array, Times) ->
  X = random:uniform(Size)-1,
  V = array:get(X, Array),
  test_array(Size, array:set(X, V+1, Array), Times-1).

with Times=100000000 and Size at [10, 100, 1000, 10000, 100000]. Wallclock times in seconds were: [101,129,140,182,241] This is very roughly about O(log(Size)) where I would have expected about O(1). Thus I have showed I was wrong, but perhaps in the future array will behave more like I thought it did.

Do you have to work with the incomplete list while it's being built up - i.e. the elements have to be at the correct positions?

If you really have to, binary trees might help to make index-based updates more efficient,

but if not, just collect the values as they come in and sort afterwards.

There is an array module that may have the semantics you are looking for. But it is a functional container and may not have the performance characteristics you are looking for, esp. it you are dealing with a large array.

However, your first attempt may have some merit, just do it recursively. (If you know the list sizes going in.)

Completely untested, but start with somthing like this:

aggregator(X, Y) ->
    aggregator(X, Y, []).

aggregator(_, 0, Accum) ->
    Accum;
aggregator(X, Y, Accum) ->
    aggregator(X, Y-1, [aggr_x(X, Y, [])|Accum]).

aggr_x(0, _, AccX) ->
    AccX;
aggr_x(X, Y AccX) ->
    receive
      {Value, {X, Y}} ->
        aggr_x(X-1, Y, [Value|AccX]) ->
    end.

Keep in mind that this will receive high indexed elements first, and that for large data sets, your inbox may get very full, and you will need to consider the performance of matching on a receive with a deep inbox.