开发者

Parallel distinct word count in Go

开发者 https://www.devze.com 2023-03-14 09:26 出处:网络
Jakob Østergaard presented this challenge: Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.

Jakob Østergaard presented this challenge:

Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.

H开发者_开发百科ow can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?


There are several possibilities, but I guess you mean "efficiently" ?

The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.

This looks like a typical Map/Reduce application to me:

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

Ideally the "multiple" queues would be a single one with multiple consumers, so that if one worker slows down the whole process does not slows as much.

I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed and they can start send their results to the Aggregator.


Here's the solution, in Go, to Jakob Østergaard's problem.

/*
    The problem: Write a program that reads text from 
    standard-input, and returns (prints) the total 
    number of distinct words found in the text. 

    C versus C++, Jakob Østergaard, February 24th, 2004
    http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

func main() {
    rdr := bufio.NewReader(os.Stdin)
    words := make(map[string]bool, 1024*1024)
    word := make([]int, 0, 256)
    for {
        r, n, _ := rdr.ReadRune()
        if unicode.IsSpace(r) || n == 0 {
            if len(word) > 0 {
                words[string(word)] = true
                word = word[:0]
            }
            if n == 0 {
                break
            }
        } else {
            word = append(word, r)
        }
    }
    fmt.Println(len(words))
}

It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing trivial computation provides no significant opportunities for parallel computing.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号