Multiple threads processing data from the same file

Posted on https://www.devze.com, 2023-02-01 05:54. Source: the web
Can anyone in this forum give an example in C of how two threads can process data from one text file?

As an example, I have one text file that contains a paragraph, and two threads that will process the data in that file. One thread will count the number of lines in the paragraph. The second thread will count the numeric characters.

thanks


If you had asked about C++ I could give you a code example, but I haven't done ANSI C in a very long time, so I will give you the design and pseudo code.

Please keep in mind this is really bad pseudo code that is only meant to illustrate the idea. I'm not questioning WHY you would want to do this. For all I know it could be an exercise with threads or because you "feel like it".

Example 1

// Each counter is written by exactly one thread,
// so the two threads never touch the same variable
int integerCount = 0;
int lineCount = 0;

numericThread()
{
    // By flagging the file as readonly you should
    // be able to open it as many times as you wish
    handle h = openfile("textfile.txt", readonly);

    while (!eof(h)) {
        String word = readWord(h);
        int outInteger;
        if (stringToInteger(word, outInteger)) {
            ++integerCount;
        }
    }
}

lineThread()
{
    // By flagging the file as readonly you should
    // be able to open it as many times as you wish
    handle h = openfile("textfile.txt", readonly);

    while (!eof(h)) {
        // Assumes readWord returns "\n" as a word of its own
        String word = readWord(h);
        if (word.equals("\n")) {
            ++lineCount;
        }
    }
}
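
For anyone who wants something compilable, here is a minimal sketch of Example 1 in C with POSIX threads (build with something like cc -pthread). It is only one way to do it. The names count_job, line_thread, digit_thread and count_lines_and_digits are made up for this sketch, and it counts digit characters, as the question asks, rather than whole integer words like the pseudo code does. Lines are counted as newline characters, so a last line without a trailing newline is not counted.

```c
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>

/* Per-thread job: the path to read and the count the thread produces. */
struct count_job {
    const char *path;
    long count;                /* result, or -1 if the file could not be opened */
};

/* Counts newline characters through this thread's own FILE handle. */
static void *line_thread(void *arg)
{
    struct count_job *job = arg;
    FILE *f = fopen(job->path, "r");
    int c;

    job->count = -1;
    if (f == NULL)
        return NULL;
    job->count = 0;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            ++job->count;
    fclose(f);
    return NULL;
}

/* Counts the characters '0'..'9' through a second, independent handle. */
static void *digit_thread(void *arg)
{
    struct count_job *job = arg;
    FILE *f = fopen(job->path, "r");
    int c;

    job->count = -1;
    if (f == NULL)
        return NULL;
    job->count = 0;
    while ((c = fgetc(f)) != EOF)
        if (isdigit(c))
            ++job->count;
    fclose(f);
    return NULL;
}

/* Runs both counters concurrently; returns 0 on success, -1 on failure. */
int count_lines_and_digits(const char *path, long *lines, long *digits)
{
    struct count_job line_job = { path, 0 };
    struct count_job digit_job = { path, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, line_thread, &line_job) != 0)
        return -1;
    if (pthread_create(&t2, NULL, digit_thread, &digit_job) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    if (line_job.count < 0 || digit_job.count < 0)
        return -1;
    *lines = line_job.count;
    *digits = digit_job.count;
    return 0;
}
```

Call it from main as count_lines_and_digits("textfile.txt", &lines, &digits). Each thread writes only to its own count_job, and the results are read after pthread_join, so no locking is needed here.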

If for some reason you aren't able to open the file twice in readonly mode, you will need to maintain a queue for each thread and have the main thread put words into each thread's queue. The threads will then pull from their queues.

Example 2

int integerCount = 0;
int lineCount = 0;
// Both queues are assumed to be thread-safe: pop() blocks while the
// queue is empty, and closed() only returns true once the queue has
// been closed AND every pushed word has been popped
queue numericQueue;
queue lineQueue;

numericThread()
{
    while (!numericQueue.closed()) {
        String word = numericQueue.pop();
        int outInteger;
        if (stringToInteger(word, outInteger)) {
            ++integerCount;
        }
    }
}

lineThread()
{
    while (!lineQueue.closed()) {
        String word = lineQueue.pop();
        if (word.equals("\n")) {
            ++lineCount;
        }
    }
}

mainThread()
{
    handle h = openfile("textfile.txt", readonly);
    while (!eof(h)) {
        String word = readWord(h);
        numericQueue.push(word);
        lineQueue.push(word);
    }
    numericQueue.close();
    lineQueue.close();
}
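
Example 2 could look roughly like this in C with POSIX threads (build with something like cc -pthread). Treat it as a sketch under a few assumptions: the queue type and every function name here are made up for illustration, each queue has exactly one producer and one consumer, chunks of at most 255 characters read with fgets take the place of words, and error checks on pthread_create are left out to keep it short.

```c
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_CAP 16   /* chunks buffered per queue */
#define CHUNK 256      /* bytes per chunk, including the terminator */
#define QUEUE_INIT { .lock = PTHREAD_MUTEX_INITIALIZER, .changed = PTHREAD_COND_INITIALIZER }

/* Blocking queue of text chunks for one producer and one consumer. */
struct queue {
    char items[QUEUE_CAP][CHUNK];
    int head, count, closed;
    long result;                 /* written only by the consuming thread */
    pthread_mutex_t lock;
    pthread_cond_t changed;
};

/* Copies s into the queue, waiting while the queue is full. */
static void queue_push(struct queue *q, const char *s)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->changed, &q->lock);
    strcpy(q->items[(q->head + q->count++) % QUEUE_CAP], s);
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Marks the end of input so a waiting consumer can finish. */
static void queue_close(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Copies the next chunk into out; returns 0 once closed and drained. */
static int queue_pop(struct queue *q, char *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->changed, &q->lock);
    int got = q->count > 0;
    if (got) {
        strcpy(out, q->items[q->head]);
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_broadcast(&q->changed);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Counts chunks that contain a newline, i.e. completed lines. */
static void *line_worker(void *arg)
{
    struct queue *q = arg;
    char chunk[CHUNK];
    while (queue_pop(q, chunk))
        q->result += strchr(chunk, '\n') != NULL;
    return NULL;
}

/* Counts the characters '0'..'9' in every chunk. */
static void *digit_worker(void *arg)
{
    struct queue *q = arg;
    char chunk[CHUNK];
    while (queue_pop(q, chunk))
        for (const char *p = chunk; *p != '\0'; ++p)
            q->result += isdigit((unsigned char)*p) != 0;
    return NULL;
}

/* Main-thread role: read the stream once and feed both queues. */
void count_with_queues(FILE *in, long *lines, long *digits)
{
    struct queue lq = QUEUE_INIT, dq = QUEUE_INIT;
    pthread_t t1, t2;
    char buf[CHUNK];
    pthread_create(&t1, NULL, line_worker, &lq);   /* error checks omitted */
    pthread_create(&t2, NULL, digit_worker, &dq);
    while (fgets(buf, sizeof buf, in) != NULL) {
        queue_push(&lq, buf);
        queue_push(&dq, buf);
    }
    queue_close(&lq);
    queue_close(&dq);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    *lines = lq.result;
    *digits = dq.result;
}
```

Pass it a stream opened with fopen("textfile.txt", "r"). Copying each chunk into both queues keeps ownership simple: no thread ever frees or shares a buffer, at the cost of duplicating the data.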


There are lots of ways to do this. You can make different design decisions depending on how fast or simple or elegant or overengineered you want this to be. One way, as posted by Andrew Finnell, is to have each thread open the file and read it completely independently. In theory this isn't great because you are doing expensive IO twice, but in practice it's probably fine because the OS has likely cached the contents of whichever read executes first. Reading twice is still more expensive than reading once because it repeats every system call, but again in practice that will be irrelevant unless you have a very large file.

Another model would be for each thread to have its own input queue, or for all threads to share one global queue. The main thread reads the file and places each line in turn on the queue(s), and perhaps main doubles as one of your worker threads. This is more complicated because access to the queue(s) must be synchronized, or some lock-free queue implementation must be used. In the case of a shared global queue, there is less duplication of data, but the lifecycle of that data becomes more complicated.

Just to point out how many ways such a simple thing can be done, you could go the overengineering route and make each thread generic. Instead of placing only data on the queue(s), you place both data (or pointers to data) and function pointers, and let each thread execute the callback. This kind of model might make sense if you plan on adding many more kinds of things to compute but want to limit the number of threads you use.
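
A rough C sketch of that generic-worker idea, using POSIX threads (build with something like cc -pthread). Everything named here (task, task_list, generic_worker, run_tasks, count_lines, count_digits, MAX_WORKERS) is hypothetical, and to keep it short a fixed, pre-filled task list stands in for a live queue: workers claim the next unclaimed entry under a mutex and call its function pointer.

```c
#include <ctype.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* One unit of work: a callback plus the data it operates on. */
struct task {
    void (*fn)(const char *text, long *result);
    const char *text;
    long *result;
};

/* Shared list of tasks; workers claim entries one at a time. */
struct task_list {
    struct task *tasks;
    size_t count;
    size_t next;               /* index of the next unclaimed task */
    pthread_mutex_t lock;      /* protects next */
};

void count_lines(const char *text, long *result)
{
    for (*result = 0; *text != '\0'; ++text)
        *result += (*text == '\n');
}

void count_digits(const char *text, long *result)
{
    for (*result = 0; *text != '\0'; ++text)
        *result += (isdigit((unsigned char)*text) != 0);
}

/* Generic worker: knows nothing about counting, only how to run tasks. */
static void *generic_worker(void *arg)
{
    struct task_list *list = arg;

    for (;;) {
        struct task *t = NULL;

        pthread_mutex_lock(&list->lock);
        if (list->next < list->count)
            t = &list->tasks[list->next++];
        pthread_mutex_unlock(&list->lock);
        if (t == NULL)
            return NULL;       /* every task has been claimed */
        t->fn(t->text, t->result);
    }
}

/* Runs all tasks on up to nworkers threads; returns 0 on success. */
int run_tasks(struct task *tasks, size_t count, int nworkers)
{
    struct task_list list = { .tasks = tasks, .count = count };
    pthread_t threads[MAX_WORKERS];
    int started = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    pthread_mutex_init(&list.lock, NULL);
    while (started < nworkers &&
           pthread_create(&threads[started], NULL, generic_worker, &list) == 0)
        ++started;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&list.lock);
    return started > 0 ? 0 : -1;
}
```

With the file contents in a string text, the two counts become two entries: struct task tasks[] = { { count_lines, text, &lines }, { count_digits, text, &digits } }; followed by run_tasks(tasks, 2, 2). Adding a third kind of count means adding a function and a table entry, not a thread.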


I don't think you will see much of a performance difference between using two threads and using one. Either way, you don't want both threads to read the file. Read the file once, then pass a COPY of the data to each of the routines that will process it. Two threads cannot read independently from the same stream at the same time, because a stream has a single read position, so you'll need two copies of the text.

P.S. Depending on the size of the file, it's possible that you will actually lose performance by using two threads.
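
To make the "read the file first" idea concrete, here is a small C sketch with POSIX threads (build with something like cc -pthread). The names scan_job, read_all and count_from_memory are made up for illustration. One deliberate variation: instead of making two copies, both threads get a pointer to the same buffer. That is safe only because neither thread writes to it; giving each thread its own memcpy of the buffer would be the literal version of the advice above.

```c
#include <ctype.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* A read-only view of the file contents plus one thread's result. */
struct scan_job {
    const char *data;
    size_t len;
    long count;
};

static void *scan_lines(void *arg)
{
    struct scan_job *job = arg;
    for (size_t i = 0; i < job->len; ++i)
        if (job->data[i] == '\n')
            ++job->count;
    return NULL;
}

static void *scan_digits(void *arg)
{
    struct scan_job *job = arg;
    for (size_t i = 0; i < job->len; ++i)
        if (isdigit((unsigned char)job->data[i]))
            ++job->count;
    return NULL;
}

/* Reads the whole file into a malloc'd buffer; returns NULL on failure. */
static char *read_all(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    char *buf = NULL;
    size_t used = 0, cap = 0, n;

    if (f == NULL)
        return NULL;
    do {
        if (used == cap) {             /* buffer full: double its size */
            cap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
        }
        n = fread(buf + used, 1, cap - used, f);
        used += n;
    } while (n > 0);
    fclose(f);
    *len = used;
    return buf;
}

/* Reads the file once, then lets two threads scan the same bytes. */
int count_from_memory(const char *path, long *lines, long *digits)
{
    size_t len = 0;
    char *data = read_all(path, &len);
    if (data == NULL)
        return -1;

    struct scan_job lj = { data, len, 0 }, dj = { data, len, 0 };
    pthread_t t1, t2;
    int ok1 = pthread_create(&t1, NULL, scan_lines, &lj) == 0;
    int ok2 = pthread_create(&t2, NULL, scan_digits, &dj) == 0;
    if (ok1)
        pthread_join(t1, NULL);
    if (ok2)
        pthread_join(t2, NULL);
    free(data);                        /* both threads are done with it */
    if (!ok1 || !ok2)
        return -1;
    *lines = lj.count;
    *digits = dj.count;
    return 0;
}
```

Whether this beats a single thread doing both counts in one pass depends on the file size and the machine; for a paragraph of text the thread start-up cost will most likely dominate.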
