Enhancing performance of StreamWriter in C#

Developer https://www.devze.com 2023-02-26 18:42 Source: Internet
In my program I need to write large text files (~300 MB). The text files contain numbers separated by spaces. I'm using this code:

TextWriter guessesWriter = TextWriter.Synchronized(new StreamWriter("guesses.txt"));

private void QueueStart()
    {
        while (true)
        {
            if (writeQueue.Count > 0)
            {
                guessesWriter.WriteLine(writeQueue[0]);
                writeQueue.Remove(writeQueue[0]);
            }
        }
    }

private static void Check()
    {
        TextReader tr = new StreamReader("data.txt");

        string guess = tr.ReadLine();
        b = 0;
        List<Thread> threads = new List<Thread>();
        while (guess != null) // Reading each row and analyze it
        {
            string[] guessNumbers = guess.Split(' ');
            List<int> numbers = new List<int>();
            foreach (string s in guessNumbers) // Converting each guess to a list of numbers
                numbers.Add(int.Parse(s));

            threads.Add(new Thread(GuessCheck));
            threads[b].Start(numbers);
            b++;

            guess = tr.ReadLine();
        }
    }

    private static void GuessCheck(object listNums)
    {
        List<int> numbers = (List<int>) listNums;

        if (!CloseNumbersCheck(numbers))
        {
            writeQueue.Add(numbers[0] + " " + numbers[1] + " " + numbers[2] + " " + numbers[3] + " " + numbers[4] + " " + numbers[5] + " " + numbers[6]);
        }
    }

    private static bool CloseNumbersCheck(List<int> numbers)
    {
        int divideResult = numbers[0]/10;
        for (int i = 1; i < 6; i++)
        {
            if (numbers[i]/10 != divideResult)
                return false;
        }
        return true;
    }

The file data.txt contains data in this format (dots mean more numbers following the same logic):

1 2 3 4 5 6 1
1 2 3 4 5 6 2
1 2 3 4 5 6 3
.
.
.
1 2 3 4 5 6 8
1 2 3 4 5 7 1
.
.
.

I know this is not very efficient, and I am looking for advice on how to make it quicker. If you know how to save a LARGE amount of numbers more efficiently than in a .txt file, I would appreciate it.


One way to improve efficiency is to use a larger buffer on your output stream. You are using the defaults, which probably give you a 1 KB buffer, but you won't see maximum performance with less than a 64 KB buffer. Open your file like this:

new StreamWriter("guesses.txt", false, new UTF8Encoding(false, true), 65536)


Instead of reading and writing line by line (ReadLine and WriteLine), you should read and write big blocks of data (ReadBlock and Write). This way you will access the disk a lot less and get a big performance boost. But you will need to manage the end of each line yourself (look at Environment.NewLine).
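
A minimal sketch of the block-writing idea, assuming the rows are already available as integer arrays. The names `BlockWriteSketch` and `WriteRows` and the 64K flush threshold are hypothetical choices for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class BlockWriteSketch
{
    // Hypothetical helper: collects rows in a StringBuilder and hands them to
    // the writer in large blocks instead of calling WriteLine once per row.
    public static void WriteRows(string path, IEnumerable<int[]> rows,
                                 int flushThresholdChars = 64 * 1024)
    {
        var block = new StringBuilder();
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(false), 64 * 1024))
        {
            foreach (int[] row in rows)
            {
                for (int i = 0; i < row.Length; i++)
                {
                    if (i > 0) block.Append(' ');
                    block.Append(row[i]);
                }
                // Line endings are managed by hand, as the answer notes.
                block.Append(Environment.NewLine);

                if (block.Length >= flushThresholdChars)
                {
                    writer.Write(block.ToString());
                    block.Clear();
                }
            }
            // Write whatever is left in the final, partially filled block.
            writer.Write(block.ToString());
        }
    }

    static void Main()
    {
        var rows = new List<int[]>
        {
            new[] { 1, 2, 3, 4, 5, 6, 1 },
            new[] { 1, 2, 3, 4, 5, 6, 2 },
        };
        WriteRows("guesses.txt", rows);
        Console.Write(File.ReadAllText("guesses.txt"));
    }
}
```

Whether this beats plain WriteLine with a large StreamWriter buffer depends on the workload, so it is worth measuring both on the real data.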


The efficiency could be improved by using BinaryWriter. Then you could just write out integers directly. This would allow you to skip the parsing step on the read and the ToString conversion on the write.

It also looks like you are creating one thread per input line in there. That many additional threads will slow down your performance, since threads are very heavyweight objects. You should do all of the work on a single thread.

Here is a more-or-less direct conversion of your code to use a BinaryWriter. (This does not address the thread problem.)

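    // BinaryWriter wraps a Stream, not a StreamWriter, so open the file as a FileStream.
    BinaryWriter guessesWriter = new BinaryWriter(new FileStream("guesses.dat", FileMode.Create));
    private void QueueStart()
    {
        while (true)
        {
            if (writeQueue.Count > 0)
            {
                lock (guessesWriter)
                {
                    // Write(string) emits a length-prefixed string; to store raw
                    // integers, queue int values and call Write(int) for each one.
                    guessesWriter.Write(writeQueue[0]);
                }
                writeQueue.RemoveAt(0);
            }
        }
    }
    // Each row in the question's data holds 7 numbers, and GuessCheck reads numbers[6].
    private const int numbersPerThread = 7;
    private static void Check()
    {
        // Assumes the input has already been stored as raw 32-bit integers.
        BinaryReader tr = new BinaryReader(File.OpenRead("data.txt"));
        b = 0;
        List<Thread> threads = new List<Thread>();
        while (tr.BaseStream.Position < tr.BaseStream.Length)
        {
            List<int> numbers = new List<int>(numbersPerThread);
            for (int index = 0; index < numbersPerThread; index++)
            {
                numbers.Add(tr.ReadInt32());
            }
            threads.Add(new Thread(GuessCheck));
            threads[b].Start(numbers);
            b++;
        }
    }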
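
The thread problem that the conversion above leaves open could be handled by doing everything in one pass on a single thread. The following is only a sketch under assumptions: the input is still the text file from the question, every row holds 7 numbers, and `SingleThreadBinarySketch` and `FilterToBinary` are hypothetical names.

```csharp
using System;
using System.IO;

static class SingleThreadBinarySketch
{
    const int NumbersPerRow = 7; // assumption: 7 numbers per row, as in the sample data

    // Same rule as CloseNumbersCheck in the question:
    // true when the first six numbers all fall in the same group of ten.
    static bool CloseNumbersCheck(int[] numbers)
    {
        int divideResult = numbers[0] / 10;
        for (int i = 1; i < 6; i++)
        {
            if (numbers[i] / 10 != divideResult)
                return false;
        }
        return true;
    }

    // Reads text rows, keeps the rows that fail the check, and writes them
    // as raw 32-bit integers. Returns the number of rows written.
    public static int FilterToBinary(string textPath, string binaryPath)
    {
        int written = 0;
        using (var reader = new StreamReader(textPath))
        using (var output = new FileStream(binaryPath, FileMode.Create,
                                           FileAccess.Write, FileShare.None, 64 * 1024))
        using (var writer = new BinaryWriter(output))
        {
            var numbers = new int[NumbersPerRow];
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(' ');
                if (parts.Length != NumbersPerRow)
                    continue; // skip blank or malformed rows

                for (int i = 0; i < NumbersPerRow; i++)
                    numbers[i] = int.Parse(parts[i]);

                if (CloseNumbersCheck(numbers))
                    continue;

                foreach (int n in numbers)
                    writer.Write(n); // 4 bytes per number, no string conversion
                written++;
            }
        }
        return written;
    }

    static void Main()
    {
        File.WriteAllLines("data.txt", new[] { "1 2 3 4 5 6 1", "1 2 3 4 5 16 2" });
        Console.WriteLine(FilterToBinary("data.txt", "guesses.dat"));
    }
}
```

Each kept row takes 28 bytes in this layout, and no queue or lock is needed because reading, checking, and writing happen on the same thread.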


Try using a buffer in between. There is a BufferedStream class. Right now you are using very inefficient disk access patterns.
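
As a rough illustration of putting a BufferedStream in between, assuming the lines to write are already available as strings. `BufferedStreamSketch`, `WriteLines`, and the 64K buffer size are hypothetical choices, not taken from the question.

```csharp
using System;
using System.IO;
using System.Text;

static class BufferedStreamSketch
{
    // Hypothetical helper: places a 64K BufferedStream between the
    // StreamWriter and the file so that small writes are batched.
    public static void WriteLines(string path, string[] lines)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var buffered = new BufferedStream(file, 64 * 1024))
        using (var writer = new StreamWriter(buffered, new UTF8Encoding(false)))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        } // disposing the writer flushes the buffer and closes the file
    }

    static void Main()
    {
        WriteLines("guesses.txt", new[] { "1 2 3 4 5 6 1", "1 2 3 4 5 6 2" });
        Console.WriteLine(new FileInfo("guesses.txt").Length);
    }
}
```

FileStream also buffers internally and its constructor accepts a buffer size, so the extra layer may or may not help in a given setup.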
