C# Streamreader writer (memory issues)

Developer (https://www.devze.com), 2022-12-15 16:03, source: web
I have a few multi-million-line text files in a directory. I want to read each one line by line, replace “|” with “\”, and write each line out to a new file. This code might work just fine, but I’m not seeing any resulting text file, or maybe I’m just being impatient.

{
        string startingdir = @"K:\qload";
        string dest = @"K:\D\ho\jlg\load\dest";
        string[] files = Directory.GetFiles(startingdir, "*.txt");

        foreach (string file in files)
        {
            StringBuilder sb = new StringBuilder();
            using (FileStream fs = new FileStream(file, FileMode.Open))
            using (StreamReader rdr = new StreamReader(fs))
            {
                while (!rdr.EndOfStream)
                {
                    string begdocfile = rdr.ReadLine();
                    string replacementwork = docfile.Replace("|", "\\");
                    sb.AppendLine(replacementwork);
                    FileInfo file_info = new FileInfo(file);
                    string outputfilename = file_info.Name;
                    using (FileStream fs2 = new FileStream(dest + outputfilename, FileMode.Append))
                    using (StreamWriter writer = new StreamWriter(fs2))
                    {
                        writer.WriteLine(replacementwork);
                    }
                }
            }
        }
    }

DUHHHHH Thanks to everyone.

Id10t error.


Get rid of the StringBuilder, and do not reopen the output file for each line:

using System.IO;

string startingdir = @"K:\qload";
string dest = @"K:\D\ho\jlg\load\dest";
string[] files = Directory.GetFiles(startingdir, "*.txt");

foreach (string file in files)
{
    // Same file name, but placed inside the dest folder
    var outfile = Path.Combine(dest, Path.GetFileName(file));

    // Open the reader and writer once per file, not once per line
    using (StreamReader reader = new StreamReader(file))
    using (StreamWriter writer = new StreamWriter(outfile))
    {
        string line = reader.ReadLine();
        while (line != null) // ReadLine returns null at end of file
        {
            writer.WriteLine(line.Replace("|", "\\"));
            line = reader.ReadLine();
        }
    }
}
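On .NET 4 or later, a more compact variant is possible. This is a sketch, not part of the answer above. It relies on File.ReadLines, which enumerates lazily, and on the File.WriteAllLines overload that accepts an IEnumerable<string>, so only one line needs to be in memory at a time. ConvertFile is a hypothetical name, the demo uses temp files in place of the K:\ paths, and the block is written as a C# 9+ top-level program.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: copies 'source' to 'destination', replacing every "|" with "\".
// File.ReadLines streams the file lazily instead of loading it all into memory.
static void ConvertFile(string source, string destination)
{
    File.WriteAllLines(destination, File.ReadLines(source).Select(l => l.Replace("|", "\\")));
}

// Demo on temporary files instead of the question's K:\ paths.
string input = Path.GetTempFileName();
string output = Path.GetTempFileName();
File.WriteAllLines(input, new[] { "a|b|c", "no pipes" });
ConvertFile(input, output);
Console.WriteLine(File.ReadAllText(output));
```

To process the whole directory, call this inside the same foreach over Directory.GetFiles(startingdir, "*.txt"), with Path.Combine(dest, Path.GetFileName(file)) as the destination.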


Why are you using a StringBuilder? You are just filling up your memory without doing anything with it.

You should also move the FileStream and StreamWriter using statements outside your loop. Right now you are re-creating your output streams for every line, which causes needless IO from opening and closing the file over and over.


Use Path.Combine(dest, outputfilename). From your code, it looks like you're writing to files like K:\D\ho\jlg\load\destoutputfilename.txt, with the file name glued onto "dest" instead of placed inside that folder.
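You can see the difference by printing both versions of the path. This is a small illustrative C# 9+ top-level program, and "example.txt" is a made-up file name. Path.Combine inserts the platform's directory separator only when dest does not already end with one, so on Windows you get a backslash.

```csharp
using System;
using System.IO;

string dest = @"K:\D\ho\jlg\load\dest";
string outputfilename = "example.txt"; // hypothetical file name

// Plain concatenation glues the name onto the last folder name: ...\destexample.txt
string concatenated = dest + outputfilename;

// Path.Combine adds a separator between the parts: ...\dest\example.txt on Windows
string combined = Path.Combine(dest, outputfilename);

Console.WriteLine(concatenated);
Console.WriteLine(combined);
```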


This code might work just fine, but I’m not seeing any resulting text file, or maybe I’m just being impatient.

Have you considered putting a Console.WriteLine in there to check progress? Sure, it's going to slow things down a tiny bit, but you'll know what's going on.
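Printing on every line of a multi-million-line file would itself cost time, so one option is to report only every N lines. The sketch below does that. CopyWithProgress is a hypothetical name, 100,000 is an arbitrary default interval, and the reader, writer, and log are taken as TextReader/TextWriter so the example runs on in-memory strings. In real code, pass the per-file StreamReader/StreamWriter and Console.Out. The block is a C# 9+ top-level program.

```csharp
using System;
using System.IO;

// Hypothetical helper: does the "|" -> "\" replacement and writes a progress message
// to 'log' every 'reportEvery' lines, plus a final total. Returns the number of lines copied.
static long CopyWithProgress(TextReader reader, TextWriter writer, TextWriter log, long reportEvery = 100_000)
{
    long count = 0;
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        writer.WriteLine(line.Replace("|", "\\"));
        count++;
        if (count % reportEvery == 0)
            log.WriteLine($"{count:N0} lines written...");
    }
    log.WriteLine($"Done: {count:N0} lines.");
    return count;
}

// Demo with in-memory text and a tiny interval so the progress message shows up.
var output = new StringWriter();
var log = new StringWriter();
long total = CopyWithProgress(new StringReader("a|b\nc|d\ne|f\n"), output, log, reportEvery: 2);
Console.Write(log.ToString());
```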


It looks like you might want to use Path.Combine, so that instead of new FileStream(dest + outputfilename) you have new FileStream(Path.Combine(dest, outputfilename)). That creates the files in the directory you expect, rather than in K:\D\ho\jlg\load with "dest" prepended to each file name.

However, I'm not sure why you're writing to a StringBuilder that you never use, or why you're opening and closing the file stream and stream writer for every line you write. Is that to force the writer to flush its output? If so, it would be easier to keep the writer open and just flush the writer/stream after each write.
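If the point of reopening is to get each line onto disk promptly, here is a sketch of the alternative: keep one StreamWriter open and either call Flush() or set AutoFlush = true. The demo writes to a MemoryStream as a stand-in for the output FileStream, so you can see the buffered text reach the underlying stream only when the writer flushes. It is written as a C# 9+ top-level program. Flushing after every line still costs throughput on large files, so leaving it off and letting Dispose flush at the end is usually fine.

```csharp
using System;
using System.IO;

var buffer = new MemoryStream(); // stand-in for the output FileStream
var writer = new StreamWriter(buffer);

writer.WriteLine("a\\b");
// StreamWriter buffers internally, so the underlying stream may still be empty here.
long beforeFlush = buffer.Length;

writer.Flush(); // push buffered text through to the stream (for a FileStream: to the file)
long afterFlush = buffer.Length;

// Alternatively, AutoFlush makes every Write/WriteLine flush immediately.
writer.AutoFlush = true;
writer.WriteLine("c\\d");
long afterAutoFlush = buffer.Length;

Console.WriteLine($"{beforeFlush} -> {afterFlush} -> {afterAutoFlush}");
writer.Dispose();
```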


You're opening and closing the output stream for each line of output, so you'll have to be very patient! Open it once, outside the loop.


I guess the problem is here:

string begdocfile = rdr.ReadLine();
string replacementwork = docfile.Replace("|", "\\");

You're reading into the begdocfile variable but calling Replace on docfile, which I guess is empty.
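Here is a minimal corrected version of that inner loop for reference. It reads each line into one variable and calls Replace on that same variable. ReplacePipes is a hypothetical helper, and it takes a TextReader/TextWriter (an illustrative choice) so it can be tried with in-memory strings. In the question's code you would pass rdr and the output StreamWriter. The block is a C# 9+ top-level program.

```csharp
using System;
using System.IO;

// Hypothetical helper: the question's inner loop with the variable mix-up fixed.
static void ReplacePipes(TextReader rdr, TextWriter writer)
{
    string? begdocfile;
    // ReadLine returns null at end of input, so EndOfStream isn't needed.
    while ((begdocfile = rdr.ReadLine()) != null)
    {
        string replacementwork = begdocfile.Replace("|", "\\"); // same variable that was just read
        writer.WriteLine(replacementwork);
    }
}

var sw = new StringWriter();
ReplacePipes(new StringReader("x|y\n|z|"), sw);
Console.Write(sw.ToString());
```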


string replacementwork = docfile.Replace("|", "\\");

I believe the line above in your code is incorrect: it should be "begdocfile.Replace ...", shouldn't it?

I suggest you focus on moving as much of the declaration and "name manufacture" work out of the inner loop as possible. Right now you are creating new FileInfo objects and path names for every single line you read in every file, and that's got to be hugely expensive.

  1. Make a single pass over the list of target files first and create all the destination files (or at least their paths) in one go. Perhaps store them in a List for easy access later, or in a Dictionary<FileInfo, string> where the string is the new file path associated with that FileInfo. Another strategy: copy the whole directory once, modify the copied files directly, and then rename the files or the directory as needed.

  2. Move every variable declaration you can out of that inner loop and into the using blocks.

  3. I suspect you are going to hear from someone here at more of a "guru level" shortly who might suggest a different strategy based on a more profound knowledge of streams than I have, but that's a guess.
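Step 1 might look something like the following sketch. It builds the source-to-destination mapping once, before any lines are read, so no path work happens per line. BuildDestinationMap is a hypothetical name. The sketch keys the Dictionary on path strings rather than FileInfo objects, which simplifies the suggestion above, and it runs on a temporary folder instead of the K:\ paths. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: map every *.txt file in 'startingdir' to its output path in 'dest'.
// All the path work happens once here, not once per line.
static Dictionary<string, string> BuildDestinationMap(string startingdir, string dest)
{
    Directory.CreateDirectory(dest); // no-op if it already exists
    var result = new Dictionary<string, string>();
    foreach (string file in Directory.GetFiles(startingdir, "*.txt"))
        result[file] = Path.Combine(dest, Path.GetFileName(file));
    return result;
}

// Demo on a temporary folder; the question's K:\ paths would be passed in instead.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
string src = Path.Combine(root, "in");
string dst = Path.Combine(root, "out");
Directory.CreateDirectory(src);
File.WriteAllText(Path.Combine(src, "one.txt"), "a|b");
File.WriteAllText(Path.Combine(src, "skip.csv"), "ignored");

var map = BuildDestinationMap(src, dst);
foreach (var pair in map)
{
    // The per-file work is now only: open once, read, replace, write.
    using var reader = new StreamReader(pair.Key);
    using var writer = new StreamWriter(pair.Value);
    string? line;
    while ((line = reader.ReadLine()) != null)
        writer.WriteLine(line.Replace("|", "\\"));
}
Console.WriteLine($"{map.Count} file(s) converted into {dst}");
```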

Good luck!
