Scenario: a 150 MB text file that is the exported Inbox of an old email account. I need to parse through it, pull out the emails from a specific user, and write these to a new, single file. I have code that works; it's just dog slow.
I'm using marker strings to search for where to begin and end the copy from the original file.
Here's the main function:
StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt");
string working = string.Empty;
string mystring = string.Empty;
while (!sr.EndOfStream)
{
    while ((mystring = sr.ReadLine()) != null)
    {
        if (mystring == strBeginMarker)
        {
            writeLog(mystring);
            //read the next line
            working = sr.ReadLine();
            while( !(working.StartsWith(strEndMarker)))
            {
                writeLog(working);
                working = sr.ReadLine();
            }
        }
    }
}
this.Text = "DONE!!";
sr.Close();
The function that writes the selected messages to the new file:
public void writeLog(string sMessage)
{
    fw = new System.IO.StreamWriter(path, true);
    fw.WriteLine(sMessage);
    fw.Flush();
    fw.Close();
}
Again, this process works. I get a good output file; it just takes a long time, and I'm sure there are ways to make this faster.
The largest optimization would be to change your writeLog method to open the file once at the beginning of this operation, write to it many times, then close it at the end.
Right now, you're opening and closing the file for every single line you write, which is definitely going to slow things down.
Try the following:
// Open this once at the beginning!
using (StreamWriter fw = new System.IO.StreamWriter(path, true))
{
    using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
    {
        string working;
        string mystring;
        while ((mystring = sr.ReadLine()) != null)
        {
            if (mystring == strBeginMarker)
            {
                // Write through the open writer; calling writeLog here would reopen the file
                fw.WriteLine(mystring);
                //read the next line
                working = sr.ReadLine();
                // The null check stops cleanly if the file ends before an end marker
                while (working != null && !working.StartsWith(strEndMarker))
                {
                    fw.WriteLine(working);
                    working = sr.ReadLine();
                }
            }
        }
    }
}
this.Text = "DONE!!";
I think you should:
- Open files once.
- Load source file in memory.
- Split it into chunks and use several threads for processing.
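A minimal sketch of the first two points, assuming the same rules as the question's loop (the begin marker must match a whole line, the end marker only has to start a line, and the end-marker line is not copied). The names InboxFilter, ExtractLines, beginMarker and endMarker are hypothetical, not from the original code. Threading is left out; it is not obvious it would help if the job is dominated by file I/O.

```csharp
using System;
using System.Collections.Generic;

static class InboxFilter
{
    // Returns the lines that belong to marked messages, in their original order.
    // beginMarker and endMarker are hypothetical stand-ins for strBeginMarker
    // and strEndMarker from the question.
    public static List<string> ExtractLines(IEnumerable<string> lines, string beginMarker, string endMarker)
    {
        var output = new List<string>();
        bool copying = false; // true while we are inside a marked message

        foreach (string line in lines)
        {
            if (!copying)
            {
                if (line == beginMarker)
                {
                    copying = true;
                    output.Add(line); // the begin marker itself is kept, as in the question
                }
            }
            else if (line.StartsWith(endMarker, StringComparison.Ordinal))
            {
                copying = false; // the end-marker line is dropped, as in the question
            }
            else
            {
                output.Add(line);
            }
        }
        return output;
    }
}
```

It could then be driven with a single read and a single write, for example File.WriteAllLines(path, InboxFilter.ExtractLines(File.ReadLines("c:\\Thunderbird_Inbox.txt"), strBeginMarker, strEndMarker)). Note that File.WriteAllLines overwrites the target file rather than appending to it, and that the ordinal comparison is an assumption; the original StartsWith call uses the current culture.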
I would just do a simple parser. Note that this assumes (as you do in your code above) that the markers are in fact unique.
You may have to adjust the formatting of your output a bit, but here is the general idea:
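// Read the entire file and close it (StringBuilder needs "using System.Text;")
string data;
using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
{
    data = sr.ReadToEnd();
}
// StringBuilder avoids re-copying the accumulated text on every append
StringBuilder newData = new StringBuilder();
// Ordinal comparison skips the culture-sensitive matching that IndexOf(string) uses by default
int position = data.IndexOf(strBeginMarker, StringComparison.Ordinal);
while (position >= 0) // IndexOf returns -1 when nothing is found; 0 is a valid match
{
    int contentStart = position + strBeginMarker.Length;
    int endPosition = data.IndexOf(strEndMarker, contentStart, StringComparison.Ordinal);
    if (endPosition < 0)
        break; // no closing marker: stop rather than pass a negative length
    newData.Append(data, contentStart, endPosition - contentStart);
    // Continue searching after the end marker that was just consumed
    position = data.IndexOf(strBeginMarker, endPosition + strEndMarker.Length, StringComparison.Ordinal);
}
writeLog(newData.ToString());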
(Note that I don't have a 150 MB file to test this on - YMMV depending on the machine you are using).
I do not have a 150 MB text file to test with, but if your server has the memory, would reading the whole thing into a string and using a Regex to pull out the messages work?
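A rough sketch of that regular-expression idea, untested on a file of this size. RegexExtractor, ExtractMessages, beginMarker and endMarker are hypothetical names. Unlike the line-based loop in the question, this matches the markers anywhere in the text, not only at the start of a line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexExtractor
{
    // Returns every span that starts at beginMarker and stops just before the
    // next endMarker. Both parameters are hypothetical stand-ins for
    // strBeginMarker and strEndMarker from the question.
    public static List<string> ExtractMessages(string data, string beginMarker, string endMarker)
    {
        // Regex.Escape makes the markers match literally.
        // ".*?" is lazy, so each match stops at the first end marker.
        // The lookahead (?=...) keeps the end marker out of the match.
        string pattern = Regex.Escape(beginMarker) + ".*?(?=" + Regex.Escape(endMarker) + ")";

        var results = new List<string>();
        // Singleline lets '.' match line breaks so a message can span many lines.
        foreach (Match m in Regex.Matches(data, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

Keep in mind that a .NET string stores two bytes per character, so a 150 MB single-byte text file needs roughly 300 MB once it is loaded into a string.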
You could simply declare the StreamWriter object outside of that while
loop and just write the line to it inside the loop.
Like this:
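StreamWriter sw = new StreamWriter(path, true);
while ((mystring = sr.ReadLine()) != null)
{
    // ...
    while (working != null && !working.StartsWith(strEndMarker))
    {
        sw.WriteLine(working);
        working = sr.ReadLine();
    }
}
sw.Close(); // close once, after the loop has finished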