I'm working in C# and have a large text file (75 MB). I want to save the lines that match a regular expression.
I tried reading the file with a StreamReader and ReadToEnd, but that takes 400 MB of RAM
and, when used again, throws an out-of-memory exception.
I then tried using File.ReadAllLines():
    string[] lines = File.ReadAllLines("file");
    StringBuilder specialLines = new StringBuilder();
    foreach (string line in lines)
        if (regex.IsMatch(line)) // regex is the Regex holding my pattern
            specialLines.Append(line);
This works, but when my function ends the memory is not released and I'm left with 300 MB of used memory. Only when I call the function again and it executes the line string[] lines = File.ReadAllLines("file"); do I see the memory drop to about 50 MB and then climb back to 200 MB.
How can I clear this memory, or get the lines I need in a different way?
    // The using blocks dispose the stream and reader even if an exception is thrown.
    using (var file = File.OpenRead("myfile.txt"))
    using (var reader = new StreamReader(file))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            // Evaluate the line here.
        }
    }
You need to stream the text instead of loading the whole file into memory. Here's a way to do it, using an extension method and LINQ:
    static class ExtensionMethods
    {
        public static IEnumerable<string> EnumerateLines(this TextReader reader)
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
...
    var regex = new Regex(..., RegexOptions.Compiled);
    using (var reader = new StreamReader(fileName))
    {
        var specialLines =
            reader.EnumerateLines()
                  .Where(line => regex.IsMatch(line))
                  .Aggregate(new StringBuilder(),
                             (sb, line) => sb.AppendLine(line));
    }
You can use StreamReader#ReadLine to read the file line by line and save only the lines you need.
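As a rough sketch of that idea, assuming "save" means writing the matching lines to another file: the class name LineFilter, the method SaveMatchingLines, and the output path are all hypothetical names chosen for illustration. Only one line is held in memory at a time, so the footprint should stay small regardless of file size.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
static class LineFilter
{
    // Copies every line of inputPath that matches regex to outputPath.
    // Returns the number of lines written.
    public static int SaveMatchingLines(string inputPath, string outputPath, Regex regex)
    {
        int count = 0;
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
            {
                if (regex.IsMatch(line))
                {
                    writer.WriteLine(line);
                    count++;
                }
            }
        }
        return count;
    }
}
```

If you do need the matches in memory afterwards, you could append them to a StringBuilder or a List<string> inside the loop instead of writing them out.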
You should use the Enumerator pattern to keep your memory footprint low in case your file can be huge.
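One way to get that behaviour without writing your own iterator, assuming you are on .NET 4.0 or later, is File.ReadLines, which returns an IEnumerable<string> that yields lines as they are read rather than building a string[] like File.ReadAllLines does. The wrapper below (LazyLineFilter and MatchingLines are hypothetical names) is only a sketch:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
static class LazyLineFilter
{
    // Returns a lazy sequence of the lines in path that match regex.
    // Lines are read and tested one at a time as the caller enumerates.
    public static IEnumerable<string> MatchingLines(string path, Regex regex)
    {
        return File.ReadLines(path).Where(line => regex.IsMatch(line));
    }
}
```

A caller would typically consume it with foreach (var line in LazyLineFilter.MatchingLines("myfile.txt", regex)) { ... }. Calling ToList() or ToArray() on the result keeps only the matching lines in memory, not the whole file.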