My goal is to take a file of sentences, apply some basic filtering, and output the remaining sentences to a file and the terminal. I'm using the Hunspell library.
Here's how I get sentences from the file:
public static string[] sentencesFromFile_old(string path)
{
    string s = "";
    using (StreamReader rdr = File.OpenText(path))
    {
        s = rdr.ReadToEnd();
    }
    s = s.Replace(Environment.NewLine, " ");
    s = Regex.Replace(s, @"\s+", " ");
    s = Regex.Replace(s, @"\s*?(?:\(.*?\)|\[.*?\]|\{.*?\})", String.Empty);
    string[] sentences = Regex.Split(s, @"(?<=\. |[!?]+ )");
    return sentences;
}
Here's the code that writes to file:
List<string> sentences = new List<string>(Checker.sentencesFromFile_old(path));
StreamWriter w = new StreamWriter(outFile);
foreach(string x in xs)
    if(Checker.check(x, speller))
    {
        w.WriteLine("[{0}]", x);
        Console.WriteLine("[{0}]", x);
    }
Here's the checker:
public static bool check(string s, NHunspell.Hunspell speller)
{
    char[] punctuation = {',', ':', ';', ' ', '.'};
    bool upper = false;
    // Check the string length.
    if(s.Length <= 50 || s.Length > 250)
        return false;
    // Check if the string contains only allowed punctuation and letters.
    // Also disallow words with multiple consecutive caps.
    for(int i = 0; i < s.Length; ++i)
    {
        if(punctuation.Contains(s[i]))
            continue;
        if(Char.IsUpper(s[i]))
        {
            if(upper)
                return false;
            upper = true;
        }
        else if(Char.IsLower(s[i]))
        {
            upper = false;
        }
        else return false;
    }
    // Spellcheck each word.
    string[] words = s.Split(' ');
    foreach(string word in words)
        if(!speller.Spell(word))
            return false;
    return true;
}
The sentences are printed on the terminal just fine, but the text file cuts off mid-sentence at 2015 characters. What's up with that?
EDIT: When I remove some parts of the check method, the file is cut off at various lengths somewhere around either 2000 or 4000. Removing the spellcheck eliminates the cutoff entirely.
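You need to close (or at least flush) the writer once you are done with it. A StreamWriter buffers its output, so whatever is still sitting in the buffer when the program exits never reaches the file.
w.Close();
Close flushes the buffer before releasing the file, so a separate w.Flush() call is only needed if you want the data written out while the writer stays open. The using statement (which you should also use) disposes the writer automatically, and that flushes and closes it even if an exception is thrown.
using (var w = new StreamWriter(...))
{
    // Do stuff
}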
Are you closing the StreamWriter after you are done writing? You could try something like this:
using(StreamWriter w = new StreamWriter(outFile))
{
    foreach(string x in xs)
    {
        if(Checker.check(x, speller))
        {
            w.WriteLine("[{0}]", x);
            Console.WriteLine("[{0}]", x);
        }
    }
}
The using statement will close the StreamWriter (by calling the Dispose method on the StreamWriter) after the code inside is done executing, which also flushes any buffered output to the file.
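To see the buffering for yourself, here is a minimal sketch that writes to a MemoryStream instead of a file, so the number of bytes that actually reached the underlying stream can be inspected. The class name BufferDemo and the method Measure are made up for this illustration and are not part of the question's code. It assumes the default StreamWriter settings (UTF-8 without a byte order mark, AutoFlush off).

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Hypothetical helper: reports how many bytes have reached the underlying
    // stream before and after the StreamWriter is disposed.
    public static (int before, int after) Measure(string text)
    {
        var ms = new MemoryStream();
        int before;
        using (var w = new StreamWriter(ms))
        {
            w.Write(text);
            // For short input the text is still held in the writer's own buffer.
            before = ms.ToArray().Length;
        }
        // Leaving the using block disposed the writer, which flushed the buffer.
        // MemoryStream.ToArray still works after the stream has been closed.
        int after = ms.ToArray().Length;
        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Measure("a short sentence");
        Console.WriteLine("before dispose: {0} bytes, after dispose: {1} bytes", before, after);
    }
}
```

With a file the effect is the same: anything still in the writer's buffer when the process ends is lost, which would explain a file that stops mid-sentence while the console output is complete.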