I'm learning D, and I have a simple program that reads in a text file line by line, separates each line into its distinct words, and prints the whole thing to stdout.
import std.stdio;
import std.string;
void main(string args[])
{
char[][][] lines;
auto input = File(args[1], "r");
foreach(line; input.byLine())
{
auto words = split(strip(line));
lines ~= words;
}
foreach(line; lines)
{
writeln(line);
}
}
The code for creating words works. If I just call writeln on words each time it's assigned, I get the output I want. But if I add words to lines and output lines, then strange things happen. lines has an entry for each line in the source file, but each entry is a corrupt version of the last line read. For instance, if the last line of the file looks like this:
END START * End of routine
I get output that looks something like this:
[ , END, ST, *, End , f rout, ne, , , e other]
[ , END, ST, *, End of, rout, ne, , , e othe]
[ , END, STAR, *, End of, rout, ne.,
e]
[ , END, START , *, End of, rout, ne.,
e]
[END , STAR]
[ , END, START , *, End , f , out, ne. ]
[END, START, *, End, of ro, tine. , , ,
]
[END, STA, *, o, r, ut]
[ , END , S, *, End, o, r, utine., , , ,
, o]
[END, START , *, of routi, e., ]
Any idea what I'm doing wrong?
Your main problem is that byLine reuses the same buffer for every line. You need to duplicate each line so that later reads don't overwrite the data you have already stored:
auto words = split(strip(line).dup);
A more appropriate type is string instead of char[], unless you intend to modify the actual characters. However, simply changing the declaration gives a compiler error in D 2.0, because line is still a char[]. The fix is to duplicate it as an immutable string instead:
auto words = split(strip(line).idup);
This way your program would look like this:
import std.stdio;
import std.string;
void main(string[] args)
{
string[][] lines;
auto input = File(args[1], "r");
foreach(line; input.byLine())
{
auto words = split(strip(line).idup);
lines ~= words;
}
foreach(line; lines)
{
writeln(line);
}
}
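As an alternative sketch (not part of the answer above), Phobos also provides File.byLineCopy, which hands back a freshly allocated string for every line, so no explicit idup is needed. The helper name readWords and the scratch file byline_demo.txt are made up for illustration only.

```d
import std.array : split;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;
import std.string : strip;

// Hypothetical helper: reads a file into one word list per line.
// byLineCopy allocates a new string for each line, so the slices
// stored in `lines` remain valid after the loop moves on.
string[][] readWords(string path)
{
    string[][] lines;
    foreach (line; File(path, "r").byLineCopy())
        lines ~= split(strip(line));
    return lines;
}

void main()
{
    // Scratch file used only to make the sketch runnable.
    auto path = buildPath(tempDir(), "byline_demo.txt");
    write(path, "first line here\nEND START * End of routine\n");
    scope (exit) remove(path);

    auto lines = readWords(path);
    assert(lines[0] == ["first", "line", "here"]);
    assert(lines[1] == ["END", "START", "*", "End", "of", "routine"]);
    writeln(lines);
}
```

The trade-off is one allocation per line, which is the same cost as calling idup by hand.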
The answer to this is twofold.
First, byLine, as stated, uses an internal buffer (for speed), which gets overwritten on subsequent loop iterations.
Second, look at the operations that produce words: split(strip(line)). strip only adjusts the start and end of the array (which is a reference), and split splits the array into smaller sub-arrays that reference the same underlying data. Neither is destructive; thus, neither needs to reallocate. Because of this, the final char[][] words still points into the original buffer, which gets overwritten on the next iteration.
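The aliasing described here can be checked without a file. The following is a minimal sketch in which a plain char[] stands in for byLine's internal buffer; the helper names wordsAliasing and wordsCopied are invented for this example.

```d
import std.array : split;
import std.string : strip;

// Returns words that are slices into `buffer`; no characters are copied.
char[][] wordsAliasing(char[] buffer)
{
    return split(strip(buffer));
}

// Returns words backed by a private copy of the stripped line.
char[][] wordsCopied(char[] buffer)
{
    return split(strip(buffer).dup);
}

void main()
{
    // Stand-in for the buffer that byLine reuses between iterations.
    char[] buffer = "  END START  ".dup;

    auto aliased = wordsAliasing(buffer);
    auto copied = wordsCopied(buffer);

    // The first aliased word starts two characters into the buffer itself.
    assert(aliased[0].ptr == buffer.ptr + 2);

    // Simulate the next line being read into the same buffer.
    buffer[] = 'x';

    assert(aliased[0] == "xxx");   // the stored word changed underneath us
    assert(copied[0] == "END");    // the copy is unaffected
    assert(copied[1] == "START");
}
```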
The solution is to copy the data if you want it to escape the loop scope, by writing auto words = split(strip(line).dup);. Note that dupping words will not work, as this only duplicates the array of arrays, not the arrays themselves.
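To illustrate why dupping words is not enough, here is a small sketch comparing a shallow words.dup with a per-word copy. The deepCopy helper is hypothetical, and the char[] again stands in for byLine's buffer.

```d
import std.algorithm.iteration : map;
import std.array : array, split;

// Hypothetical helper: copies the characters of every word, not just
// the outer array of slices.
string[] deepCopy(char[][] words)
{
    return words.map!(w => w.idup).array;
}

void main()
{
    char[] buffer = "END START".dup;  // stand-in for byLine's buffer
    char[][] words = split(buffer);   // slices into buffer

    char[][] shallow = words.dup;     // new outer array, same characters
    string[] deep = deepCopy(words);  // new outer array, new characters

    buffer[] = '?';                   // simulate the next line being read

    assert(shallow[0] == "???");      // still aliases the buffer
    assert(deep[0] == "END");         // independent of the buffer
    assert(deep[1] == "START");
}
```

Copying the whole line once before splitting, as suggested above, needs only one allocation per line, whereas a per-word copy allocates once per word.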
Also, you should use string[] args. The C-like syntax (string args[]) is only supported for legacy reasons and is not recommended.