Trouble with a 3D array of chars in D

https://www.devze.com 2023-02-07 07:43 Source: the web
I'm learning D, and I have a simple program that reads in a text file line by line, separates each line into its distinct words, and prints the whole thing to stdout.

import std.stdio;
import std.string;

void main(string args[])
{
    char[][][] lines;
    auto input = File(args[1], "r");
    foreach(line; input.byLine())
    {
        auto words = split(strip(line));
        lines ~= words;
    }

    foreach(line; lines)
    {
        writeln(line);
    }
}

The code for creating words works. If I just call writeln on words each time it's assigned, I get the output I want. But if I add words to lines and output lines, then strange things happen. lines has an entry for each line in the source file, but each line is a corrupt version of the last line read. For instance, if the last line of the file looks like this:

END    START        * End of routine

I get output that looks something like this:

[       , END, ST, *, End , f rout, ne,    ,     , e other]
[     , END, ST, *, End of, rout, ne,      ,   , e othe]
[    , END, STAR, *, End of, rout, ne.,        
e]
[    , END, START  , *, End of, rout, ne.,        
e]
[END , STAR]
[     , END, START     , *, End , f , out, ne.  ]
[END, START, *, End, of ro, tine. ,  ,   ,  
]
[END, STA, *, o,  r, ut]
[  , END , S, *, End, o,  r, utine.,  ,   ,  , 
,  o]
[END, START    , *, of routi, e.,   ]

Any idea what I'm doing wrong?


Your main problem is that byLine reuses the same buffer for every line. You need to duplicate the line so that later reads don't overwrite the data you have already stored:

auto words = split(strip(line).dup);
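To see the effect in isolation, here is a minimal sketch that does not read a file at all. The buffer variable only stands in for byLine's internal buffer, and the Kept struct and overwriteDemo function are hypothetical names made up for this illustration.

```d
import std.stdio;

// Holds a slice of the buffer and an independent copy of it.
struct Kept { char[] slice; char[] copy; }

Kept overwriteDemo()
{
    // Stand-in for byLine's internal buffer (an assumption for illustration).
    char[] buffer = "first line".dup;

    // The first field shares memory with buffer; the second owns its own.
    auto result = Kept(buffer, buffer.dup);

    // Simulate the next line being read into the same memory.
    buffer[] = "SECOND ROW";   // same length, element-wise overwrite
    return result;
}

void main()
{
    auto r = overwriteDemo();
    writeln(r.slice);  // SECOND ROW: the slice saw the overwrite
    writeln(r.copy);   // first line: the copy is unaffected
}
```

Anything that merely slices the buffer changes when the buffer does; only the .dup copy survives.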

A more appropriate element type is string rather than char[], unless you intend to modify the actual characters. However, simply changing the declaration gives a compiler error in D 2.0, because line is a char[] and does not convert implicitly to string. The fix is to duplicate it as an immutable string instead:

auto words = split(strip(line).idup);
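As a rough sketch of what .idup buys you, the hypothetical helper wordsOf below (not part of the original program) takes a mutable line and returns words that no longer depend on it. The imports are spelled out explicitly here; the original code gets split through std.string.

```d
import std.stdio;
import std.string : strip;
import std.array : split;

// Hypothetical helper: turn one mutable line into independent words.
string[] wordsOf(char[] line)
{
    // strip(line) is still a char[] slice of line; .idup copies it into
    // an immutable string, so split yields string[] rather than char[][].
    return split(strip(line).idup);
}

void main()
{
    char[] line = "  END    START  ".dup;
    string[] words = wordsOf(line);

    line[] = 'x';      // overwriting the source no longer matters
    writeln(words);    // still the two words END and START

    // string direct = line;  // would not compile: char[] is not implicitly immutable
}
```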

This way your program would look like

import std.stdio;
import std.string;

void main(string[] args)
{
    string[][] lines;
    auto input = File(args[1], "r");
    foreach(line; input.byLine())
    {
        auto words = split(strip(line).idup);
        lines ~= words;
    }

    foreach(line; lines)
    {
        writeln(line);
    }
}


The answer to this is twofold.

First, byLine as stated uses an internal buffer (for speed), which gets overwritten on subsequent loop iterations.

Second, look at the operations that produce words: split(strip(line)). strip only adjusts the start and end of the array (which is a reference), and split divides the array into smaller sub-arrays that reference the same underlying data. Neither is destructive; thus, neither needs to reallocate. Because of this, the final char[][] words still points into the original buffer, which gets overwritten on the next iteration.
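One way to check this claim is to compare pointers. The sketch below assumes a local char[] in place of byLine's buffer; pointsInto and sliceWords are hypothetical helpers written only for this demonstration.

```d
import std.stdio;
import std.string : strip;
import std.array : split;

// True when `inner` lies entirely within the memory of `outer`.
bool pointsInto(const(char)[] inner, const(char)[] outer)
{
    return inner.ptr >= outer.ptr
        && inner.ptr + inner.length <= outer.ptr + outer.length;
}

// Same expression as in the question, with no copying.
char[][] sliceWords(char[] buffer)
{
    return split(strip(buffer));
}

void main()
{
    char[] buffer = "  END START  ".dup;
    char[][] words = sliceWords(buffer);

    foreach (w; words)
        writeln(w, " shares the buffer: ", pointsInto(w, buffer));

    buffer[] = '#';   // overwrite the buffer, as the next read would
    writeln(words);   // every word now shows the overwritten bytes
}
```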

The solution is to make sure you copy the data if you want it to escape the loop scope, by writing auto words = split(strip(line).dup);. Note that dupping words will not work, as this will only duplicate the array of arrays, not the arrays themselves.
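The difference between duplicating words and duplicating the underlying characters can be sketched like this. The deepCopy function is a hypothetical helper, shown only to contrast with the shallow words.dup; copying the line once before splitting, as above, is the simpler fix.

```d
import std.stdio;
import std.algorithm : map;
import std.array : array, split;

// Hypothetical helper: copies each inner array as well as the outer one.
char[][] deepCopy(char[][] words)
{
    return words.map!(w => w.dup).array;
}

void main()
{
    char[] buffer = "END START".dup;   // stand-in for byLine's buffer
    char[][] words = split(buffer);

    char[][] shallow = words.dup;       // new outer array, same inner slices
    char[][] deep    = deepCopy(words); // new outer array and new inner arrays

    buffer[] = '#';                     // simulate the buffer being reused

    writeln(shallow);  // corrupted: still points into buffer
    writeln(deep);     // intact: END and START
}
```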

Also, you should use string[] args. The C-like syntax is only supported for legacy reasons and not recommended for use.
