开发者

Reading from ifstream won't read whitespace

开发者 https://www.devze.com 2023-03-22 11:39 出处:网络
I\'m implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won\'t read it out. I\'m reading character by character using >>, and all the whitespace is gone.

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.

Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the space开发者_StackOverflows now, but not the new-line character that I need to lex some constructs.

Here's the offending code (extended comments truncated)

while(input >> current) {
    always_next_struct val = always_next_struct(next);
    if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
        continue;
    }
    if (current == L'/') {
        input >> current;
        if (current == L'/') {
            // explicitly empty while loop
            while(input.get(current) && current != L'\n');
            continue;
        }

I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them- the input just skips to the next line in the input file.


There is a manipulator to disable the whitespace skipping behavior:

stream >> std::noskipws;


The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.

Edit:

Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).

Edit (analyzing code):

After

  while(input.get(current) && current != L'\n');
  continue;

there will be an \n in current, if not end of file is reached. After that you continue with the outmost while loop. There the first character on the next line is read into current. Is that not what you wanted?

I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):

//: get.cpp : compile, then run: get < get.cpp

#include <iostream>

int main()
{
  char c;

  while (std::cin.get(c))
  {
    if (c == '/') 
    { 
      char last = c; 
      if (std::cin.get(c) && c == '/')
      {
        // std::cout << "Read to EOL\n";
        while(std::cin.get(c) && c != '\n'); // this comment will be skipped
        // std::cout << "go to next line\n";
        std::cin.putback(c);
        continue;
      }
     else { std::cin.putback(c); c = last; }
    }
    std::cout << c;
  }
  return 0;
}

This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.

If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16bit char and the \n char ends up in the wrong byte...


You could open the stream in binary mode:

std::wifstream stream(filename, std::ios::binary);

You'll lose any formatting operations provided my the stream if you do this.

The other option is to read the entire stream into a string and then process the string:

std::wostringstream ss;
ss << filestream.rdbuf();

OF course, getting the string from the ostringstream rquires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous. EDIT: someone else mention istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.


Wrap the stream (or its buffer, specifically) in a std::streambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.

Alternatively, a much more efficient, and fool-proof, approach might to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.


You could just Wrap the stream in a std::streambuf_iterator to get data with all whitespaces and newlines like this .

           /*Open the stream in default mode.*/
            std::ifstream myfile("myfile.txt");

            if(myfile.good()) {
                /*Read data using streambuffer iterators.*/
    vector<char> buf((std::istreambuf_iterator<char>(myfile)), (std::istreambuf_iterator<char>()));

                /*str_buf holds all the data including whitespaces and newline .*/
                string str_buf(buf.begin(),buf.end());

                myfile.close();
            } 


By default, this skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because of std::basic_ios::init, called on every new ios_base object (more details). Any of the following would work:

in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>

Other flags are listed on cpp reference.


The stream extractors behave the same and skip whitespace.

If you want to read every byte, you can use the unformatted input functions, like stream.get(c).


Why not simply use getline ?

You will get all the whitespaces, and while you won't get the end of lines characters, you will still know where they lie :)


Just Use getline.

while (getline(input,current))
{
      cout<<current<<"\n";

}


I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.

0

精彩评论

暂无评论...
验证码 换一张
取 消