
C++ Reading in words from a text file, word by word or char by char

Developer https://www.devze.com 2023-01-16 00:22 Source: Internet
I've been googling around, reading through my book, and trying to write code that reads through a text file and processes the words in it one by one, so I can put them in alphabetical order and keep a count of how many words were used and how many times each word was used. I can't seem to get my GetNextWord() function to work properly and it's driving me crazy.

I need to read the words in one by one and convert each uppercase letter to lowercase. I know how to do that and have done it successfully. It's getting the word character by character and putting it into a string that is holding me up.

This is my most recent try at it. Any help would be amazing, or a link to a tutorial on how to read from an input file word by word. (A word being alpha characters a-z and ' (as in "don't"), ended by whitespace, a comma, a period, ; , : , etc.)

void GetNextWord()
{
    string word = "";
    char c;

    while(inFile.get(c))
    {
        while( c > 64 && c < 123 || c == 39)
        {
            if((isupper(c)))
            {
                c = (tolower(c));
            }
            word = word + c;
        }
        outFile << word;
    }
}


You can read the file word by word by using the >> operator. For example, see this link: http://www.daniweb.com/forums/thread30942.html.

I excerpted their example here:

ifstream in ( "somefile" );
vector<string> words;
string word;

if ( !in )
  return;

while ( in>> word )
  words.push_back ( word );


Your logic is wrong. The inner loop runs as long as c doesn't change, and there's nothing in it that would change c.

Why do you have two loops anyway? I think you might be confused about whether that function is supposed to read the next word or all words. Try to separate those concerns and put them into different functions (one of which calls the other). I find it easiest to approach such problems in a top-down order:

while(inFile.good()) {
  std::string word = GetNextWord(inFile);
  if(!word.empty())
    std::cout << word << std::endl;
}

Now fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.


Personally I like to read in input with std::getline(std::istream&, std::string&) (in the <string> header, but you will of course also need to #include a stream header).

This function breaks on newline, which is whitespace by your problem's definition. But it's not the entire answer to your question. After reading in the line of text, you're going to need to use string operations or standard algorithms to break the string into words. Or you could loop over the string by hand.

The guts would be something like:

std::string buffer;
while (std::getline(std::cin, buffer)) {
// break each line into words, according to problem spec
}
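
For the "break each line into words" step, here is a rough sketch using standard algorithms. It assumes the word definition from the question (letters and the apostrophe). The names InWord and WordsInLine are invented for this example, and lower-casing is left out to keep it short.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Assumed word characters, per the question: letters and the apostrophe.
bool InWord(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) || c == '\'';
}

// Returns the words found in `line`, in order of appearance.
std::vector<std::string> WordsInLine(const std::string& line)
{
    std::vector<std::string> words;
    auto it = line.begin();
    while (it != line.end()) {
        // Find the start of the next word, then the first character past it.
        auto first = std::find_if(it, line.end(), InWord);
        auto last = std::find_if_not(first, line.end(), InWord);
        if (first != last)
            words.emplace_back(first, last);
        it = last;
    }
    return words;
}
```

Inside the getline loop this would be called as WordsInLine(buffer). It needs C++11 or later for auto and std::find_if_not.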


I use

// str is a string that holds the line of data from ifs- the text file.
// str holds the words to be split, res the vector to store them in.
while( getline( ifs, str ) ) 
    split(str, res);


void split(const string& str, vector<string>& vec)
{
    const string::size_type size(str.size());
    string::size_type start(0);
    string::size_type range(0);

 /* Explanation:
  * range - Length of the word to be extracted, without spaces.
  * start - Start of the next word. Initially index 0.
  *
  * Runs until it encounters whitespace, then extracts the word with substr()
  * and lower-cases it (without first checking whether each character is already
  * lower-case, as I feel a char-by-char check for upper-case takes just as much
  * time as lowering them all anyway). Empty words, which come from consecutive
  * whitespace, are skipped.
  */
    for( string::size_type i(0); i < size; ++i )
    {
        if( isspace(static_cast<unsigned char>(str[i])) )
        {
            if( range > 0 )
                vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    if( range > 0 )
        vec.push_back( toLower(str.substr(start, range)) );
}

I'm not sure this is particularly helpful to you, but I'll try. The toLower function is a quick helper that simply calls ::tolower() on each character. The split function reads characters up to each whitespace character, then stuffs the word into a vector. I'm not entirely sure what you mean by char by char.
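
The toLower helper itself isn't shown here. A minimal sketch of what it might look like, assuming it takes a string and returns a lower-cased copy (the signature is a guess based on how split calls it):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// One possible toLower: returns a lower-cased copy of `s`.
// The parameter is taken by value so the caller's string is left untouched.
std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

The lambda takes an unsigned char because passing a negative char value to std::tolower is undefined behavior.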

Do you want to extract a word one character at a time? Or do you want to check each character as you go along? Or do you mean you want to extract one word, finish, and then come back? If so, I would 1) recommend a vector anyway, and 2) ask you to let me know so I can refactor the code.


What's going to terminate your inner loop if c == 'a'? ASCII value for 'a' is 97.
