Avoid grabbing nothing from string stream

Developer https://www.devze.com 2023-03-09 21:16 Source: web
I'm working on an assembler for a very basic ISA. Currently I'm implementing the parser function, and I'm using a string stream to grab words from lines. Here's an example of the assembly code:

; This program counts from 10 to 0
        .ORIG x3000
        LEA R0, TEN     ; This instruction will be loaded into memory location x3000
        LDW R1, R0, #0
START   ADD R1, R1, #-1
        BRZ DONE
        BR  START
                        ; blank line
DONE    TRAP    x25     ; The last executable instruction
TEN     .FILL   x000A   ; This is 10 in 2's comp, hexadecimal
        .END

Don't worry about the nature of the assembly code, simply look at line 3, the one with the comment to the right. My parser functions aren't complete, but here's what I have:

// Define three conditions to code
enum {DONE, OK, EMPTY_LINE};
// Tuple containing a condition and a string vector
typedef tuple<int,vector<string>> Code;

// Passed an alias to a string
// Parses the line passed to it
Code ReadAndParse(string& line)
{

    /***********************************************/
    /****************REMOVE COMMENTS****************/
    /***********************************************/
    // Sentinel to flag down position of first
    // semicolon and the index position itself
    bool found = false;
    size_t semicolonIndex = -1;

    // Convert the line to lowercase
    for(int i = 0; i < line.length(); i++)
    {
        line[i] = tolower(line[i]);

        // Find first semicolon
        if(line[i] == ';' && !found)
        {
            semicolonIndex = i;
            // Throw the flag
            found = true;
        }
    }

    // Erase anything to and from semicolon to ignore comments
    if(found != false)
        line.erase(semicolonIndex);


    /***********************************************/
    /*****TEST AND SEE IF THERE'S ANYTHING LEFT*****/
    /***********************************************/

    // To snatch and store words
    Code code;
    string token;
    stringstream ss(line);
    vector<string> words;

    // While the string stream is still of use
    while(ss.good())
    {
        // Send the next string to the token
        ss >> token;
        // Push it onto the words vector
        words.push_back(token);

        // If all we got was nothing, it's an empty line
        if(token == "")
        {
            code = make_tuple(EMPTY_LINE, words);
            return code;
        }
    }

    /***********************************************/
    /***********DETERMINE OUR TYPE OF CODE**********/
    /***********************************************/


    // At this point it should be fine
    code = make_tuple(OK, words);
    return code;
}

As you can see, the Code tuple contains a condition represented in the enum declaration and a vector containing all words in the line. What I want is to have every word in a line pushed into the vector and then returned.

The issue arises on the third call of the function (the third line of the assembly code). I use the ss.good() function to determine if I have any words in the string stream. For some reason the ss.good() function returns true even though there is no fourth word in the third line and I end up having the words [lea] [r0,] [ten] and [ten] pushed into the vector. ss.good() is true on the fourth call and token receives nothing, thus I have [ten] pushed into the vector twice.

I notice if I remove the spaces between the semicolon and the last word, this error doesn't occur. I want to know how to get the right number of words pushed into the vector.

Please don't recommend Boost library. I love the library, but I want to keep this project simple. This is nothing big, there's only a dozen instructions for this processor. Also, bear in mind that this function is only half-baked, I'm testing and debugging it incrementally.


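The stream's state flags only get set after an operation has actually hit the condition (such as reaching the end of the stream), so ss.good() cannot tell you in advance whether the next extraction will succeed. After the last word is read, the trailing whitespace is still in the stream and good() is still true. The next ss >> token then finds nothing but whitespace, fails, and leaves token unchanged, which is why the last word is pushed twice.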
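To make the difference concrete, here is a minimal sketch that tokenizes the same text with both loop styles. The helper names `splitWithGood` and `splitWithExtraction` are made up for this illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Pattern from the question: test good() BEFORE extracting.
// (Hypothetical helper name, for illustration only.)
std::vector<std::string> splitWithGood(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;
    while (ss.good())
    {
        ss >> token;            // may fail; token then keeps its previous value
        words.push_back(token); // pushed regardless of success
    }
    return words;
}

// Suggested pattern: the extraction itself is the loop condition,
// so the body only runs when a token was actually read.
std::vector<std::string> splitWithExtraction(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;
    while (ss >> token)
        words.push_back(token);
    return words;
}
```

With trailing whitespace, as in `"lea r0, ten     "`, the first version returns four entries with `ten` repeated, while the second returns three. Without trailing whitespace both return three, because reading the last word runs into the end of the stream and sets eofbit immediately. That matches the observation in the question about removing the spaces before the semicolon.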

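Try using the extraction itself as the loop condition. A successful extraction never yields an empty token, so check for an empty line after the loop instead:

while(ss >> token)
{
    // Push it onto the words vector
    words.push_back(token);
}

// If no token was extracted, it's an empty line
if(words.empty())
{
    code = make_tuple(EMPTY_LINE, words);
    return code;
}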

With this loop, tokenizing line 3 as written (before the lowercase conversion and comment removal) gives the following tokens:

"LEA"
"R0,"
"TEN"
";"
"This"
"instruction"
"will"
"be"
"loaded"
"into"
"memory"
"location"
"x3000"
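Putting the loop fix together with the comment removal from the question, a complete version might look like the sketch below. It is one possible arrangement, not the asker's final code. Taking the line by value, using `std::string::find` and `std::transform`, and testing `words.empty()` after the loop are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Same three conditions and tuple type as in the question
enum { DONE, OK, EMPTY_LINE };
typedef std::tuple<int, std::vector<std::string>> Code;

// Sketch: takes the line by value (an assumption; the original takes a
// reference and modifies the caller's string)
Code ReadAndParse(std::string line)
{
    // Drop everything from the first semicolon onward (the comment)
    const std::size_t semicolon = line.find(';');
    if (semicolon != std::string::npos)
        line.erase(semicolon);

    // Convert what is left to lowercase
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Extract whitespace-separated words; the loop body runs only
    // when an extraction actually succeeded
    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;
    while (ss >> token)
        words.push_back(token);

    // No words at all means a blank or comment-only line
    if (words.empty())
        return Code(EMPTY_LINE, words);

    return Code(OK, words);
}
```

For the third line of the sample program this returns OK with the words `lea`, `r0,` and `ten`. For the comment-only line it returns EMPTY_LINE with an empty vector.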

I know the language you're trying to parse is a simple one. Nonetheless you would do yourself a favour if you would consider using a specialized tool for the job such as, for example, flex.
