I have a lab in the university. But I don't understand how I can do it. I have a grammar of some language(for example, arithmetic expressions grammar). I must build tree of this grammar(I don't know how). Then I must determine, whether input sentence is sentence of this language? How can I do it? P.S. I've read few chapters of Dragon Boo开发者_如何学Gok, but I havn't found anything I need. P.P.S. I can't use lex/flex and yacc/bison.
EDIT Sorry! I shuild be more attentive. Really I have a grammar and I must build tree using this grammar and input sentence(I understand how to do it) if it possible(in another case I must show message about it). Is there any simple for understanding algorithms for it? I can use grammar of any simple language like arithmetical expressions.
I'm assuming this is theoretical since it's homework, and not something you actually need to code at this point. (Kendrick's answer covers the code approach)
The basic idea is to start with your BNF start variable and try to figure out how to expand it, applying rules one at a time, to see if you can come up with your input sequence.
For a ruleset like the following:
(1) start: expression
(2) expression: expression '+' term
(3) | expression '-' term
(4) | term
(5) term: 'a'
(6) | 'b'
Given the expression a + b - a
, you would go something like this:
a. start
(given)
b. expression
(1)
c. expression '-' term
(3)
d. expression '-' 'a'
(5)
e. expression '+' term '-' 'a'
(2)
f. term '+' term '-' 'a'
(4)
g. 'a' '+' term '-' 'a'
(5)
h. 'a' '+' 'b' '-' 'a'
(6)
So that's how you do it one step at a time... Now the trick is, you need to trace out all your rule invocations. So your real tree would look something like this:
start
| (b)
expression
/ | \ (c)
expression '-' term
/ | \ (e) | (d)
expression '+' term 'a'
| (f) | (h)
term 'b'
| (g)
'a'
It's a little complicated at first, but once you actually see how it's done it's not too hard to pick up.
Note: Some people find it easier to work backwards, starting with your input and then applying rules in reverse to try to find your start expression. When you write a parser, you will inevitably need to follow this route on some level.
EDIT: I listed all the expression steps using lowercase letters above, and then showed that each set of branches from the tree actually corresponds with one of the rule applications. May or may not help.
It's been a long time since I've done this, so I'm going to give you the short theoretical answer. I'm sure someone else will have a more in-depth explaination.
Since you haven't said what you've done (and you should as part of your questions in the future), the first step is to tokenize the input.
Once you have the tokens, you can build a state machine to walk the tokens and push them into the desired tree structure. This should be covered in your book (I haven't read it) but you probably want to start by building a model on paper (or electronic) and consider valid inputs from each step. What's a valid left hand value for an equality statement? If you have token x on the tree, where can you go from here?
If you can't determine from the current state where to go next, then you may need to implement look-ahead in your state machine (whether this is required depends on the complexity of your language).
Again, I haven't done this in over 10 years, so my recollection is fuzzy. Hopefully I've given enough of a framework for you to refine your answer or give others who are more knowledgeable on the subject to roast me for being wrong or out of date (and thus arriving at the answer you're looking for as a group).
I'd like to answer this, but there isn't really enough material to work with - I am left with these questions:
- As this is a university lab, isn't there accompanying study material or classes? Also, how much time to do you have?
- What have you tried so far?
- Can you hard-code a given grammar into your program, or does your program need to be able to read any grammar from a file and build a parse-tree from that?
- How complex are the grammars; specifically: are they recursive?
- Are the grammar tokens single-character or multi-character?
Edit: Taking the comment and the updated question into account: I think the most straight-forward approach would be a recursive descent parser, which builds the tree as the input is matched. Look for Niklaus Wirth's "Compiler Construction" book, which has been made available as PDF on the web, and which describes a RDP for a simple language.
The basic idea of a recursive descent parser is that every rule in the grammar corresponds to a function, and each function checks the next token and calls the corresponding next-lower parse function. For example, if your grammar has a rule
expression : constant '+' constant
the associated parsing method would look like this:
void parseExpression() {
parseConstant();
Token token = peek_at_next_token();
if (token == '+') {
parseConstant();
... tree building code here ...
}
else
throw ParseException("Expected '+', found "+token);
}
But see also the other answers to this question.
精彩评论