Let's define a language:
VAR := [0-9A-Za-z_]+
Exp := VAR
| VAR,'=',VAR
| '(', Exp, ')'
| Exp, '&', Exp
| Exp ,'|', Exp
e.g. "( a = b ) & ( c | (d=e) ) " is legal.
I've read the Yacc & Lex manual, but I'm totally confused. I just want a parser for this language.
Can you tell me how to write the flex and bison input files for this language? Here is what I've done so far:
file a.l:
%{
#include <string.h>
#include "stdlib.h"
#include "stdio.h"
#include "y.tab.h"
%}
%%
("&"|"and"|"AND") { return AND; }
("|"|"or"|"OR") { return OR; }
("="|"eq"|"EQ") { return EQ; }
([A-Za-z0-9_]+) { return VAR;}
("(") { return LB ;}
(")") { return RB ;}
("\n") { return LN ;}
%%
int main(void)
{
yyparse();
return 0;
}
int yywrap(void)
{
return 0;
}
int yyerror(void)
{
printf("Error\n");
exit(1);
}
file a.y
%{
#include <stdio.h>
%}
%token AND OR EQ VAR LB RB LN
%left AND OR
%left EQ
%%
line :
| exp LN{ printf("LN: %s",$1);}
;
exp: VAR { printf("var:%s",$1);}
| VAR EQ VAR { printf("var=:%s %s %s",$1,$2,$3);}
| exp AND exp { printf("and :%s %s %s",$1,$2,$3);}
| exp OR exp { printf("or :%s %s %s",$1,$2,$3);}
| LB exp RB { printf("abstract :%s %s %s",$1,$2,$3);}
;
I have now edited the files as Chris Dodd suggested. It seems much better (at least the lexer works fine), but I get output like this:
disk_path>myprogram
a=b
var=:(null) (null) (null)LN: (null)ab=b
Error
So why does printf output (null)? And why, after I enter the second line, does the program print Error and exit?
First, write a lex file to tokenize the input (and print out what it sees).
You want to introduce the terminals:
[0-9A-Za-z_]+ --> VAR
( --> LPAREN
) --> RPAREN
& --> AND
| --> OR
= --> EQUAL
and just print out a word for each. For your example:
( a = b ) & ( c | (d=e) ) --> LPAREN VAR EQUAL VAR RPAREN AND LPAREN VAR OR LPAREN VAR EQUAL VAR RPAREN RPAREN
This is doable in pure lex. When you have done this, update your question and we can talk about the next step.
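To make the expected token stream concrete, here is a minimal hand-written sketch in plain C of what those lex rules would do. It is not flex output: the function name token_names and its buffer interface are made up for illustration, and skipping whitespace is an assumption carried over from the example input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of lex/flex): writes the space-separated
 * token names for `input` into `out`, which has room for `cap` bytes.
 * Returns 0 on success, -1 on an unexpected character or if `out` is
 * too small. */
int token_names(const char *input, char *out, size_t cap)
{
    assert(input != NULL && out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';

    for (const char *p = input; *p != '\0'; ) {
        const char *name;
        unsigned char c = (unsigned char)*p;

        if (isspace(c)) {               /* assumption: whitespace is ignored */
            p++;
            continue;
        }
        if (isalnum(c) || c == '_') {   /* [0-9A-Za-z_]+ --> VAR */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            name = "VAR";
        } else {
            switch (c) {
            case '(': name = "LPAREN"; break;
            case ')': name = "RPAREN"; break;
            case '&': name = "AND";    break;
            case '|': name = "OR";     break;
            case '=': name = "EQUAL";  break;
            default:  return -1;        /* character outside the language */
            }
            p++;
        }

        /* Append the name, separated from the previous one by a space. */
        int n = snprintf(out + used, cap - used, "%s%s", used ? " " : "", name);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Calling token_names on the example string should fill the buffer with the same sequence of names listed above. In a real lex file each branch becomes one pattern/action pair.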
Your lex rule ("[0-9A-Za-z_]+") will match (only) the literal string [0-9A-Za-z_]+. Get rid of the " characters to have it be a pattern that matches any identifier or number.
Your yacc code does not match your lex code for punctuation: the lex code returns AND for & while the yacc code is expecting an '&', so either change the lex code to return '&' or change the yacc code to use the token AND, and similarly for |, (, and ). You might also want to ignore spaces in the lex code (with no rule for them, the default action echoes them to the output). You also have no lex rule to match and return '\n', even though you use that in your yacc grammar.
Your yacc code is otherwise correct, but your grammar is ambiguous, which gives you shift/reduce conflicts: an input like a&b|c can be parsed as either (a&b)|c or a&(b|c). You need to decide how that ambiguity should be resolved and reflect that in your grammar, either by using more non-terminals or by using yacc's built-in precedence support for resolving this kind of ambiguity. If you put the declarations:
%left '|'
%left '&'
at the top of your yacc file, that will resolve the ambiguity by making both & and | left associative, and giving & higher precedence than |, which would be the normal interpretation.
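For comparison, the "more non-terminals" route can be sketched as a small hand-written recursive-descent parser in C, with one function per precedence level. This is only an illustration of the layering and is not what yacc generates. The function names, the fixed MAXLEN buffers, and the fully parenthesized output format are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 256          /* assumed upper bound on any rendered string */

static const char *cur;     /* current read position in the input */

static void skip_ws(void) { while (isspace((unsigned char)*cur)) cur++; }
static int parse_or(char *out);

/* Writes "(lhs op rhs)" without spaces into out; fails if it does not fit. */
static int wrap(char *out, const char *lhs, char op, const char *rhs)
{
    int n = snprintf(out, MAXLEN, "(%s%c%s)", lhs, op, rhs);
    return (n >= 0 && n < MAXLEN) ? 0 : -1;
}

/* VAR: [0-9A-Za-z_]+ */
static int parse_var(char *out)
{
    size_t n = 0;
    skip_ws();
    while (isalnum((unsigned char)*cur) || *cur == '_') {
        if (n + 1 >= MAXLEN) return -1;
        out[n++] = *cur++;
    }
    out[n] = '\0';
    return n > 0 ? 0 : -1;
}

/* primary: VAR | VAR '=' VAR | '(' or ')' */
static int parse_primary(char *out)
{
    char lhs[MAXLEN], rhs[MAXLEN];
    skip_ws();
    if (*cur == '(') {
        cur++;
        if (parse_or(out) != 0) return -1;
        skip_ws();
        if (*cur != ')') return -1;
        cur++;
        return 0;
    }
    if (parse_var(lhs) != 0) return -1;
    skip_ws();
    if (*cur != '=') { strcpy(out, lhs); return 0; }
    cur++;
    if (parse_var(rhs) != 0) return -1;
    return wrap(out, lhs, '=', rhs);
}

/* operand (op operand)*, grouped to the left: this is left associativity. */
static int parse_chain(char *out, char op, int (*operand)(char *))
{
    char lhs[MAXLEN], rhs[MAXLEN];
    if (operand(out) != 0) return -1;
    for (;;) {
        skip_ws();
        if (*cur != op) return 0;
        cur++;
        if (operand(rhs) != 0) return -1;
        strcpy(lhs, out);               /* avoid aliasing out in snprintf */
        if (wrap(out, lhs, op, rhs) != 0) return -1;
    }
}

/* '&' sits one level below '|', so it binds tighter. */
static int parse_and(char *out) { return parse_chain(out, '&', parse_primary); }
static int parse_or(char *out)  { return parse_chain(out, '|', parse_and); }

/* Parses a whole expression into a fully parenthesized string in `out`
 * (at least MAXLEN bytes). Returns 0 on success, -1 on a syntax error. */
int parse_expression(const char *input, char *out)
{
    assert(input != NULL && out != NULL);
    cur = input;
    if (parse_or(out) != 0) return -1;
    skip_ws();
    return *cur == '\0' ? 0 : -1;
}
```

With this layering, a&b|c comes out as ((a&b)|c) and a|b&c as (a|(b&c)), which is the same grouping the two %left declarations ask yacc for.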
Edit
The problem you have now is that you never define YYSTYPE (either directly or with %union) in your .y file and you never set yylval in your .l file. The first problem means that $1 etc. are just ints, not pointers, so it makes no sense to try to print them with %s (you should be getting a warning from your C compiler about that). The second problem means that they never have a value anyway, so it's always just the default 0 value of an uninitialized global variable.
The easiest fix would be to add
%union {
const char *name;
}
%token <name> VAR LB RB LN
%left <name> AND OR
%left <name> EQ
%type <name> exp
to the top of the yacc file. Then change all the lex rules to be something like
([A-Za-z0-9_]+) { yylval.name = strdup(yytext); return VAR;}
Finally, you also need to change the bison actions for exp to set $$, e.g.:
| LB exp RB { asprintf(&$$, "%s %s %s",$1,$2,$3); printf("abstract: %s\n", $$); }
This will at least work, though it will leak lots of memory for the allocated strings.
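Note that asprintf is a GNU/BSD extension rather than standard C. If it is not available, or if you want to deal with the leak, one option is a small helper along these lines. This is only a sketch: join3 and join3_consume are hypothetical names, not part of bison or flex, and the consuming variant assumes every operand was allocated with malloc or strdup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed string "a b c", or NULL
 * if allocation fails. The caller owns the result and must free() it. */
char *join3(const char *a, const char *b, const char *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t len = strlen(a) + strlen(b) + strlen(c) + 2;   /* two spaces */
    char *s = malloc(len + 1);                            /* plus '\0'  */
    if (s != NULL)
        snprintf(s, len + 1, "%s %s %s", a, b, c);
    return s;
}

/* Same as join3, but takes ownership of the three operands and frees
 * them, so intermediate strings do not pile up. Assumes each operand
 * came from malloc/strdup and is not used again afterwards. */
char *join3_consume(char *a, char *b, char *c)
{
    char *s = join3(a, b, c);
    free(a);
    free(b);
    free(c);
    return s;
}
```

An action could then read something like $$ = join3_consume($1, $2, $3);. That would require the union member to be char * rather than const char *, and every token (including LB, RB, AND, OR and EQ) to carry a strdup'ed yylval.name.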
The last problem you have is that your line rule only matches a single line, so a second line of input causes an error. You need a recursive rule like:
line: /* empty */
| line exp LN { printf....