I've posted similar question here, but it was closed because I didn't explained myself properly. I'll try to explain my problem again.
I managed to write LexicalAnalyzer that will tokenize the following to "public", "class", "A", "{",...,"if","(",...,"}"
string seq = "public class A " +
"{ " +
"public A() " +
"{ " +
"if(1==2) " +
"{ " +
"} " +
"else " +
"{ " +
"} " +
"} " +
"} "
Now, I need to parse it to tree. As I read, it's better to construct the parser in the way that will take rules. Meanwhile I need to write rules for "if" statement" that will be passed to parser and the last will buil开发者_如何学God the parse tree. In future, I'll add rules for "class" and other.
To parse I mean that eventually I will get similar tree like here in the right
My question is how to implement the rules and the parser? Can you direct me or give simple example please?
I've read some posts, but I didn't found something that will help me to do what I need.
P.S. If it's still unclear, please don't close the post, but tell me and I'll change it.
Thank you
Since when is RTFM an answer? Anyways. What you are trying to do is not at all easy, since Java is context free (type-2). Try, for starters writing a syntax-analyser for a language of type-3 (Chomsky Hierarchy). But I'll try to explain to you what you'd need to do anyways.
You would have to define rules for Java which look like that (in my example I'll define a function within a java class, where lowercase letters are terminals and uppercase letters are non-terminals). Terminals cannot be derived any further while non-terminals can.
X -> Y means X derives to Y. X -> Y | Z means X derives either to Y or Z.
f is any name. t is a Type, this would not be a terminal, if I'd try to go all the way, but since I define types to be non-declarable to make my life less of a pain, it's a terminal. '(', ')', '{', '}', ',' and ' ' are terminals. Eps is Epsilon and means nothing.
S -> K t f(T) { N }
T -> t f | t f , T
F -> F, f | f
K -> k K | k
N -> L N | L
L -> f(F);
With this I could parse
final boolean equals(Object obj) {
compare(this, obj);
compare(obj, this);
}
Which would result in:
S -> K t f(T) { N }
with K -> k
-> k t f(T) { N }
with T -> t f
-> k t f(t f) { N }
with N -> L N
-> k t f(t f) { L N }
with L -> f(F);
-> k t f(t f) { f(F); N }
with F -> f, F
-> k t f(t f) { f(f, F); N }
with F -> f
-> k t f(t f) { f(f, f); N }
with N -> L
-> k t f(t f) { f(f, f); L }
with L -> f(F)
-> k t f(t f) { f(f, f); f(F) }
...
-> k t f(t f) { f(f, f); f(f, f); }
-> k (=final) t(=boolean) f(=equals) (t(=Object) f(=obj)) { ... }
Which proves that S defines my simplyfied java (Well it doesn't, but at least I gave an example). So the next thing we have to do is figure out how to get a syntax-tree out of these rules.
Thankfully, this is the easy part, because all you have to do is change the lines to be a tree. So S has children K t f(T) { N }. K has children K and k... Uppercase means a Node has children, Lowercase says it doesn't.
Last problem, you do not start with S but you start with the code already being written. Which leaves you with
K t f(T) { N } -> S
t f -> T
t f , T -> T
F, f -> F
f -> F
k K -> K
k -> K
L N -> N
N -> L
f(F); -> L
Parsing in reverse would look like this:
final boolean equals(Object obj) {
compare(this, obj);
compare(obj, this);
}
final -> k
boolean -> t
equals -> f
Object -> t
obj -> f
compare -> f
this -> f
k t f(t f) { f(f, f); f(f,f); }
with k -> K
K t f(t f) ...
with t f -> T
K t f(T) ...
...
Which would build the tree from down below.
精彩评论