
How to create a parser which tokenizes a list of words taken from a file?

Developers https://www.devze.com 2023-01-05 06:00 Source: web

I am trying to write a text syntax corrector for my compilers class. The idea is: I have some rules that are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".

OK, so first I have to tokenize the input "Ruby is great". I have a text file "verbs" with a lot of verbs, one per line. Then I have another text file "adjectives", another "pronouns", and so on.

I am trying to use Ragel to create a parser, but I don't know how I could do something like:

%%{
  machine test;
  subject = <open-the-subjects-file-and-accept-each-one-of-them>;
  verb = <open-the-verbs-file-and-accept-each-one-of-them>;
  adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
  main = subject verb adjective @ { print "Valid phrase!" } ;
}%%

I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.

Does anyone know how I could do this? It doesn't have to be with Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not strictly necessary either.

Thanks.


If you want to read the files at compile time, put them in this format:

subject =
    'ruby' |
    'python' |
    'c++';

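Then use Ragel's 'include' statement to pull the file in. ('include' brings in the definitions of another Ragel machine, whereas 'import' only scrapes simple name = literal definitions, so check the manual for the exact syntax.) The included file has to be a Ragel specification itself, that is, the definition wrapped in its own %%{ machine ...; }%% block.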
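The compile-time approach still needs something to turn a one-word-per-line file into a Ragel definition. Here is a minimal sketch of such a generator in Python. The helper names `ragel_literal` and `ragel_definition` are made up for illustration, and the sketch assumes each word is emitted as a single-quoted Ragel string literal joined with `|`. The example feeds it an in-memory list; in practice you would pass an open file such as the question's "verbs" file.

```python
def ragel_literal(word):
    """Quote a word as a single-quoted Ragel string literal (hypothetical helper)."""
    escaped = word.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def ragel_definition(name, lines):
    """Build "name = 'w1' | 'w2' | ...;" from an iterable of lines, one word per line."""
    # Skip blank lines and strip the trailing newline that file iteration leaves on each word.
    literals = [ragel_literal(line.strip()) for line in lines if line.strip()]
    return "%s =\n    %s;" % (name, " |\n    ".join(literals))


# An in-memory stand-in for a word file; open("subjects.txt") would work the same way.
subjects = ["ruby\n", "python\n", "c++\n"]
print(ragel_definition("subject", subjects))
```

Run once per word file and write the results into the file that the main Ragel specification includes. This is the preprocessing step the question hoped to avoid, but kept in a separate generated file so the hand-written grammar stays untouched.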


If you want to check the list of subjects at run time, you could have Ragel simply read three words and attach an action to each one. The action can then look the word up in the file at run time to decide whether it is valid.

In the sketch below, the action reads the text file and compares each entry with the word.

%%{
machine test;

# Action bodies below are Python-style pseudocode.
action startWord {
    lastWordStart = p
}
action checkSubject {
    # A leaving action (%) runs with p just past the word, so slice up to p.
    word = input[lastWordStart:p]
    for possible in open('subjects.txt'):
        # Strip the trailing newline before comparing.
        if possible.strip() == word:
            fgoto verb;
    # If we get here, do whatever Ragel does to go to an error, or just raise a Python exception
    raise Exception("Invalid subject '%s'" % word)
}
action checkVerb { .. exercise for reader .. ;) }
action checkAdjective { .. put adjective checking code here .. }

subject = space* . (alnum+) >startWord %checkSubject;
verb := space* . (alnum+) >startWord %checkVerb;
adjective := space* . (alnum+) >startWord %checkAdjective;
main := subject;
}%%


With Bison I would write the lexer by hand and have it look up each word in the predefined dictionaries.
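A hand-written lexer of this kind is short enough to sketch in Python, without any parser generator. The names `load_words`, `tokenize`, and `is_valid_phrase` are hypothetical, and so are the token names. The sketch assumes each word belongs to exactly one category, which real Portuguese vocabulary will not always satisfy. The example uses a small in-memory dictionary instead of the word files.

```python
def load_words(path, kind):
    """Read one word per line from `path` and map each word to the token type `kind`."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower(): kind for line in f if line.strip()}


def tokenize(text, dictionary):
    """Split on whitespace and tag each word with its token type, or UNKNOWN."""
    return [(dictionary.get(word.lower(), "UNKNOWN"), word) for word in text.split()]


def is_valid_phrase(text, dictionary, rule=("SUBJECT", "VERB", "ADJECTIVE")):
    """Check that the sequence of token types matches the grammar rule exactly."""
    kinds = tuple(kind for kind, _ in tokenize(text, dictionary))
    return kinds == rule


# In-memory stand-in for dictionaries built with load_words() from the word files.
dictionary = {
    "ruby": "SUBJECT",
    "python": "SUBJECT",
    "is": "VERB",
    "great": "ADJECTIVE",
}

print(tokenize("Ruby is great", dictionary))
print(is_valid_phrase("Ruby is great", dictionary))
print(is_valid_phrase("great is Ruby", dictionary))
```

With real files, the dictionary could be built by merging the results of `load_words` for each category. The token stream from `tokenize` is also the shape a Bison-style parser would consume if the grammar grows beyond a single rule.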
