OCaml lex: doesn't work at all, whatsoever

I am at the end of my rope here. I cannot get anything to work in ocamllex, and it is driving me nuts. This is my .mll file:

{

open Parser

}

rule next = parse
  | (['a'-'z'] ['a'-'z']*) as id { Identifier id }
  | '=' { EqualsSign }
  | ';' { Semicolon }
  | '\n' | ' ' { next lexbuf }
  | eof { EOF }

Here are the contents of the file I pass in as input:

a=b;

Yet, when I compile and run the thing, I get an error on the very first character, saying it's not valid. I honestly have no idea what's going on, and Google has not helped me at all. How can this even be possible? As you can see, I'm really stumped here.

EDIT:

I was working for so long that I gave up on the parser. Now this is the relevant code in my main file:

let parse_file filename =
  let l = Lexing.from_channel (open_in filename) in
    try
      Lexer.next l; ()
    with
      | Failure msg ->
        printf "line: %d, col: %d\n" l.lex_curr_p.pos_lnum l.lex_curr_p.pos_cnum

Prints out "line: 1, col: 1".

Without the corresponding ocamlyacc parser, nobody will be able to find the issue with your code since your lexer works perfectly fine!

I have taken the liberty of writing the following tiny parser (parser.mly) that constructs a list of identifier pairs, e.g. input "a=b;" should give the singleton list [("a", "b")].

%{%}

%token <string> Identifier
%token EqualsSign
%token Semicolon
%token EOF

%start start
%type <(string * string) list> start

%%

start:
| EOF {[]}
| Identifier EqualsSign Identifier Semicolon start {($1, $3) :: $5}
;

%%

To test whether the parser does what I promised, we create another file (main.ml) that parses the string "a=b;" and prints the result.

let print_list = List.iter (fun (a, b) -> Printf.printf "%s = %s;\n" a b)
let () = print_list (Parser.start Lexer.next (Lexing.from_string "a=b;"))

The code should compile (e.g. ocamlbuild main.byte) without any complaints and the program should output "a = b;", matching the promised list [("a", "b")].

In response to the latest edit:

In general, I don't believe that catching standard library exceptions that are meant to indicate failure or misuse (like Invalid_argument or Failure) is a good idea. They are used so widely throughout the library that you usually cannot tell which function raised the exception or why.

Furthermore, you are throwing away the only useful information: the error message! The error message should tell you what the source of the problem is (my best guess is an IO-related issue). Thus, you should either print the error message or let the exception propagate to the toplevel. Personally, I prefer the latter option.
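If you do want to catch Failure, here is a minimal sketch of a handler that keeps the message instead of discarding it. The names describe and run_lexer are hypothetical helpers, not part of the Lexing module. Note that pos_cnum is an absolute character offset, so the sketch computes the column as pos_cnum - pos_bol.

```ocaml
(* Render a position as "line L, col C". pos_cnum is the absolute offset
   from the start of the input and pos_bol is the offset of the start of
   the current line, so their difference is the column. *)
let describe (p : Lexing.position) =
  Printf.sprintf "line %d, col %d"
    p.Lexing.pos_lnum
    (p.Lexing.pos_cnum - p.Lexing.pos_bol)

(* Hypothetical wrapper: run the lexer function [lex] on [lexbuf] and,
   if it raises Failure, keep the message together with the position. *)
let run_lexer lex lexbuf =
  try Ok (lex lexbuf)
  with Failure msg ->
    Error (Printf.sprintf "%s at %s" msg (describe lexbuf.Lexing.lex_curr_p))
```

In the question's main file this would be called as run_lexer Lexer.next l, and the Error string printed rather than ignored.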

However, you probably still want to deal with syntactically ill-formed inputs in a graceful manner. For this, you can define a new exception in the lexer and add a default case that catches invalid tokens.

{
  exception Unexpected_token
}
...
| _ {raise Unexpected_token}

Now, you can catch the newly defined exception in your main file and, unlike before, the exception is specific to syntactically invalid inputs. Consequently, you know both the source and the cause of the exception giving you the chance to do something far more meaningful than before.
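To show the idea without requiring ocamllex, here is a rough sketch in plain OCaml. The functions next, tokens and check are hand-written stand-ins that mirror the rules in the question; they are assumptions for illustration, not the code that ocamllex would generate.

```ocaml
(* Token names follow the question's lexer. *)
type token = Identifier of string | EqualsSign | Semicolon | EOF

(* Raised only for characters that no rule accepts. *)
exception Unexpected_token

(* [next s pos] returns the token starting at [pos] in [s] together with
   the position just after it. Hypothetical stand-in for Lexer.next. *)
let rec next s pos =
  if pos >= String.length s then (EOF, pos)
  else
    match s.[pos] with
    | '=' -> (EqualsSign, pos + 1)
    | ';' -> (Semicolon, pos + 1)
    | '\n' | ' ' -> next s (pos + 1)
    | 'a' .. 'z' ->
      (* Scan to the end of the run of lowercase letters. *)
      let stop = ref pos in
      while !stop < String.length s
            && s.[!stop] >= 'a' && s.[!stop] <= 'z' do
        incr stop
      done;
      (Identifier (String.sub s pos (!stop - pos)), !stop)
    | _ -> raise Unexpected_token  (* the default case *)

(* Collect all tokens up to and including EOF. *)
let tokens s =
  let rec loop pos acc =
    match next s pos with
    | (EOF, _) -> List.rev (EOF :: acc)
    | (tok, pos') -> loop pos' (tok :: acc)
  in
  loop 0 []

(* The handler is specific to invalid input; any other exception
   still propagates to the toplevel. *)
let check s =
  try ignore (tokens s); "ok"
  with Unexpected_token -> "syntax error: unexpected character"
```

With the real generated lexer, the handler in the main file would match Lexer.Unexpected_token in the same way.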

A fairly random OCaml development hint: If you compile the program with debug information enabled, setting the environment variable OCAMLRUNPARAM to "b" (e.g. export OCAMLRUNPARAM=b) enables stack traces for uncaught exceptions!
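A related option is to turn backtrace recording on from inside the program with the standard Printexc module. The following is only a sketch: risky and report are hypothetical names, and the trace may be empty if the program was built without debug information.

```ocaml
(* Ask the runtime to record backtraces, much as OCAMLRUNPARAM=b would. *)
let () = Printexc.record_backtrace true

(* Hypothetical failing function, for illustration only. *)
let risky () : unit = failwith "boom"

let report () =
  try risky (); "no error"
  with Failure msg ->
    (* get_backtrace returns the trace of the most recently raised
       exception as a string; it can be empty without debug info. *)
    Printf.sprintf "Failure: %s\n%s" msg (Printexc.get_backtrace ())
```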

By the way, ocamllex also supports the + operator for 'one or more' in regular expressions, so this

['a'-'z']+

is equivalent to your

['a'-'z']['a'-'z']*

I was just struggling with the same thing (which is how I found this question), only to finally realize that I had mistakenly specified the path to the input file as Sys.argv.(0) instead of Sys.argv.(1)! LOLs

I really hope it helps! :)
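For reference, a small sketch of how the input path could be picked up safely. The helper input_path is a hypothetical name; the only fact relied on is that Sys.argv.(0) is the program name and Sys.argv.(1) is the first real argument.

```ocaml
(* Return the first command-line argument, if any.
   argv.(0) is the program itself, so the input file is argv.(1). *)
let input_path (argv : string array) : string option =
  if Array.length argv < 2 then None else Some argv.(1)
```

In a main file this would be called as input_path Sys.argv, printing a usage message on None instead of silently lexing the executable.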

It looks like you have a space in the regular expression for identifiers. This could keep the lexer from recognizing a=b, although it should still recognize a = b ;