ANTLR Parser Question_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2022-12-20 23:49 出处：网络

I\'m trying to parse a number of text records where elements in a record are separated by a \'+\' char, and where the entire record is terminated by a \'#\' char. For example E1+E2+E3+E4+E5+E6#

相关专题：antlr grammar

I'm trying to parse a number of text records where elements in a record are separated by a '+' char, and where the entire record is terminated by a '#' char. For example E1+E2+E3+E4+E5+E6#

Individual elements can be required or optional. If an element is optional, its value is simply missing. For example, if E2 were missing, the input string would be: E1++E3+E4+E5+E6#.

When dealing with empty trailing elements, however, the separator char ('+') may be missing as well. If, for example, the last 3 elements were missing, the string could be: E1+E2+E3#, but it could also be: E1+E2+E3+++#

I have tried the following rule in Antlr:

'R1' 'E1 + E2 + E3' '+'? 'E4'? '+'? 'E5'? '+'? 'E6'? '#

but Antlr complains that it's ambiguous which of course is correct (every token following E3 could be E4, E5 or E6). The input syntax is fixed (it's from a legacy mainframe system), so I was wondering if anybody has a solution to this problem ?

An alternat开发者_运维知识库ive would be to specify all the different permutations in the rule, but that would be a major task.

Best regards and thanks,

Michael

That task sounds like excessive overkill for ANTLR, any reason you're just not splitting the string into an array using the '+' as a separator?

If it's coming from a mainframe, it most likely was intended to be processed in a trivial way.

e.g.,
C++ : http://www.cplusplus.com/reference/clibrary/cstring/strtok/
PHP : http://us3.php.net/manual/en/function.explode.php
Java: http://java.sun.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29
C# : http://msdn.microsoft.com/en-us/library/system.string.split%28VS.71%29.aspx

Just a thought.

If this is ambiguous, it's likely because your Es all have the same format (a more complicated case would be that your Es all just start with the same k characters where k is your lookahead, but I'm going to assume that's not the case. If it is, this will still work; it will just require an extra step.)

So it looks like you can have up to 6 Es and up to 5 +s. We'll say a "segment" is an optional E followed by a + - you can have 5 segments, and an optional trailing E.

This grammar can be represented roughly like this (imperfect ANTLR syntax since I'm not very familiar with it):

r : (e_opt? PLUS){1,5} e_opt? END
e_opt : E  // whatever your E is
PLUS : '+'
END : '#'

If ANTLR doesn't support anything like {1,5} then this is the same as:

(e_opt? PLUS) ((e_opt? PLUS) ((e_opt? PLUS) ((e_opt? PLUS) (e_opt? PLUS)?)?)?)?

which is not that clean, so maybe there is a nicer way to do it.

ANTLR Parser Question

精彩评论

关注公众号

热门标签

图文推荐

ANTLR Parser Question

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：