I've been hacking around with the May09 Oslo bits, experimenting with tokenizing some source code. I can't figure out how to correctly handle multiline C-style comments, though.
For example: /*comment*/
Some cases that elude me:
/***/
or
/**//**/
I can make one or the other work, but not both. The grammar was:
module Test {
    language Comments {
        token Comment =
            MultiLineComment;
        token MultiLineComment =
            "/*" MultiLineCommentChar* "*/";
        token MultiLineCommentChar =
            ^ "*" |
            "*" PostAsteriskChar;
        token PostAsteriskChar =
            ^ "*" |
            "*" ^("*" | "/");
        /*
        token PostAsteriskChar =
            ^ "*" |
            "*" PostAsteriskChar;
        */
        syntax Main = Comment*;
    }
}
The commented-out token is what I think I want, but recursive tokens are not permitted.
The fact that MGrammar itself has "broken" multiline comments (it can't handle /***/) leads me to believe this isn't possible.
Does anyone know otherwise?
The way I have done it is as follows (not all my own code, but I can't find a reference to the original author).
interleave Skippable = Whitespace | Comment;
interleave Comment = CommentToken;
@{Classification["Comment"]}
token CommentToken = CommentDelimited
                   | CommentLine;
token CommentDelimited = "/*" CommentDelimitedContent* "*/";
token CommentDelimitedContent
    = ^('*')
    | '*' ^('/');
token CommentLine = "//" CommentLineContent*;
token CommentLineContent
    = ^(
        '\u000A' // New Line
      | '\u000D' // Carriage Return
      | '\u0085' // Next Line
      | '\u2028' // Line Separator
      | '\u2029' // Paragraph Separator
    );
This allows for both single-line (//) comments and multiline (/* */) comments.
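One way to sanity-check token rules like these without the Oslo tools is to transliterate them into regular expressions, since non-recursive tokens describe a regular language. The sketch below is Python, not MGrammar, and it assumes ^(x) means "any single character other than x". ANSWER_RULE mirrors CommentDelimited above; STAR_RUN_RULE and split_comments are hypothetical names for a non-recursive alternative that treats a run of asterisks as a unit. Under those assumptions, the CommentDelimited rule still appears to reject /***/, while the star-run form accepts both of the question's cases.

```python
import re

# Regex transliteration of the CommentDelimited token above (assumption:
# MGrammar's ^(x) is read as "any single character other than x").
ANSWER_RULE = re.compile(r"/\*(?:[^*]|\*[^/])*\*/")

# Hypothetical non-recursive alternative: inside the body, a run of asterisks
# must be followed by a character other than '*' or '/', and the closing
# slash may be preceded by one or more asterisks.
STAR_RUN_RULE = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def split_comments(text, rule):
    """Return the back-to-back comments that cover all of text, or None."""
    comments, pos = [], 0
    while pos < len(text):
        match = rule.match(text, pos)
        if match is None:
            return None  # some part of the input is not a comment
        comments.append(match.group())
        pos = match.end()
    return comments


# Compare both rules on the cases from the question.
for sample in ["/*comment*/", "/***/", "/**//**/"]:
    print(sample, split_comments(sample, ANSWER_RULE),
          split_comments(sample, STAR_RUN_RULE))
```

If MGrammar accepts the + operator inside token rules (an assumption I have not checked against the May09 bits), the second pattern would correspond roughly to "/*" (^"*" | "*"+ ^("*" | "/"))* "*"+ "/", which needs no recursion.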