What would be a good way to parse C-like or Lisp-like code into an array, using C#?
So for example, for a little snippet like the following:
if (number > 50) {
alert('Hello, World!');
}
I want to be able to store every word and symbol into an array.
So far, though, I have only managed to produce an array like the following:
[0] if
[1] (number
[2] >
[3] 50)
[4] {
[5] alert('Hello,
[6] World!');
[7] }
You see array location 1, where it says (number? That's not really what I want. I want even that little parenthesis to be placed into its own array location.
My initial idea was to read the code one character at a time and store the characters into arrays accordingly. But that seems like reinventing the wheel when it comes to parsing strings. Is there a simpler way of doing this?
P.S. I'm doing this because I want to learn proper string manipulation.
There are many rules for parsing the C language, and you can't simply tokenize the code by splitting on whitespace characters.
You need to have a notion of symbols. Tokens such as . , - + / * -> ( ) = == != < > <= >= << >> ; ? : " ' & && | || ~ (and so on) are all symbols. If you come across one of them while scanning, treat it as a separate token even when no whitespace separates it from its neighbours, and prefer the longest symbol that matches, so that == is one token rather than two = tokens. Inside a string or character literal, that is, after an opening " or ', suspend this rule until you reach the matching closing quote, skipping any quote that is preceded by the escape character \. There is also comment handling, trigraphs, macro handling, and many more things to be aware of.
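The symbol and quote rules described above can be sketched with a single regular expression. This is only a minimal illustration, written in Python for brevity; the names SYMBOLS, TOKEN_RE and tokenize are made up for this example, the symbol list is a subset, and comments, trigraphs and macros are deliberately not handled.

```python
import re

# Hypothetical symbol table for this sketch. Multi-character symbols come
# first so the alternation prefers the longest match (== before =).
SYMBOLS = ["->", "==", "!=", "<=", ">=", "<<", ">>", "&&", "||",
           ".", ",", "-", "+", "/", "*", "(", ")", "{", "}", "=",
           "<", ">", ";", "?", ":", "&", "|", "~", "!"]

TOKEN_RE = re.compile(
    r"""
      "(?:\\.|[^"\\])*"        # double-quoted literal, honouring backslash escapes
    | '(?:\\.|[^'\\])*'        # single-quoted literal, honouring backslash escapes
    | [A-Za-z_][A-Za-z0-9_]*   # identifier or keyword
    | \d+(?:\.\d+)?            # integer or simple decimal number
    | """ + "|".join(re.escape(s) for s in SYMBOLS),
    re.VERBOSE,
)


def tokenize(source):
    """Return the tokens of source as a list of strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # whitespace only separates tokens, it is never one
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {source[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


if __name__ == "__main__":
    code = "if (number > 50) {\n alert('Hello, World!');\n}"
    for index, token in enumerate(tokenize(code)):
        print(f"[{index}] {token}")
```

Run on the snippet from the question, this puts each parenthesis, brace and semicolon in its own slot and keeps 'Hello, World!' together as one token. The same pattern should carry over to .NET's Regex class with little change, though that has not been checked here.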
Read about fslex and fsyacc. It might be a good starting point to learn about abstract syntax trees, lexers and parsers.
Also, F# lexers and parsers written with fslex and fsyacc are easy to use in a .NET application.
You could set up the parser so that it first checks what kind of thing the text at the current position is (a name, a number, a quoted string, a symbol), and then tokenizes it accordingly.
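One way to read that suggestion is a hand-written scanner that looks at the first character of each token to decide what kind of token it is, then consumes characters for as long as they belong to that kind. The sketch below is an assumption about what was meant, not a definitive implementation; the function name scan is hypothetical, and every symbol is treated as a single character, so multi-character operators such as == would come out as two tokens.

```python
def scan(source):
    """Split source into tokens by inspecting the first character of each."""
    tokens = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not stored
            continue
        start = i
        if ch.isalpha() or ch == "_":
            # A name: letters, digits and underscores.
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
        elif ch.isdigit():
            # A number: a run of digits.
            while i < n and source[i].isdigit():
                i += 1
        elif ch in "\"'":
            # A quoted literal: read up to the matching quote,
            # stepping over any backslash-escaped character.
            i += 1
            while i < n and source[i] != ch:
                i += 2 if source[i] == "\\" else 1
            if i >= n:
                raise ValueError(
                    f"unterminated literal starting at offset {start}")
            i += 1  # consume the closing quote
        else:
            # Anything else is treated as a one-character symbol.
            i += 1
        tokens.append(source[start:i])
    return tokens


if __name__ == "__main__":
    print(scan("if (number > 50) { alert('Hello, World!'); }"))
    print(scan("(define (square x) (* x x))"))
```

Because parentheses are always their own tokens here, the same loop handles the Lisp-like case from the question as well as the C-like one.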
For a book describing this very thing, take a look at "Structure and Interpretation of Computer Programs" (also known as SICP), which is available online and is used by many universities worldwide. There you can find an example of the eval function they use as a starting point.