ANTLR or Regex?

Developer, https://www.devze.com, 2022-12-22 13:18, source: web

I'm writing a CMS in ASP.NET/C#, and on every page request I need to process markup like this:

<html>
<head>
    <title>[Title]</title>
</head>
<body>
    <form action="[Action]" method="get">
        [TextBox Name="Email", Background=Red]
        [Button Type="Submit"]
    </form>
</body>
</html>

and, of course, replace each [...] tag.

My question is: should I implement it with ANTLR or with Regex? Which will be faster? Note that if I implement it with ANTLR, I think I will also need to implement XML in addition to the [...] tags.

I will also need to support parameters, etc.

EDIT: Note that my regex might even look something like this:

public override string ToString()
{
    return Regex.Replace(Input, @"\[
                                    \s*(?<name>\w+)\s*
                                    (?<parameter>
                                        [\s,]*
                                            (?<paramName>\w+)
                                            \s*
                                            =
                                            \s*
                                            (
                                                (?<paramValue>\w+)
                                                |
                                                (""(?<paramValue>[^""]*)"")
                                            )
                                    )*
                               \]", (match) =>
                                  {
                                      ...
                                  }, RegexOptions.IgnorePatternWhitespace);
}        
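A note on reading the captures from a pattern like the one above: because `(?<parameter>...)*` repeats, `match.Groups["paramName"].Value` only holds the last parameter, while .NET keeps every repetition in `Group.Captures`. The sketch below condenses the pattern onto one line (same groups, so `IgnorePatternWhitespace` is not needed) and pairs the captures up. `TagCaptures` and `Parameters` are hypothetical names, and this is not a reconstruction of the elided evaluator body.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagCaptures
{
    // The question's pattern on one line; same named groups as the original.
    public static readonly Regex Tag = new Regex(
        @"\[\s*(?<name>\w+)\s*(?<parameter>[\s,]*(?<paramName>\w+)\s*=\s*((?<paramValue>\w+)|(""(?<paramValue>[^""]*)"")))*\]");

    // Groups["paramName"].Value holds only the LAST repetition of (?<parameter>...)*;
    // Captures holds every repetition, in the order they were matched.
    public static List<KeyValuePair<string, string>> Parameters(Match m)
    {
        CaptureCollection names = m.Groups["paramName"].Captures;
        CaptureCollection values = m.Groups["paramValue"].Captures;
        var result = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < names.Count; i++)
        {
            // Each repetition captures exactly one name and one value, so indices line up.
            result.Add(new KeyValuePair<string, string>(names[i].Value, values[i].Value));
        }
        return result;
    }
}
```

Inside a `(match) => { ... }` evaluator, `match.Groups["name"].Value` gives the tag name and `TagCaptures.Parameters(match)` the parameter list.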


Whether the right tool is RegEx, ANTLR, or something else entirely depends heavily on your requirements. The answer to a "what tool should I use" question shouldn't rest primarily on performance, but on which tool is right for the job.

RegEx is a text search tool. If all you need to do is pull strings out of strings, it's often the hammer of choice. You'll likely want a tool to help you build your RegEx. I'd recommend Expresso, but there are lots of options out there.

ANTLR is a parser generator (often called a compiler generator). If you need error messages, parse actions, or any of the other complicated machinery that comes with a compiler, it's a good option.

It looks like what you're doing is XML search/replace; have you considered XPath? That would be my suggestion.
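A minimal sketch of the XPath suggestion, assuming the template is well-formed XHTML (plain HTML often isn't, and `XDocument.Parse` would throw on it). It uses LINQ to XML's `XPathEvaluate` to locate the attribute values and text nodes that contain a `[`; the bracket syntax inside those strings would still need its own parsing. `XPathLocator` and `FindTagHosts` are hypothetical names.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathLocator
{
    // Selects every attribute and every text node whose value contains '['.
    private const string Query = "//@*[contains(., '[')] | //text()[contains(., '[')]";

    public static List<string> FindTagHosts(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        // For a node-set expression, XPathEvaluate returns an enumerable of XObjects
        // (XAttribute for attributes, XText for text nodes).
        var nodes = ((IEnumerable)doc.XPathEvaluate(Query)).Cast<XObject>();
        return nodes
            .Select(n => n is XAttribute a ? a.Value : ((XText)n).Value.Trim())
            .ToList();
    }
}
```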

Choosing the right tool for the job is important, and the choice should be researched and thought out before development begins. In all cases, it's important to fully understand the program requirements before making any decisions. Do you have a specification for the project? If not, taking the time to write one will save you the time a poor tool choice can cost you.

Hope that helps!


How ANTLR compares with RegEx on performance depends on the RegEx implementation in C#. From experience, I know that ANTLR is fast enough.

In ANTLR you can ignore certain content, such as the XML. You can also scan for the [ and ] delimiters and process only what lies between them.
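In ANTLR this would be expressed with lexer rules, which aren't shown here. As a plain C# illustration of the same idea (not ANTLR), this hypothetical scanner passes everything outside `[` and `]` through as text and hands back only the bracket bodies for further processing. It assumes tags never nest.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Text, Tag }

public static class BracketScanner
{
    // Splits input into Text runs (everything outside [ ], e.g. the HTML, left untouched)
    // and Tag bodies (the characters between '[' and ']').
    public static List<(TokenKind Kind, string Value)> Scan(string input)
    {
        var tokens = new List<(TokenKind Kind, string Value)>();
        int pos = 0;
        while (pos < input.Length)
        {
            int open = input.IndexOf('[', pos);
            if (open < 0)
            {
                tokens.Add((TokenKind.Text, input.Substring(pos))); // trailing text
                break;
            }
            if (open > pos)
                tokens.Add((TokenKind.Text, input.Substring(pos, open - pos)));
            int close = input.IndexOf(']', open + 1);
            if (close < 0)
                throw new FormatException($"Unterminated '[' at offset {open}");
            tokens.Add((TokenKind.Tag, input.Substring(open + 1, close - open - 1)));
            pos = close + 1;
        }
        return tokens;
    }
}
```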

Both RegEx and ANTLR support your kind of parameters (I'm not sure about the "etc.").

In terms of development speed, RegEx is slightly faster for a case like this. You can use an online tool to develop the RegEx and see the capture groups while you edit it (search Google for "regex gskinner").

On the other hand, ANTLR has very good support for error messages: they show line/column numbers and what went wrong. RegEx has no such support.
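To get comparable messages from a RegEx-based implementation, you have to track positions yourself. A hand-rolled sketch, assuming tags never nest: the hypothetical helper `FindUnterminated` flags every `[` with no `]` before the next `[`, and `LineColumn` turns the offset into 1-based line/column numbers. The message format is made up for illustration and is not ANTLR's.

```csharp
using System.Collections.Generic;

public static class TagErrors
{
    // Converts a character offset into 1-based line and column numbers.
    public static (int Line, int Column) LineColumn(string text, int offset)
    {
        int line = 1, column = 1;
        for (int i = 0; i < offset; i++)
        {
            if (text[i] == '\n') { line++; column = 1; }
            else { column++; }
        }
        return (line, column);
    }

    // Flags every '[' that has no ']' before the next '[' (or before end of input).
    public static List<string> FindUnterminated(string text)
    {
        var errors = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != '[') continue;
            int close = text.IndexOf(']', i + 1);
            int nextOpen = text.IndexOf('[', i + 1);
            if (close < 0 || (nextOpen >= 0 && nextOpen < close))
            {
                var (line, column) = LineColumn(text, i);
                errors.Add($"line {line}, column {column}: '[' has no matching ']'");
            }
        }
        return errors;
    }
}
```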

A general approach with RegEx would be: create a "global scan" RegEx that finds well-formed [...] groups in your content and captures the "..." in a group, then apply a second RegEx to this smaller string (splitting it on the equals signs and commas). This way you get good runtime performance, and it's easy to develop.
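The two-stage approach might look like this in C#. It is a sketch under stated assumptions: the stage-1 pattern assumes tags never contain `[` or `]`, the stage-2 pattern accepts bare-word or double-quoted values as in the question's syntax, and `render` is a hypothetical callback standing in for the CMS's real control rendering.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TwoStageTags
{
    // Stage 1: "global scan" for [...] groups, capturing the body.
    private static readonly Regex TagScan =
        new Regex(@"\[(?<body>[^\[\]]*)\]", RegexOptions.Compiled);

    // Stage 2: name=value pairs inside a body; value is "quoted" or a bare word.
    private static readonly Regex Param =
        new Regex(@"(?<key>\w+)\s*=\s*(?:""(?<value>[^""]*)""|(?<value>\w+))", RegexOptions.Compiled);

    // Replaces every [...] tag with whatever `render` returns for its name and parameters.
    public static string Process(string input, Func<string, IDictionary<string, string>, string> render)
    {
        return TagScan.Replace(input, match =>
        {
            string body = match.Groups["body"].Value.Trim();
            Match nameMatch = Regex.Match(body, @"^\w+");
            if (!nameMatch.Success) return match.Value; // not a recognizable tag: leave as-is

            var parameters = new Dictionary<string, string>();
            foreach (Match p in Param.Matches(body.Substring(nameMatch.Length)))
                parameters[p.Groups["key"].Value] = p.Groups["value"].Value;

            return render(nameMatch.Value, parameters);
        });
    }
}
```

Keeping both `Regex` objects in `static readonly` fields means they are constructed once rather than on every page request.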


If the language you are parsing is regular, then regular expressions are certainly an option. If it is not, ANTLR may be your only choice. If I understand these matters correctly, XML is not regular.
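A small illustration of the regular/non-regular distinction, assuming bracket tags as in the question: a pattern for flat tags (no `[` or `]` inside) is a regular expression in the strict sense, but once tags nest it can no longer pair the brackets and only finds the innermost tags. `NestingDemo` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NestingDemo
{
    // Matches a tag whose body contains no '[' or ']' (flat tags only).
    private static readonly Regex FlatTag = new Regex(@"\[[^\[\]]*\]");

    // Returns every flat tag found in the input, in order.
    public static string[] FlatTags(string input) =>
        FlatTag.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
}
```

On `[Panel [Title]]` this finds only `[Title]`; the outer tag is missed because pairing arbitrarily nested brackets is beyond a regular language.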
