Using regular expressions in Python to determine C++ functions and their parameters

Developer https://www.devze.com 2023-02-22 12:12 Source: Internet
So I'm doing something wrong in this Python script, but it's becoming convoluted and I'm losing sight of what the mistake is.

I want a script to go through a file, find all the function definitions, and then pull out the name, return type, and parameters of the function, and output a "doxygen" style comment like this:

/******************************************************************************/
  /*!
    \brief
      Main function for the file

    \return
      The exit code for the program
  */
/******************************************************************************/

But I'm doing something wrong with the regular expression in trying to parse the parameters... Here is the script so far:

import re
import sys

f = open(sys.argv[1])

functions = []

for line in f:
  match = re.search(r'([\w]+)\s+([\S]+)\(([\w+\s+\w+])+\)',line)
  if line.find("\\fn") < 0:
    if match:
      returntype = match.group(1)
      funcname = match.group(2)
      print '/********************************************************************'
      print "  \\fn " + match.group()
      print ''
      print '  \\brief'
      print '    Function description for ' + funcname
      print ''
      if len(match.groups()) > 2:
        params = []
        count = len(match.groups()) - 2
        while count > 0:
          matchingstring = match.group(count + 2)
          if matchingstring.find("void") < 0:
            params.append(matchingstring)
          count -= 1
        for parameter in params:
          print "  \\param " + parameter
          print '    Description of ' + parameter
          print ''
      print '  \\return'
      print '    ' + returntype
      print '********************************************************************/'
      print ''

Any help would be appreciated. Thanks


The grammar of C++ is far too complex to be handled by simple regular expressions. You'll need at least a minimal parser. I've found that for restricted cases, where I'm not concerned with C++ in general but only with my own style, I can often get away with a flex-based tokenizer and a simple state machine. This will fail in many cases of legal C++: for starters, of course, if someone uses the preprocessor to modify the syntax, but also because < can have different meanings depending on whether what precedes it names a template or not. But it's often adequate for a specific job.
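To give a rough idea of what a tokenizer plus a simple state machine can look like, here is a minimal Python 3 sketch. It uses the `re` module in place of flex (my substitution, not what the answer describes), and the names `tokenize` and `parse_signature` are hypothetical. It assumes one well-formed signature per string and treats every `<` as opening a template argument list, which is exactly the kind of shortcut that fails on general C++.

```python
import re

# A token is an identifier (optionally qualified with ::) or any single
# non-space character. Leading whitespace is skipped.
TOKEN = re.compile(r"\s*([A-Za-z_][\w:]*|\S)")


def tokenize(text):
    """Split a C++ signature into a flat list of tokens."""
    return TOKEN.findall(text)


def parse_signature(text):
    """Return (return_type, name, [parameters]) for one simple signature."""
    tokens = tokenize(text)
    open_paren = tokens.index("(")      # first '(' starts the parameter list
    name = tokens[open_paren - 1]
    return_type = " ".join(tokens[:open_paren - 1])

    params, current, depth = [], [], 0  # depth counts open '<' and '('
    for tok in tokens[open_paren + 1:]:
        if tok == ")" and depth == 0:
            break                       # end of the parameter list
        if tok in ("<", "("):
            depth += 1
        elif tok in (">", ")"):
            depth -= 1
        if tok == "," and depth == 0:
            params.append(" ".join(current))   # comma between parameters
            current = []
        else:
            current.append(tok)         # comma inside <...> stays in the type
    if current and current != ["void"]:
        params.append(" ".join(current))
    return return_type, name, params


print(parse_signature("int main(void)"))
print(parse_signature(
    "std::string foobar(std::vector<int> &A, "
    "std::map<std::string, std::vector<std::string> > &B)"))
```

The only state here is the nesting depth, so a comma inside `std::map<...>` no longer splits a parameter. A default argument such as `int n = a < b` would still confuse it, since nothing tells the sketch that `a` is not a template.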


I've used a PEG parser with great success for simple format parsing. pyPEG is a very simple implementation of such a parser, written in Python.

Example Python code for C++ function parser:

EDIT: Address template parameters. Tested with input from SK-logic and output is correct.

import pyPEG
from pyPEG import parseLine
import re

def symbol(): return re.compile(r"[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*][\w:]+")
def type(): return symbol
def functionName(): return symbol
def templatedType(): return symbol, "<", -1, [templatedType, symbol, ","], ">"
def parameter(): return [templatedType, type], symbol
def template(): return "<", -1, [symbol, template], ">"
def function(): return [type, templatedType], functionName, -1, template, "(", -1, [",", parameter], ")" # -1 -> zero or more repetitions.


sourceCode = "std::string foobar(std::vector<int> &A, std::map<std::string, std::vector<std::string> > &B)"
results = parseLine(sourceCode, function(), [], packrat=True)

When this is executed, results is:

([(u'type', [(u'symbol', 'std::string')]), (u'functionName', [(u'symbol', 'foobar')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]), (u'symbol', '&A')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'), (u'symbol', 'std::string'), (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'std::string')])]), (u'symbol', '&B')])], '')
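To get from that tree to the pieces a Doxygen comment needs, something along these lines might do. It is a Python 3 sketch that works on a hand-copied version of the result above rather than calling pyPEG, and it assumes the tree always has exactly this shape. The helper names `render_type` and `summarize` are made up for the example.

```python
# First element of the result tuple shown above, copied by hand.
tree = [
    (u'type', [(u'symbol', 'std::string')]),
    (u'functionName', [(u'symbol', 'foobar')]),
    (u'parameter', [
        (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]),
        (u'symbol', '&A')]),
    (u'parameter', [
        (u'templatedType', [
            (u'symbol', 'std::map'),
            (u'symbol', 'std::string'),
            (u'templatedType', [(u'symbol', 'std::vector'),
                                (u'symbol', 'std::string')])]),
        (u'symbol', '&B')]),
]


def render_type(node):
    """Rebuild a type string from a symbol, type, or templatedType node."""
    kind, value = node
    if kind == 'symbol':
        return value
    if kind == 'type':
        return render_type(value[0])
    # templatedType: first child is the template name, the rest are arguments
    head, args = value[0], value[1:]
    return render_type(head) + '<' + ', '.join(render_type(a) for a in args) + '>'


def summarize(nodes):
    """Collect return type, function name, and (type, name) parameter pairs."""
    info = {'return': None, 'name': None, 'params': []}
    for kind, children in nodes:
        if kind in ('type', 'templatedType') and info['name'] is None:
            info['return'] = render_type((kind, children))   # seen before the name
        elif kind == 'functionName':
            info['name'] = children[0][1]
        elif kind == 'parameter':
            param_type, param_name = children
            info['params'].append((render_type(param_type), param_name[1]))
    return info


print(summarize(tree))
```

Note that the grammar attaches `&` to the parameter name, so the names come out as `&A` and `&B`.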


C++ cannot really be parsed by a (sane) regular expression: regular expressions become a nightmare as soon as nesting is involved.
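To make the nesting problem concrete, here is a small Python 3 sketch. The pattern and the helper name `naive_params` are my own illustration, not the asker's code. It captures everything between the parentheses and splits on commas, which is fine for flat parameter lists and breaks as soon as a template argument list contains a comma.

```python
import re

# Illustrative pattern: return type, function name, raw parameter text.
SIGNATURE = re.compile(r"([\w:]+)\s+(\w+)\s*\(([^)]*)\)")


def naive_params(line):
    """Split the parameter text on every comma, ignoring any nesting."""
    match = SIGNATURE.search(line)
    if match is None:
        return None
    raw = match.group(3).strip()
    if raw in ("", "void"):
        return []
    return [piece.strip() for piece in raw.split(",")]


flat = naive_params("int add(int a, int b)")
nested = naive_params("void f(std::map<std::string, int> &m, int n)")

print(flat)    # two parameters, as expected
print(nested)  # three pieces for two parameters: the map type is cut in half
```

As a side note on the pattern in the question: `[\w+\s+\w+]` is a character class, so it matches a single character (a word character, whitespace, or a literal `+`) rather than a "type name" pair, and the repeated group around it only keeps the last character it matched.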

There is another concern too: determining when to parse and when not to. A function may be declared:

  • at file scope
  • in a namespace
  • in a class

And the last two can be nested at arbitrary depth.

I would propose using Clang here. It's a real C++ front end with a full-featured parser, and it offers:

  • a C API, with (notably) an API to the Indexing Library
  • Python bindings on top of the C API

The C API and Python bindings are far from fully exposing the underlying C++ model, but for a task as simple as listing functions it should be enough.
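For what it's worth, listing functions through the Python bindings could look roughly like the Python 3 sketch below. It assumes the `clang.cindex` module and a matching libclang are installed. The helper names `doxygen_stub` and `list_functions`, the `-x c++` argument, and the comment layout are my own choices. Constructors, destructors, and function templates have their own cursor kinds and are ignored here.

```python
import sys

RULE = "/" + "*" * 78 + "/"


def doxygen_stub(name, return_type, params):
    """Format a Doxygen-style stub; params is a list of (type, name) pairs."""
    lines = [RULE, "  /*!", "    \\brief",
             "      Function description for " + name, ""]
    for _param_type, param_name in params:
        lines += ["    \\param " + param_name,
                  "      Description of " + param_name, ""]
    if return_type != "void":
        lines += ["    \\return", "      " + return_type]
    lines += ["  */", RULE]
    return "\n".join(lines)


def list_functions(path, clang_args=("-x", "c++")):
    """Yield (name, return_type, params) for functions declared in `path`."""
    # Imported here so the formatter above works without libclang installed.
    from clang.cindex import CursorKind, Index

    wanted = (CursorKind.FUNCTION_DECL, CursorKind.CXX_METHOD)
    unit = Index.create().parse(path, args=list(clang_args))
    for cursor in unit.cursor.walk_preorder():
        if cursor.kind not in wanted:
            continue
        # Skip declarations pulled in from headers. Assumption: libclang
        # reports the file name in the same form it was passed to parse().
        source = cursor.location.file
        if source is None or source.name != path:
            continue
        params = [(arg.type.spelling, arg.spelling)
                  for arg in cursor.get_arguments()]
        yield cursor.spelling, cursor.result_type.spelling, params


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, return_type, params in list_functions(sys.argv[1]):
        print(doxygen_stub(name, return_type, params))
        print()
```

Because the traversal walks the whole translation unit, functions inside namespaces and classes are found without any extra handling of nesting.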


That said, I would question the usefulness of the project: if the documentation can be generated by a simple parser, then it is redundant with the code. Redundancy is at best useless and at worst dangerous: it introduces the risk of the documentation and the code drifting out of sync...

If the function is tricky enough that its use requires documentation, then a developer who knows its limitations and so on has to write that documentation.
