
Regular Expressions to find C++ elements?

Developer https://www.devze.com 2023-01-24 09:32 Source: web

I'm looking for some predefined Regexes for elements of ANSI C++.

I would like to create a program which takes a header file (with includes, namespaces, classes, etc.) as input and returns lists of the class names, methods, attributes, etc. that it finds.

It's hard to google for something like that; I always end up with tutorials on how to use Regexes in C++. Perhaps I'm just googling the wrong terms? Perhaps someone has already found, used, or created such Regexes.


This type of operation is not possible with a regular expression. C++ is not a regular language and hence can't be reliably parsed with one. The safest approach is to use an actual parser to locate C++ elements.
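One way to see the limitation: finding where a class body ends means matching nested braces, which needs a counter rather than a fixed pattern. Below is a minimal Python sketch of that counting. The function name `find_matching_brace` is made up for illustration, and the sketch deliberately ignores braces inside string literals and comments, which a real parser would have to handle.

```python
def find_matching_brace(text, open_index):
    """Return the index of the '}' closing the '{' at open_index, or -1."""
    if text[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    for i in range(open_index, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # one level deeper
        elif ch == "}":
            depth -= 1          # one level back out
            if depth == 0:
                return i        # this brace closes the one we started at
    return -1                   # unbalanced input


src = "class A { void f() { if (x) { } } };"
end = find_matching_brace(src, src.index("{"))
print(src[:end + 1])
```

The depth can grow without bound, which is exactly what a classic regular expression cannot track.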

If 100% correctness is not a goal, though, a regular expression can work, because it can be crafted to catch the majority of cases within a code base. The simplest example would be the following:

class\s+[A-Za-z_]\w*

However, it will also incorrectly match the following as a class:

  • Forward declarations
  • Any string literal with text like "class foo"
  • Template parameters
  • etc ...
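The false positives listed above are easy to reproduce. Here is a small Python sketch using a naive pattern of the same shape as the one above, with a capture group added for the name. The sample header text and the names `NAIVE_CLASS_RE` and `find_class_names` are invented for illustration.

```python
import re

# Naive pattern: the keyword "class", whitespace, then an identifier.
NAIVE_CLASS_RE = re.compile(r"class\s+([A-Za-z_]\w*)")

SAMPLE = '''\
class Widget;
class Gadget : public Widget {
};
const char* msg = "class foo";
template <class T> T twice(T x);
'''


def find_class_names(source):
    """Return every identifier that follows the word 'class'."""
    return [m.group(1) for m in NAIVE_CLASS_RE.finditer(source)]


print(find_class_names(SAMPLE))
```

Of the four names reported, only `Gadget` is an actual class definition. `Widget` comes from a forward declaration, `foo` from a string literal, and `T` from a template parameter.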


You might find the code for ctags handy. It parses code and breaks out the symbols for use in Emacs and other programs. In fact, it might do all the work you are trying to do yourself.
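If ctags does the parsing, the remaining work is reading its output. The sketch below assumes the common tab-separated tags-file layout (name, file, search pattern ending in `;"`, then a kind letter and optional `key:value` fields). The sample lines are hand-written rather than real ctags output, and the exact fields depend on the ctags implementation and the options used, so treat this as a starting point only.

```python
def parse_tag_line(line):
    """Split one tags-file line into (name, path, kind, extras), or None."""
    if line.startswith("!_TAG_"):
        return None                      # header line, not a symbol
    head, sep, tail = line.rstrip("\n").partition(';"\t')
    if not sep:
        return None                      # no extension fields present
    parts = head.split("\t", 2)
    if len(parts) != 3:
        return None
    name, path, _address = parts
    fields = tail.split("\t")
    kind = fields[0]
    if kind.startswith("kind:"):
        kind = kind[len("kind:"):]
    extras = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
    return name, path, kind, extras


def members_by_class(lines):
    """Group member tags under the class named in their 'class:' field."""
    result = {}
    for line in lines:
        parsed = parse_tag_line(line)
        if parsed is None:
            continue
        name, _path, kind, extras = parsed
        if kind == "c":                  # assumed kind letter for a class
            result.setdefault(name, [])
        elif "class" in extras:
            result.setdefault(extras["class"], []).append((kind, name))
    return result


# Hand-written sample lines (assumed format, not captured from a real run).
SAMPLE_TAGS = [
    '!_TAG_FILE_FORMAT\t2\t/extended format/\n',
    'Widget\twidget.hpp\t/^class Widget {$/;"\tc\n',
    'draw\twidget.hpp\t/^    void draw();$/;"\tp\tclass:Widget\n',
    'itsWidth_\twidget.hpp\t/^    int itsWidth_;$/;"\tm\tclass:Widget\n',
]

print(members_by_class(SAMPLE_TAGS))
```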


You may also find something interesting in ctags or cscope, as already mentioned. I have also come across flist.


I'm writing a Python program to extract some essential class info from a large messy C++ source tree. I'm having pretty good luck with using regexes. Fortunately, nearly all the code follows a style that lets me get away with defining just a few regexes to detect class declarations, methods, etc. Most member variables have names like "itsSomething_" or "m_something". I kludge in hard-coded hackwork to catch anything not fitting the style.

import re

class_decl_re   = re.compile(  r"^class +(\w+)\s*(:|\{)"  )
close_decl_re   = re.compile(  r"^\};"  )
method_decl_re  = re.compile(  r"(\w[ a-zA-Z_0-9\*\<\>]+) +(\w+)\("    ) 
var_decl1_re    = re.compile(  r"(\w[ a-zA-Z_0-9\*\<\>]+) +(its\w+);"  )
var_decl2_re    = re.compile(  r"(\w[ a-zA-Z_0-9\*\<\>]+) +(m_\w+);"  )
# non-greedy, so two comments on one line are not merged into one match
comment_pair_re = re.compile(  r"/\*.*?\*/" )
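To get a feel for what these patterns do and do not catch, here is a small self-contained check. The four regexes are copied from above. The `classify` helper and the sample declarations are made up for illustration.

```python
import re

class_decl_re  = re.compile(r"^class +(\w+)\s*(:|\{)")
method_decl_re = re.compile(r"(\w[ a-zA-Z_0-9\*\<\>]+) +(\w+)\(")
var_decl1_re   = re.compile(r"(\w[ a-zA-Z_0-9\*\<\>]+) +(its\w+);")
var_decl2_re   = re.compile(r"(\w[ a-zA-Z_0-9\*\<\>]+) +(m_\w+);")


def classify(line):
    """Return a tuple describing the declaration on this line, or None."""
    m = class_decl_re.match(line)
    if m:
        return ("class", m.group(1))
    m = method_decl_re.match(line)
    if m:
        return ("method", m.group(1), m.group(2))   # (return type, name)
    m = var_decl1_re.match(line) or var_decl2_re.match(line)
    if m:
        return ("var", m.group(1), m.group(2))      # (datatype, name)
    return None


for sample in [
    "class Widget : public Base {",
    "int getWidth() const;",
    "int itsWidth_;",
    "Foo* m_foo;",
    "std::string m_name;",   # ':' is not in the type character class
    "Widget();",             # constructor: no return type to match
]:
    print(repr(sample), "->", classify(sample))
```

The last two samples come back as `None`, which shows the kind of case that needs the hard-coded special handling mentioned above: namespace-qualified types and constructors do not fit these patterns.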

This is a work in progress, but here is a (possibly buggy; no, almost certainly buggy) snippet of code to show how the regexes are used:

# at this point, we're looking at one line from a .hpp file
# from inside a class declaration.  All initial whitespace has been 
# stripped.  All // and /*...*/ comments have been removed.
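    is_static = line.startswith("static ")
    if is_static:
        # drop the keyword and the space after it, so the regexes
        # below (which match from the start of the line) still apply
        line = line[6:].lstrip()

    is_virtual = line.startswith("virtual ")
    if is_virtual:
        line = line[7:].lstrip()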

    # I believe "virtual static" is impossible, but if our goal
    # is to detect such coding gaffes, this code can't do it.

    mm = method_decl_re.match(line)
    vm1 = var_decl1_re.match(line)
    vm2 = var_decl2_re.match(line)
    if mm:
        meth_name = mm.group(2)
        minfo = MethodInfo(meth_name, classinfo.name)  # class to hold info about a method
        minfo.rettype = mm.group(1)              # return type
        minfo.is_static = is_static
        if is_static:
            if is_virtual:
                classinfo.screwed_up=True
            classinfo.class_methods[meth_name] = minfo
        else:
            minfo.is_polymorphic = is_virtual    
            classinfo.obj_methods[meth_name] = minfo

    elif vm1 or vm2:
        if vm1: # deal with vars named "itsXxxxx..."
            vm=vm1   
            var_name = vm.group(2)[3:]
            if var_name.endswith("_"):
                var_name=var_name[:-1]
        else:  # deal with vars named "m_Xxxxx..."
            vm=vm2   
            var_name = vm.group(2)[2:]  # remove the m_
        datatype = vm.group(1)
        vi = VarInfo(var_name, datatype)
        vi.is_static = is_static 
        classinfo.vars[var_name] = vi

I hope this is easy to understand and translate to other languages, at least for a starting point for anyone crazy enough to try. Use at your own risk.
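The snippet above assumes that each line has already had comments and leading whitespace removed. That preprocessing step could look roughly like the sketch below. The names `clean_lines`, `line_comment_re`, and `block_comment_re` are invented for illustration. The sketch does not protect comment markers that appear inside string literals, so it shares the "use at your own risk" caveat.

```python
import re

line_comment_re  = re.compile(r"//.*")
block_comment_re = re.compile(r"/\*.*?\*/", re.DOTALL)   # may span lines


def clean_lines(source):
    """Yield non-empty lines with comments and outer whitespace removed."""
    # Replace block comments with a space so adjacent tokens do not fuse.
    source = block_comment_re.sub(" ", source)
    for raw in source.splitlines():
        line = line_comment_re.sub("", raw).strip()
        if line:
            yield line


header = """class A {
  int m_x; // count
  /* multi
     line */ void f();
};
"""

for line in clean_lines(header):
    print(line)
```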
