Boost::Spirit::Qi autorules -- avoiding repeated copying of AST data structures_问答_开发者

Boost::Spirit::Qi autorules -- avoiding repeated copying of AST data structures

开发者 https://www.devze.com 2023-02-03 21:13 出处：网络

I\'ve been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I\'ve been able to use autorules almost exclusively, so my parse tr

I've been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I've been able to use autorules almost exclusively, so my parse trees consist entirely of variants, structs, and std::vectors.

This setup works great for the common case:

1) parse something (Qi),

2) make minor manipulations to the parse tree (visitor), and

3) output something (Karma).

However, I'm concerned about what will happen if I want to make complex structural changes to a syntax tree, like moving big subtrees around. Consider the following toy example:

A grammar for s-expr-style logical expressions that uses autorules...

// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand  %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por   %= lit("(or ") >>  (pexpr % lit(" ")) >> ")";
pnot  %= lit("(not ") >> pexpr >> ")";

... which leads to parse tree representation that looks like this...

struct var {
   std::string name;
};
struct bconst {
   bool val;
};

struct pand;
struct por;
struct pnot;                               

typedef boost::variant<bconst,
                       var,
                       boost::recur开发者_StackOverflow社区sive_wrapper<pand>,
                       boost::recursive_wrapper<por>,
                       boost::recursive_wrapper<pnot> > pexpr;
struct pand {
   std::vector<pexpr> operands;                    
};

struct por {
   std::vector<pexpr> operands;                    
};

struct pnot {
   pexpr victim;
};

// Many Fusion Macros here

Suppose I have a parse tree that looks something like this:

       pand
      / ... \
   por      por
   / \      / \
 var var   var var

(The ellipsis means 'many more children of similar shape for pand.')

Now, suppose that I want negate each of the por nodes, so that the end result is:

       pand
      / ... \
   pnot     pnot
    |        |
   por      por
   / \      / \
 var var   var var

The direct approach would be, for each por subtree:

- create pnot node

(copies por in construction);

- re-assign the appropriate vector slot in the pand node

(copies pnot node and its por subtree).

Alternatively, I could construct a separate vector, and then replace (swap) the pand vector wholesale, eliminating a second round of copying.

All of this seems cumbersome compared to a pointer-based tree representation, which would allow for the pnot nodes to be inserted without any copying of existing nodes. My question:

Is there a way to avoid copy-heavy tree manipulations with autorule-compliant data structures? Should I bite the bullet and just use non-autorules to build a pointer-based AST (e.g., http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?

Copying the subtrees shouldn't be as expensive as you assume as the recursive_variant is essentially a wrapper around a shared_ptr. I believe, it's not only the best, but also the easiest solution.