开发者

Deriving a state machine from a BNF grammar

开发者 https://www.devze.com 2022-12-10 11:04 出处:网络
I am trying to put together a proof of concept of an XSS-safe string interpolation scheme. Given a string with substitutions,

I am trying to put together a proof of concept of an XSS-safe string interpolation scheme.

Given a string with substitutions,

"Hello <b>$planetoid</b>!"

I want break it into literal portions and substitutions ("Hello<b>" planetoid "</b>!") and then run a state machine left to right over the literal portions. When I reach an interpolated value (planetoid in the above), I need to be able to get from the state to an appropriate escaping function.

Does anyone know of any examples of how to use lex/yacc/bison to derive a state machine and be able to associate labels in the grammar with output states? I want to derive a state machine that I can use both in javascript, and to try and replace PHP's underlying string implementation.

My reasons for doing this are describe开发者_运维百科d here.

cheers, mike


In general, it is not possible to create a state machine for grammars that can be represented in BNF. State machines can only recognize regular languages and BNF can specify context-free languages. Yacc can create parsers. Would that be sufficient?


It looks like I can put markers in the grammar, so if I use two different production types, one with no side effect, and that consumes characters, and one that consumes no characters, but updates a state variable

ST_EXPECT_TAG_NAME : { state = TAG_NAME };
TAG_BODY
    : '<' ST_EXPECT_TAG_NAME TAG_NAME ATTRS SPACES '>' ST_OUT_OF_TAG
    ;

the compiled output associates state names in a switch statement

YY_REDUCE_PRINT (yyn);
switch (yyn)
  {
      case 118:
#line 74 "tmp/html-combo.y"
    { state = TAG_NAME ;}
    break;

There may be a way to extract the tables without parsing C, but I'm so ignorant of yacc/bison.


You can use yacc/bison for this. Initially looking at bison its hard to tell where you can implement a state machine. Rules in bison are resolved left-right-up. Ie; If you have a rule (called rule0) which derives rule1 rule2 rule3, the actions are called in this order: rule1,rule2,rule3,rule0. You can combine that using a global state machine, with dynamic return values for rules (I use a union with different types like a string, an int or even a container for return values).

0

精彩评论

暂无评论...
验证码 换一张
取 消