Reversing a regular expression in PHP

Developer https://www.devze.com 2023-02-21 20:43 Source: Internet
Suppose I have this function:

function f($string){
    $string = preg_replace("`\[.*\]`U","",$string);
    $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i','-',$string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace( "`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i","\\1", $string );
    $string = preg_replace( array("`[^a-z0-9]`i","`[-]+`") , "-", $string);
    return $string;
}

How can I reverse this function? That is, how should I write a function fReverse() such that the following works:

$s = f("some string223---");
$reversed = fReverse($s);
echo $reversed;

and the output is: some string223---


f is lossy, so an exact inverse does not exist. For example, both "some string223---" and "some string223--------" give the same output (see http://ideone.com/DtGQZ).
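
As a minimal sketch of this collision, the snippet below copies f from the question and runs it on both inputs. Nothing beyond the question's own code is assumed.

```php
<?php
// f() copied from the question.
function f($string) {
    $string = preg_replace("`\[.*\]`U", "", $string);
    $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i', '-', $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace("`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i", "\\1", $string);
    $string = preg_replace(array("`[^a-z0-9]`i", "`[-]+`"), "-", $string);
    return $string;
}

$a = f("some string223---");
$b = f("some string223--------");

// The last step collapses every run of hyphens into one, so both results coincide.
var_dump($a === $b); // bool(true)
echo $a, "\n";       // some-string223-
```

Because two different inputs map to the same output, no function can recover the original from the output alone.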


Nevertheless, we can find a preimage under f. The five steps of f are:

  1. Strip everything between [ and ].
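  2. Replace entities such as &lt; and &#123;, and double-encoded entities such as &amp;lt;, with a hyphen -.
  3. Escape special HTML characters (< → &lt;, & → &amp;, etc.); this also turns accented characters into named entities (é → &eacute;).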
  4. Remove accents of accented characters (&eacute; (=é) → e, etc.)
  5. Turn non-alphanumerics and consecutive hyphens into a single hyphen -.
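
To watch the five steps above act on a concrete string, here is a rough sketch. traceF is a hypothetical helper, not part of the question: it runs the same replacements as f but records the string after each step. The sample input is made up for illustration, with é written as its UTF-8 bytes \xC3\xA9.

```php
<?php
// Hypothetical helper: same five steps as f(), recording each intermediate result.
function traceF($string) {
    $steps = array();
    $steps[1] = $string = preg_replace("`\[.*\]`U", "", $string);
    $steps[2] = $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i', '-', $string);
    $steps[3] = $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $steps[4] = $string = preg_replace("`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i", "\\1", $string);
    $steps[5] = $string = preg_replace(array("`[^a-z0-9]`i", "`[-]+`"), "-", $string);
    return $steps;
}

$steps = traceF("[draft]Caf\xC3\xA9 &lt; bar");

foreach ($steps as $n => $value) {
    echo $n, ": ", $value, "\n";
}
// 1: Café &lt; bar       (bracketed part stripped)
// 2: Café - bar          (entity replaced by a hyphen)
// 3: Caf&eacute; - bar   (é escaped to a named entity)
// 4: Cafe - bar          (accent removed)
// 5: Cafe-bar            (non-alphanumerics and hyphen runs collapsed)
```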

For a suitable input, steps 1, 2, 4 and 5 can all act as identity transforms. One possible preimage is therefore obtained by reversing step 3 alone:

function fReverse($string) {
   return html_entity_decode($string, ENT_COMPAT, 'utf-8');
}
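
A quick check of what this fReverse does and does not give, as a sketch that reuses f from the question. The result is a preimage (applying f to it reproduces $s), but it is not the original input: the output of f contains only letters, digits and single hyphens, so html_entity_decode has nothing to decode.

```php
<?php
// f() copied from the question.
function f($string) {
    $string = preg_replace("`\[.*\]`U", "", $string);
    $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i', '-', $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace("`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i", "\\1", $string);
    $string = preg_replace(array("`[^a-z0-9]`i", "`[-]+`"), "-", $string);
    return $string;
}

// fReverse() as given in the answer.
function fReverse($string) {
    return html_entity_decode($string, ENT_COMPAT, 'utf-8');
}

$original = "some string223---";
$s = f($original);
$reversed = fReverse($s);

var_dump(f($reversed) === $s);    // bool(true): $reversed is a preimage of $s
var_dump($reversed === $original); // bool(false): the original is not recovered
```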