Extract HTML-like tags with PHP

Developer https://www.devze.com 2023-01-31 08:14 Source: Internet
I'm trying to extract OUTERMOST special HTML-like tags from a given string. Here's a sample string:

sample string with <::Class id="some id\" and more">text with possible other tags inside<::/Class> some more text

I need to find where in a string a <::Tag starts and where it ends. The problem is that it might contain nested tags. Is there a simple loop-like algorithm to find the FIRST occurrence of a <::Tag and the length of the string up to the matching <::/Tag>? I've tried a different approach, using a plain HTML tag with DOMDocument, but it cannot tell me the position of the tag in the string. I cannot use external libraries; I'm just looking for pointers on how this could be solved. Maybe you've seen an algorithm that does exactly that; I'd like to have a look at it.

Thanks for the help. P.S. Regex solutions will not work since there are nested tags. Recursive regex solutions will not work either. I'm just looking for a very simple parsing algorithm for this specific case.


What you're talking about here is building a template engine. Parsing templates with regex is very slow. Instead, your template-reading/processing engine should do a plain string parse. It's not super easy, but it's not terribly hard either. Still, my advice is to use an existing template library instead of reinventing the wheel.

There's an open-source template engine in phpBB that you could use or learn from. Or use something like Smarty. If performance is a major concern, have a look at Blitz.
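
For reference, the kind of plain string parse suggested above could look roughly like the sketch below. It is not taken from any of the libraries mentioned; findOutermostTag is a hypothetical name. It assumes that '>' never appears inside an attribute value, and it only counts nesting depth without checking that opening and closing names match.

```php
<?php
// Hypothetical helper (not part of the original thread).
// Returns array(start, length) of the first outermost <::Name ...>...<::/Name>
// span, or null if there is no balanced span.
// Assumption: '>' never occurs inside an attribute value.
function findOutermostTag(string $src): ?array
{
    $start = strpos($src, '<::');
    if ($start === false) {
        return null; // no tag at all
    }

    $depth = 0;
    $pos   = $start;
    while (($pos = strpos($src, '<::', $pos)) !== false) {
        $isClose = substr($src, $pos + 3, 1) === '/';
        $end     = strpos($src, '>', $pos);
        if ($end === false) {
            return null; // unterminated tag
        }

        $depth += $isClose ? -1 : 1;
        if ($depth < 0) {
            return null; // closing tag without an opening tag
        }
        if ($depth === 0) {
            // back at the top level: this '>' ends the outermost tag
            return array($start, $end - $start + 1);
        }
        $pos = $end + 1; // continue scanning after this tag
    }

    return null; // ran out of input before the tags balanced
}

$sample = 'sample string with <::Class id="some id" and more">text with possible <::Strong>other<::/Strong> tags inside<::/Class> some more text';
$found  = findOutermostTag($sample);
if ($found !== null) {
    list($start, $length) = $found;
    echo substr($sample, $start, $length), "\n";
}
```

Calling it again on the remainder of the string (from $start + $length onward) would give the next top-level tag, if any.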


strpos + strrpos (ouch...)

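$str   = 'sample string with <::Class id="some id" and more">text with possible <::Strong>other<::/Strong> tags inside<::/Class> some more text';
$tag   = '<::';
$first = strpos($str, $tag);   // start of the first tag
$last  = strrpos($str, $tag);  // start of the last tag
$rtn   = array();
$cnt   = 0;
while ($first !== false && $first < $last)
{
  if (!$cnt)
  {
    // text before the first tag
    $rtn[] = substr($str, 0, $first);
  }
  ++$cnt;
  $next = strpos($str, $tag, $first+1);

  if ($next !== false)
  {
    // end of the current tag; assumes no '>' inside attribute values
    $pos   = strpos($str, '>', $first);
    $rtn[] = substr($str, $first, $pos-$first+1); // the tag itself
    $rtn[] = substr($str, $pos+1, $next-$pos-1);  // text up to the next tag
    $first = $next;
  }
}
if ($first !== false)
{
  // the last tag and everything after it
  $rtn[] = substr($str, $first);
}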

With $rtn you can then do whatever you want. This code is not perfect yet. The result:

array (
  0 => 'sample string with ',
  1 => '<::Class id="some id" and more">',
  2 => 'text with possible ',
  3 => '<::Strong>',
  4 => 'other',
  5 => '<::/Strong>',
  6 => ' tags inside',
  7 => '<::/Class> some more text',
)


So basically here's what I came up with. Something like ajreal's solution, only not as clean ;] I'm not even sure it works perfectly yet, but initial testing was successful.

protected function findFirstControl()
{
    $pos = strpos($this->mSource, '<::');

    if ($pos === false)
        return false;

    // get the control name
    $endOfName = false;
    $controlName = '';
    $len = strlen($this->mSource);
    $i = $pos + 3;

    while (!$endOfName && $i < $len)
    {
        $char = $this->mSource[$i];

        if (($char >= 'a' && $char <= 'z') || ($char >= 'A' && $char <= 'Z'))
            $controlName .= $char;
        else
            $endOfName = true;

        $i++;
    }

    if ($controlName == '')
        return false;

    $posOfEnd = strpos($this->mSource, '<::/' . $controlName, $i);
    $posOfStart = strpos($this->mSource, '<::' . $controlName, $i);

    if ($posOfEnd === false)
        return false;

    // Each nested opening tag of the same name that appears before the
    // current closing tag pushes the matching close one occurrence further.
    while ($posOfStart !== false && $posOfEnd !== false && $posOfStart < $posOfEnd)
    {
        $posOfStart = strpos($this->mSource, '<::' . $controlName, $posOfStart + 1);
        $posOfEnd = strpos($this->mSource, '<::/' . $controlName, $posOfEnd + 1);
    }

    if ($posOfEnd !== false)
    {
        $ln = $posOfEnd - $pos + strlen($controlName) + 5;
        return array($pos, $ln, $controlName, substr($this->mSource, $pos, $ln));
    }
    else
        return false;
}
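
One thing to watch for in the function above: strpos with '<::' . $controlName is a prefix match, so when looking for Class it would also stop at a tag such as <::ClassList. A possible guard, sketched under the same assumption the function makes (tag names consist of letters only), is shown below. findTag is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: behaves like strpos() for '<::Name' or '<::/Name',
// but skips matches where the tag name continues with more letters
// (e.g. '<::ClassList' while searching for 'Class').
// Returns the position as int, or false if nothing is found.
function findTag(string $src, string $name, int $offset = 0, bool $closing = false)
{
    $needle = ($closing ? '<::/' : '<::') . $name;

    while (($p = strpos($src, $needle, $offset)) !== false) {
        $after = $p + strlen($needle); // index of the char following the name
        if (!isset($src[$after]) || !ctype_alpha($src[$after])) {
            return $p; // the name ends here, so this is a real match
        }
        $offset = $p + 1; // longer name: keep searching
    }

    return false;
}
```

The strpos calls for $posOfStart and $posOfEnd could then be swapped for findTag($this->mSource, $controlName, $i) and findTag($this->mSource, $controlName, $i, true).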


Not an extendable solution, but it works.

$startPos = strpos($string, '<::Class');
$endPos = strrpos($string, '<::/Class>');

Note my use of strrpos to get around the nesting problem: it finds the last <::/Class> in the string. Also, $endPos is the start position of that <::/Class>, not its end.
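
To get the full outer span from those two positions, the length of the closing tag has to be added to the end position. A small sketch using the sample string from the question follows; the variable names $open, $close, $length and $outer are only for illustration. Keep in mind that if the string contains two sibling <::Class> blocks, this span will cover both and everything between them.

```php
<?php
$string = 'sample string with <::Class id="some id" and more">text with possible <::Strong>other<::/Strong> tags inside<::/Class> some more text';
$open   = '<::Class';
$close  = '<::/Class>';

$startPos = strpos($string, $open);    // start of the first opening tag
$endPos   = strrpos($string, $close);  // start of the last closing tag

$outer = null;
if ($startPos !== false && $endPos !== false && $endPos > $startPos) {
    // add strlen($close) so the span includes the closing tag itself
    $length = $endPos + strlen($close) - $startPos;
    $outer  = substr($string, $startPos, $length);
    echo $outer, "\n";
}
```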

Why don't you just use regular XML and the DOM? Or just an existing template engine like Smarty?
