Extract doctype with simple_html_dom

Developer https://www.devze.com 2022-12-08 06:34 Source: Internet

I am using simple_html_dom to parse a website. Is there a way to extract the doctype?


You can use the file_get_contents function to fetch the page's raw HTML and then match the doctype with a regular expression. For example:

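<?php
   // Fetch the raw markup; file_get_contents() returns false on failure.
   $html = file_get_contents("http://google.com");
   // [^>]* stops at the first '>', so HTML5 and legacy doctypes both match.
   $doctype = null;
   if ($html !== false && preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
       $doctype = $matches[0];
   }
?>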
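
To try the regular expression without making a network request, the matching step can be pulled into a small function and run against literal strings. This is only a minimal sketch: the name extract_doctype is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the first doctype declaration found in $html,
// or null when the markup contains none.
function extract_doctype(string $html): ?string
{
    // [^>]* stops at the first '>', so the pattern matches HTML5 and legacy
    // doctypes alike, even when the declaration spans several lines.
    if (preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

$sample = "<!DOCTYPE html>\n<html><head></head><body></body></html>";
echo extract_doctype($sample), "\n"; // prints <!DOCTYPE html>
```

Because the pattern stops at the first '>', a doctype with an internal subset (square brackets containing further declarations) would be cut short. That is rare in HTML, but worth keeping in mind.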


You can use $html->find('unknown'). This works at least in version 1.11 of the simple_html_dom library. I use it as follows:

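function get_doctype($doc)
{
    // The doctype is found among the nodes tagged 'unknown'.
    $els = $doc->find('unknown');

    // Return the first 'unknown' node that sits directly under the root.
    foreach ($els as $el) {
        if ($el->parent()->tag == 'root') {
            return $el;
        }
    }

    return null;
}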

The parent check is just there to skip any other 'unknown' elements which might be found; I'm assuming the first one directly under the root will be the doctype. You can explicitly inspect ->innertext if you want to ensure it starts with '!DOCTYPE ', though.
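
The explicit text check mentioned above could look roughly like this. It is a sketch, not part of the library: the name looks_like_doctype is hypothetical, and it assumes the text may or may not include the leading '<' depending on the library version, so both forms are accepted.

```php
<?php
// Hypothetical helper: true when $text looks like a doctype declaration.
// Assumption: the text may arrive with or without the leading '<',
// so the pattern treats that character as optional.
function looks_like_doctype(string $text): bool
{
    // Case-insensitive: '!DOCTYPE' (optionally preceded by '<' and
    // whitespace) followed by at least one whitespace character.
    return preg_match('/^\s*<?!DOCTYPE\s/i', $text) === 1;
}

var_dump(looks_like_doctype('<!DOCTYPE html>'));    // bool(true)
var_dump(looks_like_doctype('<!-- a comment -->')); // bool(false)
```

Inside get_doctype, the condition would then become something like $el->parent()->tag == 'root' && looks_like_doctype($el->innertext).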
