Extract doctype with simple_html_dom

Developer (https://www.devze.com), 2022-12-08 06:34, source: web

Related topics: doctype php simple-html-dom

I am using simple_html_dom to parse a website. Is there a way to extract the doctype?

You can use the file_get_contents function to fetch the page's HTML and then pull out the doctype with a regular expression. For example:

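<?php
   // Fetch the page (file_get_contents needs allow_url_fopen for URLs).
   $html = file_get_contents("http://google.com");
   // Match the first "<!DOCTYPE ...>" declaration, case-insensitively.
   // [^>]* also matches newlines, so there is no need to strip "\n" first,
   // and it accepts the short HTML5 form "<!doctype html>" too.
   $doctype = null;
   if ($html !== false && preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
       $doctype = $matches[0];
   }
?>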

You can use $html->find('unknown'). This works, at least, in version 1.11 of the simplehtmldom library. I use it as follows:

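// Returns the first top-level 'unknown' node (assumed to be the doctype), or NULL.
function get_doctype($doc)
{
    // In simplehtmldom 1.11 the doctype shows up among the 'unknown' nodes.
    $els = $doc->find('unknown');

    foreach ($els as $el) {
        // Only accept nodes directly under the document root; any other
        // 'unknown' elements deeper in the tree are skipped.
        if ($el->parent()->tag == 'root') {
            return $el;
        }
    }

    return NULL;
}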

The parent check is there only to skip any other 'unknown' elements that might be found; I'm assuming the first top-level one is the doctype. If you want to be sure, you can also inspect ->innertext and check that it starts with '!DOCTYPE '.
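If simple_html_dom is not strictly needed for this step, PHP's bundled DOM extension exposes the parsed doctype directly as DOMDocument::$doctype (a DOMDocumentType with name, publicId and systemId). The sketch below swaps in that technique; it is not part of simple_html_dom, and the helper name get_doctype_dom and the array it returns are illustrative choices. The LIBXML_HTML_NODEFDTD flag stops libxml from inserting a default HTML 4.0 Transitional doctype when the page has none, which would otherwise make every page appear to have one.

```php
<?php
// Sketch: read the doctype with PHP's DOM extension instead of simple_html_dom.
// get_doctype_dom() is a hypothetical helper, not a library function.
// Returns ['name' => ..., 'publicId' => ..., 'systemId' => ...] or null.
function get_doctype_dom(string $html): ?array
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // libxml warns about tags it does not know (e.g. HTML5 ones);
    // collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // LIBXML_HTML_NODEFDTD: do not add a default doctype when none exists.
    $ok = $doc->loadHTML($html, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok || $doc->doctype === null) {
        return null;
    }

    $dt = $doc->doctype; // a DOMDocumentType instance
    return [
        'name'     => $dt->name,
        'publicId' => $dt->publicId, // '' when the doctype has no public id
        'systemId' => $dt->systemId, // '' when the doctype has no system id
    ];
}

// Usage (URL is only an example; fetching it needs allow_url_fopen):
// $info = get_doctype_dom(file_get_contents('http://google.com'));
```

Returning the parsed parts rather than the raw declaration avoids depending on how the page happens to format it (spacing, quote style); if you need the declaration text verbatim, the regex approach keeps it as written.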