I am using the PHP library Simple HTML DOM Parser, as suggested here (How do you parse and process HTML/XML in PHP?), to parse a webpage's HTML content.
To create the DOM, I have to do:
$html = file_get_html('http://www.example.com/');
The problem is that if I do:
$html = file_get_html('www.example.com');
without specifying the URL's protocol, I will get an error.
My question is: how can I find out whether the URL with the protocol is "http://www.example.com/" or "https://www.example.com/" when all I have is the string "www.example.com"?
I can't figure out anything smarter than assuming "http://" as the default and, if it fails, trying "https://":
if (!$html = file_get_html('http://' . $url)) $html = file_get_html('https://' . $url);
There is no way to know, because both could be valid. I would assume http://, though, because normal practice is to redirect http to https if it is required, and file_get_html should follow an HTTP 301 or 302 redirect.
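You could try using get_headers() on the http address and inspecting the response headers: a Location: header pointing at an https:// URL (or, more rarely, an Upgrade: header) tells you the site wants https. If you get a valid response without such a hint, use http. Otherwise, try https.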
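As a rough illustration of the get_headers() idea, here is a minimal sketch. The function name resolve_url and its $fetchHeaders parameter are made up for this example and are not part of Simple HTML DOM; the only built-in it relies on is get_headers(), and it assumes that an https redirect shows up as a Location: header in the response.

```php
<?php
// Hypothetical helper: guess the full URL for a bare host such as "www.example.com".
// $fetchHeaders is an assumption added for this sketch so the logic can be tried
// without network access; by default it simply wraps PHP's get_headers().
function resolve_url(string $host, ?callable $fetchHeaders = null): ?string
{
    $fetchHeaders = $fetchHeaders ?? function (string $url) {
        // @ hides the warning get_headers() raises when the connection fails.
        return @get_headers($url);
    };

    // Try plain http first, then fall back to https.
    foreach (['http://', 'https://'] as $scheme) {
        $url = $scheme . $host;
        $headers = $fetchHeaders($url);

        // No response at all on this scheme: try the next one.
        if ($headers === false || !isset($headers[0])) {
            continue;
        }

        // The first element is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m) || (int) $m[1] >= 400) {
            continue;
        }

        // If the server redirects to an https URL, prefer that target.
        foreach ($headers as $line) {
            if (is_string($line) && preg_match('#^Location:\s*(https://\S+)#i', $line, $loc)) {
                return $loc[1];
            }
        }

        return $url;
    }

    return null; // neither scheme gave a usable response
}
```

The result could then be passed to file_get_html() after checking that it is not null. Treating every 4xx/5xx status as a failure is a simplification; a real site may answer the probe with an error and still serve pages over that scheme.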