
curl file_get_contents/get_meta_tags encoding

Developer https://www.devze.com 2023-02-15 08:55 Source: Internet

So I'm using cURL to replace the file_get_contents and get_meta_tags functionality in PHP:

<?php

class CURL
{
    // Fetch a URL with cURL and return the response body.
    public static function file_get_contents($url)
    {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $data = curl_exec($ch);
        curl_close($ch);

        // No charset conversion is applied; the body is returned as the raw
        // bytes the server sent.
        return $data;
    }

    // Fetch a URL and return its title, description and keywords meta tags.
    public static function get_meta_tags($url)
    {
        $html = self::file_get_contents($url);

        return self::get_meta_tags_html($html);
    }

    // Extract the title, description and keywords meta tags from an HTML string.
    public static function get_meta_tags_html($html)
    {
        // Parsing begins here; @ silences warnings about malformed markup.
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        //$nodes = $doc->getElementsByTagName('title');

        // Get and display what you need:
        //$title = $nodes->item(0)->nodeValue;

        $metas = $doc->getElementsByTagName('meta');

        $return = array();

        for ($i = 0; $i < $metas->length; $i++) {
            $meta = $metas->item($i);
            if ($meta->getAttribute('name') == 'title') {
                $return['title'] = $meta->getAttribute('content');
            }
            if ($meta->getAttribute('name') == 'description') {
                $return['description'] = $meta->getAttribute('content');
            }
            if ($meta->getAttribute('name') == 'keywords') {
                $return['keywords'] = $meta->getAttribute('content');
            }
        }

        return $return;
    }
}

?>

But when I call CURL::get_meta_tags on a site that contains non-Latin text such as Japanese, it returns garbled characters instead of the Japanese text, whereas PHP's built-in get_meta_tags returns the correct characters.

How should I modify this code so that CURL::get_meta_tags also returns foreign characters properly, just like the built-in get_meta_tags?


More likely, the data itself is fine and you are just displaying the text with the wrong encoding.

If you set the character set with the header function, the output should look correct:

header('Content-Type: text/html; charset=utf-8');

You could also check which character set the page declares in its meta tag, if one is set, and use that.
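
As a rough illustration of that charset check, here is a minimal sketch. It assumes the mbstring extension is available, and the helper names detect_meta_charset and to_utf8 are made up for this example (they are not PHP built-ins). The regular expression is a simple heuristic rather than a full HTML parser, and the EUC-JP page is built in memory only so the example runs without a network request.

```php
<?php

// Hypothetical helper: return the charset a page declares in a <meta> tag,
// or null if no declaration is found. Handles both <meta charset="...">
// and <meta http-equiv="Content-Type" content="text/html; charset=...">.
function detect_meta_charset(string $html): ?string
{
    $pattern = '/<meta[^>]+charset\s*=\s*["\']?\s*([\w.:-]+)/i';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Hypothetical helper: convert bytes in the given charset to UTF-8.
// Returns the input unchanged if the charset is missing, already UTF-8,
// or not recognised by mbstring.
function to_utf8(string $bytes, ?string $charset): string
{
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $bytes;
    }

    try {
        // PHP 8 throws ValueError for unknown encodings; PHP 7 warns and
        // returns false, which is why the warning is suppressed here.
        $converted = @mb_convert_encoding($bytes, 'UTF-8', $charset);
    } catch (\ValueError $e) {
        $converted = false;
    }

    return is_string($converted) ? $converted : $bytes;
}

// Demo input: a tiny page encoded as EUC-JP (this source file is UTF-8).
$page = mb_convert_encoding(
    '<meta charset="EUC-JP"><meta name="description" content="日本語">',
    'EUC-JP',
    'UTF-8'
);

$charset = detect_meta_charset($page); // charset declared by the page
$utf8 = to_utf8($page, $charset);      // same markup, now UTF-8 bytes
```

With the text converted to UTF-8, sending the charset=utf-8 Content-Type header before echoing it keeps the declared encoding and the actual bytes in agreement.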
