Cannot show the downloaded webpage with proper encoding using PHP

Developer https://www.devze.com 2022-12-12 18:31 Source: Internet
I have to fetch the content of a Persian web page and show part of that page to some users. The problem is that after I filter the page content, I cannot display it with the proper encoding. The web page is located at sena.ir, and here is a screenshot of the part of the original page I want to show:

Image: http://img502.imageshack.us/img502/983/original.gif

And here is what I got:

Image: http://www.freeimagehosting.net/uploads/812cebe6b3.gif

Here is the function I use to get the page content:

function getPage($url, $referer="", $timeout=30, $header=""){
    // $timeout is in seconds. The original isset() check never fired,
    // because the parameter defaulted to "" and was therefore always set.
    $curl = curl_init();
    if(strstr($referer,"://")){
        curl_setopt ($curl, CURLOPT_REFERER, $referer);
    }

    // Start from any caller-supplied request headers (array), else none.
    $headers = is_array($header) ? $header : array();
    $headers [] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers [] = 'Connection: Keep-Alive';
    // This is a request header; it does not change the encoding of the response.
    $headers [] = 'Content-type: application/x-www-form-urlencoded;charset=utf-8 '; // I tried iso-..... as well, but no luck
    $user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
    $compression = "gzip";

    curl_setopt ($curl, CURLOPT_HTTPHEADER, $headers );
    curl_setopt ($curl, CURLOPT_HEADER, 0 );
    curl_setopt ($curl, CURLOPT_USERAGENT, $user_agent );
    curl_setopt ($curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt ($curl, CURLOPT_FOLLOWLOCATION, 1 );
    curl_setopt ($curl, CURLOPT_POST, 0 );
    curl_setopt ($curl, CURLOPT_ENCODING, $compression );
    curl_setopt ($curl, CURLOPT_TIMEOUT, $timeout ); // was hard-coded to 300
    // Certificate checks are disabled here, as in the original code.
    curl_setopt ($curl, CURLOPT_SSL_VERIFYHOST, 0 );
    curl_setopt ($curl, CURLOPT_SSL_VERIFYPEER, 0 );

    curl_setopt ($curl, CURLOPT_URL, $url);
    $html = curl_exec ($curl); // returns false on failure
    curl_close ($curl);
    return $html;
}

$content = getPage("http://sena.ir/");
$p1 = strpos($content,'<TABLE cellSpacing="3" cellPadding="3" width="100%" border="0">');
$p2 = strpos($content,"</TABLE>",$p1);
$content = substr($content, $p1, $p2-$p1);
echo $content;


The data was not the problem; the output was. Because the proxy-like function strips the HTML head and its encoding declarations from the page, you have to add these lines before you output the filtered data:

<html lang="fa"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
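One way to apply this is sketched below. It assumes the fetched bytes are already UTF-8, as the answer implies, and that the script file itself is saved as UTF-8. The helper name wrapFragment and the sample fragment are hypothetical and only illustrate the idea; they are not part of the original answer. The sketch also sends a matching Content-Type response header, which browsers generally give precedence over the meta tag.

```php
<?php
// Hypothetical helper (not from the original answer): wraps an extracted
// HTML fragment in a minimal document that declares its language and charset.
function wrapFragment($fragment, $lang = 'fa', $charset = 'utf-8')
{
    return "<!DOCTYPE html>\n"
        . "<html lang=\"{$lang}\">\n"
        . "<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$charset}\">\n"
        . "</head>\n"
        . "<body>\n"
        . $fragment . "\n"
        . "</body>\n"
        . "</html>\n";
}

// Assumed sample fragment; in the question this would be the filtered $content.
$fragment = '<TABLE><TR><TD>سلام</TD></TR></TABLE>';

// Declare the encoding in the HTTP response as well, before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo wrapFragment($fragment);
```

In the question's script, the last line would become echo wrapFragment($content); in place of echo $content;. If the source page turned out not to be UTF-8, the fragment would need converting first (for example with mb_convert_encoding), which this sketch does not attempt.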