In my recent project I work with multiple RSS feeds. I want to list only the latest post from each of them, sorted by timestamp.
My issue is that I have about 20 different feeds and the page takes 6 seconds to load (even when testing with only 10 feeds).
What can I do to make it perform better?
I use simplexml:
simplexml_load_file($url);
which I append to an array:
function appendToArray($key, $value)
{
    $this->array[$key] = $value;
}
Just before displaying the list I apply krsort:
krsort($this->array);
Should I cache it somehow?
You could cache them, but you would still have the problem of the page taking ages to load if caches have expired.
You could have a PHP script which runs in the background (e.g. via a cron job) and periodically downloads the feeds you are subscribed to into a database, then you can do much faster fetching/filtering of the data when you want to display it.
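A minimal sketch of that background script, run e.g. every 15 minutes via cron. The function name `latestItem()`, the cache path, and the use of a JSON file instead of a database are all assumptions for illustration; in the real script you would fetch each subscribed feed URL and pass the response body in.

```php
<?php
// Hypothetical cron worker: parse each feed, keep only the newest item,
// and cache the results keyed by timestamp so the page only reads the cache.

function latestItem(string $rssXml): ?array
{
    $xml = simplexml_load_string($rssXml);
    if ($xml === false || !isset($xml->channel->item[0])) {
        return null; // malformed feed or no items
    }
    $item = $xml->channel->item[0]; // RSS feeds list the newest item first
    return [
        'title'     => (string) $item->title,
        'link'      => (string) $item->link,
        'timestamp' => strtotime((string) $item->pubDate),
    ];
}

// In the real cron script, fetch each feed URL here (file_get_contents or
// curl) and pass the body to latestItem(). Inline XML stands in for a fetch:
$sample = '<rss version="2.0"><channel><title>Demo</title>'
        . '<item><title>Hello</title><link>http://example.com/1</link>'
        . '<pubDate>Mon, 02 Jan 2012 10:00:00 +0000</pubDate></item>'
        . '</channel></rss>';

$cache = [];
if (($item = latestItem($sample)) !== null) {
    $cache[$item['timestamp']] = $item;
}
file_put_contents(sys_get_temp_dir() . '/feeds_cache.json', json_encode($cache));
```

The page-rendering script then only decodes the cached JSON and never waits on a remote server.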
Have you done any debugging? Log microtime() at various points in your code.
You'll find that it's the loading of the RSS feeds, rather than the parsing, that takes the time. You might also find that the delay comes from how long each remote server takes to generate its feed.
Save those ten feeds as static XML files, point your script at them, and see how long the page takes to load.
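The microtime() logging above can be wrapped in a small helper. This is just a sketch; the `timed()` helper is a made-up name, and in the real script the callable would be `simplexml_load_file($url)` rather than the inline string used here to keep the example self-contained:

```php
<?php
// Time a single step and record the elapsed seconds under a label.
function timed(callable $fn, string $label, array &$log)
{
    $start = microtime(true);
    $result = $fn();
    $log[$label] = microtime(true) - $start;
    return $result;
}

$log = [];
// In the real script: timed(fn() => simplexml_load_file($url), $url, $log);
$xml = timed(fn() => simplexml_load_string('<rss><channel/></rss>'), 'load feed', $log);

foreach ($log as $label => $seconds) {
    error_log(sprintf('%-10s %.4fs', $label, $seconds));
}
```

Timing the load step separately from the append/sort steps will show immediately whether the 6 seconds are spent on the network or in your own code.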
You can load the RSS feeds in parallel with curl_multi. That could speed up your script, especially if you're using blocking calls at the moment.
A small example (from http://www.rustyrazorblade.com/2008/02/curl_multi_exec/) :
$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();

for ($i = 0; $i < $node_count; $i++)
{
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

do {
    curl_multi_exec($master, $running);
    curl_multi_select($master); // wait for activity instead of busy-looping
} while ($running > 0);

echo "results: ";
for ($i = 0; $i < $node_count; $i++)
{
    $results = curl_multi_getcontent($curl_arr[$i]);
    echo $i . "\n" . $results . "\n";
    curl_multi_remove_handle($master, $curl_arr[$i]); // clean up each handle
}
curl_multi_close($master);
echo 'done';
More info can be found at Asynchronous/parallel HTTP requests using PHP multi_curl and How to use curl_multi() without blocking (amongst others).
By the way, to process the feeds after loading them with curl_multi you will of course have to use simplexml_load_string instead of simplexml_load_file.
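Tying that back to the question: once curl_multi_getcontent() has given you the raw XML strings, you can parse each one with simplexml_load_string, key the newest item by its timestamp, and krsort the result, just as in the original code. The inline XML strings below stand in for real curl_multi responses:

```php
<?php
// $responses would come from curl_multi_getcontent() in the loop above;
// two hand-written feeds stand in for them here.
$responses = [];
$responses[] = '<rss version="2.0"><channel><item>'
             . '<title>A</title><pubDate>Sun, 01 Jan 2012 09:00:00 +0000</pubDate>'
             . '</item></channel></rss>';
$responses[] = '<rss version="2.0"><channel><item>'
             . '<title>B</title><pubDate>Mon, 02 Jan 2012 09:00:00 +0000</pubDate>'
             . '</item></channel></rss>';

$posts = [];
foreach ($responses as $body) {
    $xml = simplexml_load_string($body); // note: _string, not _file
    if ($xml === false || !isset($xml->channel->item[0])) {
        continue; // skip feeds that failed to download or parse
    }
    $item = $xml->channel->item[0];
    $posts[strtotime((string) $item->pubDate)] = (string) $item->title;
}
krsort($posts); // newest first, as in the question
```

Note that keying by raw timestamp silently drops posts published at the exact same second; appending a counter to the key avoids that if it matters.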
Yes, of course caching is the only sensible solution.
It's better to set up a cron job to retrieve these feeds and store the data locally.