
Parse Wikipedia Introduction in PHP

Developer https://www.devze.com 2023-03-02 02:38 Source: Internet

I have read through other questions on this site and am using the example answer given here:

wikipedia api: get parsed introduction only

I have got to the stage where I get the first section of a Wikipedia article back. But the first section includes pictures as well as text. All I want is the text. Here is the HTML output from my cURL response:

 $ Array
(
[parse] => Array
    (
        [text] => Array
            (
                [*] => <div class="dablink">This article is about sports known as    football.  For the ball used in these sports, see <a href="/wiki/Football_(ball)">Football  (ball)</a>.</div> 
   <div class="thumb tright"> 
   <div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a> 
   <div class="thumbcaption"> 
   <div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div> 
   Some of the many different games known as football. From top left to bottom right:      <a href="/wiki/Association_football">Association football</a> or soccer, <a   href="/wiki/Australian_rules_football">Australian rules football</a>, <a  href="/wiki/International_rules_football">International rules football</a>, <a  href="/wiki/Rugby_Union" class="mw-redirect" title="Rugby Union">Rugby Union</a>, <a  href="/wiki/Rugby_League" class="mw-redirect" title="Rugby League">Rugby League</a>, and <a  href="/wiki/American_Football" class="mw-redirect" title="American Football">American   Football</a>.</div> 
  </div> 
  </div> 
  <p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport" title="Team sport">team sports</a>, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)" title="Kick (football)">kicking</a> a ball with the foot to score a <a href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football">association football</a>, more commonly known as just "football" or "soccer". Unqualified, the word <i><a href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever form of football is the most popular in the regional context in which the word appears, including <a href="/wiki/American_football">American football</a>, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/Canadian_football">Canadian football</a>, <a href="/wiki/Gaelic_football">Gaelic football</a>, <a href="/wiki/Rugby_league">rugby league</a>, <a href="/wiki/Rugby_union">rugby union</a> and other related games. These variations are known as "codes".</p> 
    <div class="toclimit-3"></div> 

The text I actually want is inside the paragraph tags, if that is any use (it starts with the words "The game of").

The URL I use to fetch the data in PHP is:

 'http://en.wikipedia.org/w/api.php?action=parse&page='.$search.'&redirects=1&format=json&prop=text&section=0'

Example code that I have tried:

 <?php

 include_once('simple_html_dom.php');

 $html = file_get_html('http://amazon.co.uk/');

 foreach($html->find('p') as $element)
 {
     echo $element->plaintext . '<br>';
 }

 ?>

Unfortunately, this returns a blank page.


Just download the Simple HTML DOM parser and then use this:

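<?php
include_once('simple_html_dom.php');

// Load the rendered article page rather than the API response.
$html = file_get_html('http://en.wikipedia.org/wiki/Football');

// Print the text of the first <p> only, then stop.
foreach($html->find('p') as $element)
{
    echo $element->plaintext . '<br>';
    break;
}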
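
Since the question already has the `action=parse` response in hand, another option may be to pull the first paragraph out of that HTML locally instead of fetching the whole article page. Below is a minimal sketch, assuming PHP's bundled DOM extension is enabled. `firstParagraphText` is a hypothetical helper name, and the `$data` array is a shortened stand-in for the decoded JSON shown in the question, not real API output.

```php
<?php
// Hypothetical helper: return the text of the first non-empty <p> in an
// HTML fragment, or null if there is none. Not part of any library.
function firstParagraphText(string $html): ?string
{
    $doc = new DOMDocument();
    // The fragment is not a full document, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8 so non-ASCII text is not mangled.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text; // first paragraph that actually contains text
        }
    }
    return null; // no paragraph found
}

// Stand-in for json_decode($response, true) on the cURL result.
$data = [
    'parse' => [
        'text' => [
            '*' => '<div class="thumb tright"><img alt="" src="x.png" /></div>'
                 . '<p>The game of <b>football</b> is any of several similar '
                 . '<a href="/wiki/Team_sport">team sports</a>.</p>',
        ],
    ],
];

echo firstParagraphText($data['parse']['text']['*']), PHP_EOL;
```

Because the image thumbnail and its caption sit in `<div>` elements, they are skipped and only the paragraph text is returned, with the inner `<b>` and `<a>` tags stripped.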