I want to use PHP (possibly with cURL/XPath?) to extract data from Wikipedia pages. What would be the best way to go about this? I'll be using CakePHP for this project, although I just need to figure out how to get this working first.
You can fetch some data with this PHP function, which uses cURL:
http://www.barattalo.it/2010/08/29/php-bot-to-get-wikipedia-definitions/
This has been asked before; see "Is there a Wikipedia API?", where a few options are listed for interacting with Wikipedia.
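For the API route, here is a minimal sketch of fetching a page's plain-text intro with cURL through the MediaWiki Action API. It assumes the `extracts` property is available on the wiki you query (it is on Wikipedia); the function names and the User-Agent string are placeholders of my own, not anything from the linked answers.

```php
<?php
// Build an Action API URL asking for the plain-text intro of one page.
function buildExtractUrl(string $title, string $lang = 'en'): string
{
    $params = [
        'action'        => 'query',
        'prop'          => 'extracts',
        'exintro'       => 1,      // only the section before the first heading
        'explaintext'   => 1,      // plain text instead of HTML
        'redirects'     => 1,      // follow page redirects
        'format'        => 'json',
        'formatversion' => 2,      // "pages" comes back as a plain list
        'titles'        => $title,
    ];
    return "https://{$lang}.wikipedia.org/w/api.php?" . http_build_query($params);
}

// Pull the extract out of the raw JSON body; null if it is not there.
function parseExtract(string $json): ?string
{
    $data = json_decode($json, true);
    return $data['query']['pages'][0]['extract'] ?? null;
}

// Fetch the intro over HTTP with cURL; null on any transport failure.
function fetchIntro(string $title): ?string
{
    $ch = curl_init(buildExtractUrl($title));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
        // Placeholder: identify your own app and contact address here.
        CURLOPT_USERAGENT      => 'MyCakeApp/0.1 (contact@example.com)',
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : parseExtract($body);
}
```

Calling `fetchIntro('Albert Einstein')` should return the opening paragraphs as a string. Keeping the URL building and JSON parsing separate from the HTTP call makes it easier to move this into a CakePHP component or model later.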
You can download snapshots of the Wikipedia database and process them on your own disk. This alternative may be a better solution.
Wikipedia database snapshots are available at: http://dumps.wikimedia.org/
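If you go the dump route, the XML files are far too large to load in one piece, so a streaming parser is the usual approach. A rough sketch with `XMLReader`, assuming the standard MediaWiki export layout (`<page>` elements containing `<title>` and `<revision><text>`); the helper name `iteratePages` is my own:

```php
<?php
// Stream <page> elements from a MediaWiki XML export one at a time,
// yielding page title => raw wikitext without loading the whole file.
function iteratePages(XMLReader $reader): Generator
{
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'page') {
            // Only this single page is materialised in memory.
            $page = new SimpleXMLElement($reader->readOuterXml());
            yield (string) $page->title => (string) $page->revision->text;
        }
    }
}
```

To use it, create a reader with `$reader = new XMLReader();`, point it at a dump with `$reader->open('pages-articles.xml');` (a hypothetical, already decompressed file name), and loop with `foreach (iteratePages($reader) as $title => $wikitext)`. Note that the text you get back is wikitext markup, not rendered HTML, so it still needs cleaning up before display.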
Several options (search Google for them):
1. DBpedia
2. Freebase Wikipedia Extraction (WEX)
3. There is a Wikipedia link dataset as well