
wikipedia api: get parsed introduction only

Developer https://www.devze.com 2023-02-18 12:44 Source: web
Using PHP, is there a nice way to get the (parsed) introduction only from a Wikipedia page?

I have two current methods:

  • The first is to call the API for the page content, then call the wiki parser on the introduction I have pulled from that first response. That takes two requests, and extracting the intro from the text isn't pretty either.
  • The second is to call the parser on the entire page and use XPath to retrieve every <p> tag before the table of contents.

With both methods I then have to re-parse the HTML to make sure the relevant links inside the introduction point to Wikipedia.

Neither is really ideal. Is there a better way?

  • http://www.mediawiki.org/wiki/API:Parsing_wikitext
  • http://en.wikipedia.org/w/api.php


The action=parse API module accepts a section number parameter (section). The lead is section number 0.
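
A minimal PHP sketch of that request, assuming the English Wikipedia endpoint listed in the question and the default JSON output format. The function names buildLeadSectionUrl and fetchLeadSectionHtml, and the User-Agent string, are made up for illustration and are not part of any library.

```php
<?php
// Sketch only: ask action=parse for section 0 (the lead) of a page.
// Both helper names are hypothetical.

function buildLeadSectionUrl(
    string $title,
    string $endpoint = 'https://en.wikipedia.org/w/api.php'
): string {
    $params = [
        'action'  => 'parse',
        'page'    => $title,
        'section' => 0,       // the lead is section number 0
        'prop'    => 'text',  // only the rendered HTML is needed
        'format'  => 'json',
    ];
    return $endpoint . '?' . http_build_query($params);
}

function fetchLeadSectionHtml(string $title): ?string
{
    // Assumption: a descriptive User-Agent header is sent with the request.
    $context = stream_context_create([
        'http' => ['header' => "User-Agent: ExampleIntroFetcher/0.1\r\n"],
    ]);

    $json = @file_get_contents(buildLeadSectionUrl($title), false, $context);
    if ($json === false) {
        return null; // network or HTTP failure
    }

    $data = json_decode($json, true);
    // With format=json in its default form, the HTML sits under parse.text["*"].
    return $data['parse']['text']['*'] ?? null;
}
```

Calling fetchLeadSectionHtml('PHP') would then return the parsed HTML of the introduction in a single request, or null on failure. Rewriting the links inside that HTML, as described in the question, is not covered by this sketch.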

