
Parsing Wiki API content

Developer https://www.devze.com 2023-03-16 06:27 Source: web
I have this wiki page from the API: http://fr.wikipedia.org/w/api.php?action=query&titles=%C9rythropo%EF%E9tine&prop=revisions&rvprop=content&format=xmlfm

from which I would like to retrieve the main content, starting from:

L''''érythropoïétine''' ('''EPO''') est une [[hormone]] ......etc

As a start, I tried to use preg_replace to strip everything from the opening "{{Chimiebox..." at the top down to its closing "}}", using this:

preg_replace( '/^{{(.*)}}$/sim', '', $value[0]['*'] );

But it doesn't really work. Does anyone know of a good way to determine where the content starts? Thanks for any advice.


Well, AFAIK most projects use the Wikipedia Parser directly, e.g. the Wikipedia Offline Client Project at my university. Since you seem to be using PHP, this may be the easiest way for you.
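
If pulling in a full parser is more than you need, one lighter-weight option for just skipping the infobox at the top is to count braces instead of using a regex. A single greedy pattern like the one in the question has trouble here because the template contains nested "{{...}}" pairs and the article has further templates later on. The following is only a minimal sketch: stripLeadingTemplates is a hypothetical helper name (not part of MediaWiki or PHP), and it assumes the wikitext begins with one or more balanced "{{...}}" templates. It does not try to handle triple-brace parameters, <nowiki> sections or HTML comments.

```php
<?php
/**
 * Hypothetical helper: skip every template ({{...}}) that opens the
 * wikitext and return whatever follows. Nested templates are handled
 * by tracking the brace depth.
 */
function stripLeadingTemplates(string $wikitext): string
{
    $pos = 0;
    $len = strlen($wikitext);

    while (true) {
        // Skip whitespace before (and between) leading templates.
        $pos += strspn($wikitext, " \t\r\n", $pos);

        // Stop as soon as the remaining text no longer starts with a template.
        if (substr($wikitext, $pos, 2) !== '{{') {
            break;
        }

        // Walk forward, tracking the nesting depth of {{ and }}.
        $depth = 0;
        $i = $pos;
        while ($i < $len) {
            $pair = substr($wikitext, $i, 2);
            if ($pair === '{{') {
                $depth++;
                $i += 2;
            } elseif ($pair === '}}') {
                $depth--;
                $i += 2;
                if ($depth === 0) {
                    break; // the leading template is closed
                }
            } else {
                $i++;
            }
        }

        // Unbalanced braces: give up and leave the remaining text untouched.
        if ($depth !== 0) {
            break;
        }
        $pos = $i;
    }

    return substr($wikitext, $pos);
}
```

With the variable from the question, this would be called as stripLeadingTemplates($value[0]['*']). Since it only removes templates at the very start, anything that appears after the first line of real text is left alone.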
