
How to get HTML content text of a Wikipedia Page (via Wikipedia API)? [duplicate]

Developer https://www.devze.com 2023-03-03 19:40 Source: web
This question already has answers here: Get Text Content from mediawiki page via API (9 answers) Closed 7 years ago.

I just want to get the content (no links, no categories, no images... just text).


There is no way to get "just the text" from the Wikipedia API. You can either download the HTML of the page (if you do this via index.php rather than api.php, use action=render to avoid downloading all the skin content) or the wikitext (which you can do via the API or by passing action=raw to index.php); you will then have to parse it yourself to remove the bits you don't want to keep.
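A minimal Python sketch of the two index.php routes described above, using only the standard library and assuming English Wikipedia (en.wikipedia.org). The helper names index_url and fetch and the USER_AGENT string are hypothetical. Wikimedia asks clients to send a descriptive User-Agent, so replace it with your own.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia; other wikis use their own host.
INDEX_PHP = "https://en.wikipedia.org/w/index.php"
# Hypothetical User-Agent; replace with your own tool name and contact details.
USER_AGENT = "example-text-fetcher/0.1 (contact: you@example.org)"


def index_url(title: str, action: str) -> str:
    """Build an index.php URL.

    action="render" returns the page HTML without the skin;
    action="raw" returns the page's wikitext.
    """
    if action not in ("render", "raw"):
        raise ValueError("action must be 'render' or 'raw'")
    return INDEX_PHP + "?" + urlencode({"title": title, "action": action})


def fetch(url: str) -> str:
    """Download a URL (network access required) and decode it as UTF-8."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch(index_url("Python (programming language)", "render")) would return the article HTML without the surrounding skin, while passing "raw" instead would return the wikitext. Either result still has to be cleaned up afterwards.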

In the HTML output, MediaWiki is generally good about adding classes to the interface elements you might want to filter out. User-created templates and similar content are less consistent (e.g. the table-sorting hack just puts some text in a display:none span with no class).
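One way to do that filtering, sketched with Python's standard html.parser module. Once you have the HTML (from action=render or action=parse), it drops text inside script/style tags, inside elements carrying a class from a skip list, and inside any element whose inline style contains display:none, which catches hidden spans like the table-sorting one. The class names in SKIP_CLASSES are assumptions based on common MediaWiki and Wikipedia markup, not an authoritative list. The names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser

# Assumed starter list (edit links, footnote markers, reference lists, common
# Wikipedia template boxes). Inspect real pages and adjust as needed.
SKIP_CLASSES = {"mw-editsection", "reference", "references", "navbox", "infobox"}
SKIP_TAGS = {"script", "style"}
# Void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Block-level tags whose end starts a new line of output text.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "td", "th"}


class TextExtractor(HTMLParser):
    """Collects text, dropping elements matched by tag, class, or inline display:none."""

    def __init__(self):
        super().__init__()
        self._stack = []  # (tag, skipped) for each currently open element
        self._parts = []  # collected text fragments

    def _skipping(self):
        # Text is dropped if any enclosing element was marked as skipped.
        return any(skipped for _, skipped in self._stack)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and not self._skipping():
                self._parts.append("\n")
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        style = (attrs.get("style") or "").replace(" ", "").lower()
        skipped = (tag in SKIP_TAGS
                   or bool(classes & SKIP_CLASSES)
                   or "display:none" in style)
        self._stack.append((tag, skipped))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed <p>, <li>, etc.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break
        else:
            return  # stray end tag with no matching start: ignore it
        if tag in BLOCK_TAGS and not self._skipping():
            self._parts.append("\n")

    def handle_data(self, data):
        # Source newlines count as spaces; only block ends and <br> break lines.
        if not self._skipping():
            self._parts.append(data.replace("\n", " "))

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).split("\n"))
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Matching display:none in the style attribute is only a heuristic: content hidden by a stylesheet rule rather than an inline style will still come through, and unclassed template output needs its own rules.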

To get the wikitext via the API, use action=query with prop=revisions (and rvprop=content). To get the rendered HTML, use action=parse.
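A minimal sketch of both API calls in Python, assuming English Wikipedia's endpoint and the JSON layout returned with format=json&formatversion=2 (plus rvslots=main, which places the revision content under slots.main). The function names and the User-Agent string are hypothetical; the query parameters are the standard ones for these modules.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia's API endpoint; other wikis use their own host.
API_PHP = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; replace with your own tool name and contact details.
USER_AGENT = "example-text-fetcher/0.1 (contact: you@example.org)"


def wikitext_params(title):
    """action=query + prop=revisions: wikitext of the page's latest revision."""
    return {"action": "query", "prop": "revisions", "titles": title,
            "rvprop": "content", "rvslots": "main",
            "format": "json", "formatversion": "2"}


def parse_params(title):
    """action=parse: rendered HTML of the page content (no skin)."""
    return {"action": "parse", "page": title, "prop": "text",
            "format": "json", "formatversion": "2"}


def call_api(params):
    """Send a GET request to api.php (network access required) and decode the JSON."""
    url = API_PHP + "?" + urlencode(params)
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def extract_wikitext(data):
    """Read the wikitext out of a formatversion=2 prop=revisions response."""
    page = data["query"]["pages"][0]
    if "revisions" not in page:  # e.g. the page does not exist
        raise LookupError(f"no revisions returned for {page.get('title')!r}")
    return page["revisions"][0]["slots"]["main"]["content"]


def extract_html(data):
    """Read the HTML out of a formatversion=2 action=parse response."""
    return data["parse"]["text"]
```

Usage would look like extract_wikitext(call_api(wikitext_params("Python (programming language)"))) for the wikitext, or extract_html(call_api(parse_params("Python (programming language)"))) for the HTML. If action=parse fails (for example on a missing page), the response carries an "error" object instead of "parse", so extract_html raises KeyError.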

