Wikipedia: pages across multiple languages

Developer https://www.devze.com 2023-01-15 20:23 Source: Internet
I want to use the Wikipedia dump for my project, and I need the following information:

  1. For a given Wikipedia entry, which other language editions contain a version of the page?
  2. The data should be downloadable in CSV or another common format.

Is there a way to get this data?

Thanks, Bala


The Wikimedia Foundation provides XML dumps of all of its projects, including the English-language Wikipedia.

Parsing an English-language wiki article for interlanguage links is fairly easy: such links use the syntax [[language_code:Title of the article in the other-language Wikipedia]], where language_code is usually a two- or three-letter code (such as tlh for Klingon) based on an ISO standard, with a few exceptions such as simple for Simple English.
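A minimal Python sketch of that parsing step, assuming you already have an article's wikitext as a string. The known_codes parameter is a hypothetical, caller-supplied set of valid language codes; the filter is needed because namespace links such as [[category:Planets]] have the same shape. Links with a leading colon, like [[:fr:Terre]], are inline links rather than interlanguage links, so the pattern does not match them.

```python
import re

# [[code:Title]] where code is lowercase letters/hyphens (e.g. "de", "simple",
# "zh-min-nan"). The match must start with a letter right after "[[", so
# inline links written as [[:fr:Title]] are skipped.
INTERLANG_RE = re.compile(r"\[\[([a-z][a-z-]*):([^\[\]|]+)\]\]")


def interlanguage_links(wikitext, known_codes):
    """Return {language_code: title} for the interlanguage links in wikitext.

    known_codes is an assumed set of Wikipedia language codes supplied by the
    caller; anything else (e.g. a lowercase "category:" prefix) is ignored.
    """
    links = {}
    for code, title in INTERLANG_RE.findall(wikitext):
        if code in known_codes:
            links[code] = title.strip()
    return links
```

For example, on an article ending in [[Category:Planets]] [[de:Erde]] [[fr:Terre]], calling interlanguage_links(text, {"de", "fr", "simple"}) should yield the German and French titles and ignore the category.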


Wikimedia provides dumps of Wikipedia in different formats at download.wikimedia.org.


I'm answering this question even though it's old, because things have changed: there is now Wikidata.

Interlanguage links have been removed from the wikitext of Wikipedia articles and are now hosted centrally on Wikidata. Open an item (for example, Q42 "Douglas Adams") and its "Wikipedia pages linked to this item" section lists the sitelinks to every Wikipedia edition that has an article on that subject.

You can query these sitelinks through the Wikidata API, or use the Special:Export page to retrieve the article(s) in XML.
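Since the question asks for CSV, here is a sketch that turns Wikidata sitelinks into CSV rows. It assumes the JSON shape returned by the API's wbgetentities action with props=sitelinks ({"entities": {"Q42": {"sitelinks": {"enwiki": {"title": ...}}}}}). The User-Agent string and the NON_WIKIPEDIA set are assumptions: Wikimedia asks clients to identify themselves, and sitelink keys ending in "wiki" are Wikipedia editions apart from a few special sites, of which this set is not an exhaustive list.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.wikidata.org/w/api.php"
# Hypothetical identifier; replace with your own tool name and contact.
USER_AGENT = "interlang-example/0.1 (contact: you@example.org)"
# Assumed, non-exhaustive set of "...wiki" keys that are not Wikipedias.
NON_WIKIPEDIA = {"commonswiki", "specieswiki", "metawiki",
                 "wikidatawiki", "mediawikiwiki", "sourceswiki"}


def fetch_sitelinks(item_ids):
    """Download sitelinks for a list of item IDs (requires network access)."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "props": "sitelinks",
        "format": "json",
    })
    req = urllib.request.Request(API_ENDPOINT + "?" + params,
                                 headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def sitelinks_to_rows(data):
    """Yield (item, site, title) for Wikipedia sitelinks only."""
    for item_id, entity in data.get("entities", {}).items():
        for site, link in sorted(entity.get("sitelinks", {}).items()):
            if site.endswith("wiki") and site not in NON_WIKIPEDIA:
                yield item_id, site, link["title"]


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["item", "site", "title"])
    writer.writerows(rows)
    return buf.getvalue()
```

Usage: to_csv(sitelinks_to_rows(fetch_sitelinks(["Q42"]))). Sister-project links such as enwikiquote are dropped because their keys do not end in "wiki". For bulk use, Wikidata also publishes full entity dumps in JSON; each entity there carries the same sitelinks structure, so sitelinks_to_rows would only need small changes to read them.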
