I want to use a Wikipedia dump for my project. The following information is required for my project:
- For a Wikipedia entry, I want to know which other languages contain that page.
- I want downloadable data in CSV or another common format.
Is there a way to get this data?
Thanks, Bala
The Wikimedia foundation provides XML dumps of all of its projects, including the English language Wikipedia.
Parsing an English-language wiki article for inter-language links is fairly easy: the syntax for such links is [[language_code:Name of other language Wikipedia article]], where language_code is usually a two- or three-letter code (such as tlh for Klingon) based on an ISO standard, with a few exceptions such as simple for Simple English.
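As a rough illustration, assuming you already have an article's wikitext in hand, the inter-language links could be pulled out with a regular expression along these lines (the language-code pattern here is only an approximation, not the full ISO code list):

```python
import re

# Matches [[xx:Title]] / [[xxx:Title]] style inter-language links,
# where the prefix is 2-3 lowercase letters, plus "simple" as a
# special case. Real parsing would need the actual language-code list.
INTERLANG_RE = re.compile(r"\[\[((?:[a-z]{2,3}|simple)):([^\]|]+)\]\]")

def interlanguage_links(wikitext):
    """Return a list of (language_code, title) pairs found in the wikitext."""
    return INTERLANG_RE.findall(wikitext)

sample = "Text [[de:Douglas Adams]] more [[tlh:Douglas Adams]] [[simple:Douglas Adams]]"
print(interlanguage_links(sample))
# → [('de', 'Douglas Adams'), ('tlh', 'Douglas Adams'), ('simple', 'Douglas Adams')]
```

Writing the resulting pairs out as CSV rows (article, language_code, title) would then give the downloadable format asked for.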
Wikimedia provides dumps of Wikipedia in different formats at download.wikimedia.org.
I will answer this question even if it's old because things have changed: now there's Wikidata.
All the inter-language links have been removed from Wikipedia articles, and Wikidata now hosts them all: you can look up an item (for example, Q42, "Douglas Adams") and the "Wikipedia pages linked to this item" section lists the sitelinks to all the different Wikipedias.
You can use the Wikidata API, or the Special:Export page to retrieve the article(s) in XML.
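For instance, the sitelinks for an item come back in the JSON response of the API's wbgetentities action (e.g. api.php?action=wbgetentities&ids=Q42&props=sitelinks&format=json). A minimal sketch of turning that response into the CSV the question asks for; the sample response below is hand-written and abridged to show the shape, not real API output:

```python
import csv
import io
import json

# Abridged, hand-written example of the wbgetentities response shape:
# each sitelink entry maps a wiki (e.g. "enwiki") to the article title there.
sample_response = json.loads("""
{"entities": {"Q42": {"sitelinks": {
    "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
    "dewiki": {"site": "dewiki", "title": "Douglas Adams"},
    "frwiki": {"site": "frwiki", "title": "Douglas Adams"}
}}}}
""")

def sitelinks_to_csv(response, item_id):
    """Write one CSV row per Wikipedia that the given item is linked to."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["item", "site", "title"])
    for link in response["entities"][item_id]["sitelinks"].values():
        writer.writerow([item_id, link["site"], link["title"]])
    return out.getvalue()

print(sitelinks_to_csv(sample_response, "Q42"))
```

In a real script you would fetch the JSON over HTTP (or read it from a Wikidata dump) instead of embedding it as a string.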