How can I use the Wikipedia API to get the page view statistics of a particular Wikipedia page?

Developer https://www.devze.com 2023-02-16 23:02 Source: the web

The stats.grok.se tool provides the pageview statistics of a particular page in Wikipedia. Is there a way to use the Wikipedia API to get the same information? What does the page views counter property actually mean?


The Pageview API was released a few days ago: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}

  • https://wikimedia.org/api/rest_v1/?doc#/
  • https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageview_API

For example https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Foo/daily/20151010/20151012 will give you

{
  "items": [
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101000",
      "access": "all-access",
      "agent": "all-agents",
      "views": 79
    },
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101100",
      "access": "all-access",
      "agent": "all-agents",
      "views": 81
    }
  ]
}
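As a rough illustration of the URL template and response shown above, here is a minimal Python sketch. The function names (`build_pageviews_url`, `views_by_timestamp`, `fetch_pageviews`) and the User-Agent string are made up for this example and are not part of the API. The default values `all-access`, `all-agents` and `daily` are simply the ones used in the example request.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(project, article, start, end,
                        access="all-access", agent="all-agents",
                        granularity="daily"):
    """Fill in the per-article URL template; start/end are YYYYMMDD strings."""
    # Titles use underscores instead of spaces; percent-encode everything else.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


def views_by_timestamp(payload):
    """Map each item's timestamp to its view count."""
    return {item["timestamp"]: item["views"] for item in payload["items"]}


def fetch_pageviews(url, user_agent="pageviews-example/0.1 (hypothetical)"):
    """Download and decode the JSON. The User-Agent value is a placeholder;
    replace it with something that identifies your own client."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_pageviews(build_pageviews_url("en.wikipedia", "Foo", "20151010", "20151012"))` should return a dictionary shaped like the JSON above, which `views_by_timestamp` then reduces to one count per day.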


No, there is not.

The counter property returned from prop=info would tell you how many times the page was viewed from the server. It is disabled on Wikipedia and other Wikimedia wikis because the aggressive squid/varnish caching means only a tiny fraction of page views would make it to the actual server in order to affect that counter, and even then the increased database write load for updating that counter would probably be prohibitive.

The stats.grok.se tool uses anonymized logs from the cache servers to calculate page views; the raw log files are available from http://dammit.lt/wikistats. If you need an API to access the data from stats.grok.se, you should contact the operator of stats.grok.se to request one be created.


Note this was written 4 years ago, and an API has since been created (see this answer). There's not yet a way to access that via api.php, though.


You can get the daily JSON for the last 30 days like this:

http://stats.grok.se/json/en/latest30/Britney_Spears


You can look into the stats here. Has anyone used an API to get the pageview stats? I have also looked at the available raw data, but could not find a way to extract the pageview count.


There doesn't seem to be any API; however, you can make HTTP requests to stats.grok.se and parse the HTML or JSON result to extract the page view counts.

I created a website http://wikipediaviews.org that does exactly that in order to facilitate easier comparison for multiple pages across multiple months and years. To speed things up, and minimize the number of requests to stats.grok.se, I keep all past query results stored locally.

The code I used is available at http://github.com/vipulnaik/wikipediaviews.

The file with the actual retrieval code is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/pageviewqueries.inc

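function getpageviewsonline($page, $month, $language)
{
  $url = getpageviewsurl($page, $month, $language);
  $html = file_get_contents($url);
  if ($html === false) {
    return false; // the request failed
  }
  // The page text reads "... has been viewed N times"; capture N.
  if (!preg_match('/\bhas been viewed\s+(\S+)/', $html, $matches)) {
    return false; // the expected text was not found
  }
  return $matches[1];
}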

The code for getpageviewsurl is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/stringfunctions.inc:

function getpageviewsurl($page,$month,$language)
{
  $page = str_replace(" ","_",$page);
  $page = str_replace("'","%27",$page);
  return "http://stats.grok.se/" . $language . "/" . $month . "/" . $page;
}

PS: In case the link to wikipediaviews.org doesn't work, it's because I registered the domain quite recently. Try http://wikipediaviews.subwiki.org instead in the interim.


This question was asked 6 years ago, and back then the official site had no such API.

That has changed.

A simple example:

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles=Buckingham+Palace%7CBank+of+England%7CBritish+Museum

See the documentation:

prop=pageviews

Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days). The result format is page title (with underscores) => date (Ymd) => count.
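To make the example request concrete, here is a hedged Python sketch that builds the same query string and adds up the daily counts per page. The helper names are invented for this example. The response layout it expects (a `query.pages` mapping whose entries carry `title` and `pageviews`, with `null` for days that have no data) is an assumption about the default JSON output, so check it against a real response before relying on it.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_pageviews_query(titles):
    """Build an api.php URL asking for prop=pageviews on several titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageviews",
        # Multiple titles are separated by "|", as in the example URL.
        "titles": "|".join(titles),
    }
    return API_ENDPOINT + "?" + urlencode(params)


def total_views_per_page(response):
    """Sum the daily counts for each page in a decoded JSON response.

    Assumes response["query"]["pages"] maps page ids to objects that
    contain "title" and "pageviews" (date -> count, or None).
    """
    totals = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        daily = page.get("pageviews") or {}
        totals[page["title"]] = sum(
            count for count in daily.values() if count is not None
        )
    return totals
```

`build_pageviews_query(["Buckingham Palace", "Bank of England", "British Museum"])` reproduces the example URL above. Fetching and decoding it is left out here so the sketch stays independent of any particular HTTP client.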
