MediaWiki API: size at which images were embedded/dropping unrelated icons

开发者 https://www.devze.com 2023-04-05 18:09 Source: web
I use the MediaWiki API to find the images used in Wikipedia articles. However, I also get all the useless icons, such as the broom shown when an article needs to be cleaned up, or the Creative Commons logo that marks something as being under a Creative Commons license.

Is there a way to detect which images are such icons so I can drop them? For example, is there a way to query the size at which the image was embedded (rather than the size of the original image, which might be huge even for icons) so that I can drop all the small ones? I'm not really interested in very small images anyway.


As far as I know, no. That information is simply not stored in the database, and is therefore also not available via the API.

Some things you could perhaps do include:

  • Load the HTML markup of the article (via the API action=parse, or simply via index.php with action=render) and extract the image sizes from it.

  • Simply build a list of images that should be excluded. You could do this programmatically (e.g. find all images used on all templates included in Category:Wikipedia maintenance templates and all its subcategories) or just add any unwanted images to the exclusion list as you come across them.
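The first suggestion could be sketched roughly as follows in Python. This is only an illustration under some assumptions: it targets English Wikipedia, it relies on the rendered <img> tags carrying width and height attributes, and the names ImgSizeParser, embedded_images, fetch_rendered_html and the 50-pixel threshold are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


class ImgSizeParser(HTMLParser):
    """Collect (src, width, height) for every <img> that declares both sizes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            width, height = int(a["width"]), int(a["height"])
        except (KeyError, TypeError, ValueError):
            return  # no usable size attributes, so skip this image
        self.images.append((a.get("src"), width, height))


def embedded_images(html, min_side=50):
    """Return images whose embedded width and height are both >= min_side pixels."""
    parser = ImgSizeParser()
    parser.feed(html)
    return [img for img in parser.images if min(img[1], img[2]) >= min_side]


def fetch_rendered_html(title):
    """Fetch the rendered article HTML via action=parse (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # with version 2, "text" is a plain string
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "icon-filter-sketch/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["text"]
```

Something like embedded_images(fetch_rendered_html("Some article")) would then give the larger images only. The threshold is arbitrary and would need tuning, and an exclusion list as in the second suggestion could be applied to the src values afterwards.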
