I use the MediaWiki API to find the images of Wikipedia articles. However, I also get all the useless icons, like the broom shown when an article needs to be cleaned up, or the Creative Commons logo that marks something as being under a Creative Commons license.
Is there a way to detect which images are such icons so I can drop them? E.g. is there a way to query the size at which the image was embedded (rather than the size of the original image, which might be huge even for icons) so that I can drop all the small ones? I'm not really interested in very small images anyway.
As far as I know, no. That information is simply not stored in the database, and is therefore also not available via the API.
Some things you could perhaps do include:
Load the HTML markup of the article (via the API with action=parse, or simply via index.php with action=render) and extract the image sizes from it.
Simply build a list of images that should be excluded. You could do this programmatically (e.g. find all images used on all templates included in Category:Wikipedia maintenance templates and all its subcategories) or just add any unwanted images to the exclusion list as you come across them.
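Both suggestions can be combined in a few lines of Python. This is only a rough sketch, not a tested solution: the names ImageSizeParser, large_images and fetch_rendered_html, the 50 pixel threshold and the User-Agent string are all made up for illustration. It also assumes that the rendered markup carries width and height attributes on its img tags, and that the API returns the HTML as a plain string under parse.text when formatversion=2 is requested.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class ImageSizeParser(HTMLParser):
    """Collect (src, width, height) for every <img> tag that declares a size."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        try:
            width = int(attributes.get("width"))
            height = int(attributes.get("height"))
        except (TypeError, ValueError):
            return  # size attributes missing or not numeric: skip this image
        self.images.append((attributes.get("src", ""), width, height))


def large_images(html, min_side=50, excluded=()):
    """Return the src of each image embedded at min_side pixels or more.

    min_side (an arbitrary default) is compared against the shorter side.
    excluded is a hand-maintained collection of file names to drop; a name
    matches if it occurs anywhere in the src URL.
    """
    parser = ImageSizeParser()
    parser.feed(html)
    kept = []
    for src, width, height in parser.images:
        if min(width, height) < min_side:
            continue
        if any(name in src for name in excluded):
            continue
        kept.append(src)
    return kept


def fetch_rendered_html(title, api="https://en.wikipedia.org/w/api.php"):
    """Fetch the rendered HTML of an article via action=parse (needs network)."""
    query = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # assumed to make parse.text a plain string
    })
    request = urllib.request.Request(
        api + "?" + query,
        headers={"User-Agent": "image-size-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["text"]
```

You would then call something like large_images(fetch_rendered_html("Cat"), excluded={"Commons-logo.svg"}). File names inside src URLs are URL-encoded and use underscores instead of spaces, so the entries in the exclusion list have to be written the same way.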