I'm wondering whether it would be possible to get a site's "characteristic" color. For instance, TechCrunch would be green, ReadWriteWeb red, CNN also red, Microsoft blueish, PHP purple, etc...
It doesn't have to be accurate, just a best guess.
Some things I have on my mind:
- parse all CSS rules and find the one matching the most elements
- parse all CSS rules and find the background colors of the elements with the biggest dimensions
- getting the body element's background image and finding the predominant color of that (is this even possible for an image?)
- somehow finding the site's "header" (the first element in the DOM with a background CSS attribute set?) and getting its background
Also I would need a way to eliminate blacks, greys and white.
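For the elimination part, I imagine something like this rough (untested) check — a colour whose channels are nearly equal is basically a grey; the thresholds are just guesses:

// Rough idea (untested): treat a colour as "boring" (black/grey/white)
// when its channels are nearly equal, i.e. it has almost no saturation.
function isBoringColor(r, g, b) {
  var max = Math.max(r, g, b);
  var min = Math.min(r, g, b);
  var spread = max - min;          // 0 for pure greys
  var lightness = (max + min) / 2; // near 0 for black, near 255 for white
  return spread < 30 || lightness < 20 || lightness > 235; // arbitrary cutoffs
}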
Is this feasible? Do you have any other ideas?
P.S. Sorry for my English
Feasible, definitely. You can use the wget tool and some simple regular expressions to parse out CSS colors. You can then collect all those colors and see which one is used most often. That won't always be a good representation of the actual predominant color of a website, though, because a color can appear in many CSS rules yet hardly be used on the page.
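A rough sketch of that counting idea, using Node instead of wget to match the code further down in this thread; the stylesheet URL is hypothetical and the regex only covers hex colours:

var https = require('https');

// Hypothetical stylesheet URL, just for illustration.
var cssUrl = 'https://example.com/style.css';

https.get(cssUrl, function(res) {
  var css = '';
  res.on('data', function(chunk) { css += chunk; });
  res.on('end', function() {
    // Only matches hex colours; rgb()/named colours would need extra patterns.
    var matches = css.match(/#(?:[0-9a-fA-F]{3}){1,2}\b/g) || [];
    var counts = {};
    matches.forEach(function(c) {
      c = c.toLowerCase();
      counts[c] = (counts[c] || 0) + 1;
    });
    // Sort by frequency to see which colour the stylesheet mentions most.
    var sorted = Object.keys(counts).sort(function(a, b) {
      return counts[b] - counts[a];
    });
    console.log(sorted.slice(0, 5).map(function(c) {
      return c + ' (' + counts[c] + ')';
    }));
  });
});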
This is actually a nontrivial project you have here.
My approach would be as follows:
- Download the CSS, parse out the colors and look at the total count of distinct colors. If there are only a few colors overall, you're more likely to have found the predominant one. It's often the color used for <a> or <h1> tags (but not if they're grey or black/white).
- When parsing, you should "pool" the colors so that, e.g., #FFEEEE counts as the same color as #FFEAEA, since they're only marginally different.
- You need to bring the different CSS color notations into the same format, e.g. #FFF, #FFFFFF, white, rgb(255,255,255), and so on. You need a ruleset for this and a good knowledge of programming (normalization and pooling are sketched after this list).
- Finding the predominant colors in images is less trivial. The simplest approach is to determine, for each pixel, which of its R, G and B components is the largest. A pixel with R(120), G(240), B(80) is most likely green. Count this over all pixels and take the most frequent component (also sketched below).
- @mu is too short suggested converting the values into HSV and extracting only the hue.
- Another, more advanced method would be to create a histogram of the three color components and then calculate the area under each histogram.
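A minimal sketch of the normalization, pooling and per-pixel ideas from the list above, in plain JavaScript; the 32-step pooling grid and the tiny colour-name table are arbitrary choices for illustration:

// Bring #FFF, #FFFFFF, "white" and rgb(255,255,255) into one [r, g, b] form.
// Only a handful of named colours are listed here as an example.
var NAMED = { white: [255, 255, 255], black: [0, 0, 0], red: [255, 0, 0] };

function toRgb(color) {
  color = color.trim().toLowerCase();
  if (NAMED[color]) return NAMED[color];
  var m = color.match(/^rgb\(\s*(\d+),\s*(\d+),\s*(\d+)\s*\)$/);
  if (m) return [ +m[1], +m[2], +m[3] ];
  if (color[0] === '#') {
    var hex = color.slice(1);
    if (hex.length === 3) {                       // expand #abc to #aabbcc
      hex = hex.replace(/./g, function(c) { return c + c; });
    }
    return [
      parseInt(hex.slice(0, 2), 16),
      parseInt(hex.slice(2, 4), 16),
      parseInt(hex.slice(4, 6), 16)
    ];
  }
  return null; // unknown notation
}

// "Pool" nearby colours by rounding each channel to a 32-step grid,
// so #FFEEEE and #FFEAEA end up in the same bucket.
function pool(rgb) {
  return rgb.map(function(c) { return Math.min(255, Math.round(c / 32) * 32); });
}

// Simplest pixel classification: which channel is the largest?
function dominantComponent(r, g, b) {
  if (r >= g && r >= b) return 'red';
  if (g >= r && g >= b) return 'green';
  return 'blue';
}

// The HSV variant: only the hue (0-360) matters; greys have no meaningful hue.
function hue(r, g, b) {
  var max = Math.max(r, g, b), min = Math.min(r, g, b), d = max - min;
  if (d === 0) return null;                       // grey/black/white
  var h;
  if (max === r)      h = ((g - b) / d) % 6;
  else if (max === g) h = (b - r) / d + 2;
  else                h = (r - g) / d + 4;
  return (h * 60 + 360) % 360;
}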
To sum it up, the task you're defining is worth a thesis, in my opinion :)
OK, here comes a seriously unorthodox approach:
Use a screen-capturing package [1][2] to render the given URL to a raster image (such as PNG). Then analyse the resulting image: sample its pixels for an average colour if an average is what you're after, or use a threshold to group pixels into "colour groups". Using either the average or the most frequent colour group (which method to use depends on what matters most to you), you can get a fairly accurate representation of the predominant colour of the page.
[1] http://cutycapt.sourceforge.net/
[2] http://weblogs.mozillazine.org/roc/archives/2005/05/rendering_web_p.html
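A sketch of that idea using CutyCapt plus node-canvas (both assumed to be installed, with the node-canvas 2.x API); it only computes a plain mean instead of the suggested colour groups:

var execFile = require('child_process').execFile;
var loadImage = require('canvas').loadImage;
var createCanvas = require('canvas').createCanvas;

// Render the page to a PNG with CutyCapt, then average its pixels.
execFile('cutycapt', ['--url=http://example.com', '--out=page.png'], function(err) {
  if (err) throw err;
  loadImage('page.png').then(function(img) {
    var canvas = createCanvas(img.width, img.height);
    var ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0);
    var data = ctx.getImageData(0, 0, img.width, img.height).data; // RGBA bytes
    var r = 0, g = 0, b = 0, n = data.length / 4;
    for (var i = 0; i < data.length; i += 4) {
      r += data[i]; g += data[i + 1]; b += data[i + 2];
    }
    console.log('average colour:',
      Math.round(r / n), Math.round(g / n), Math.round(b / n));
  });
});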
What about taking a screenshot and extracting the predominant colors in that image with something like the GD lib?
Using Node.js, PhantomJS and Color Thief
Dependencies: node-canvas (which in turn depends on Cairo), Webshot (which depends on PhantomJS) and Color Thief; minor dependencies are listed on the individual package pages.
Webshot is a light wrapper around PhantomJS, the headless WebKit browser.
You can use it to take a screenshot of your page and write it out as a stream; the sample code below is from the project's GitHub page.
var webshot = require('webshot');
var fs = require('fs');

// Render the page with headless WebKit and pipe the screenshot to a PNG file.
webshot('google.com', function(err, renderStream) {
  var file = fs.createWriteStream('google.png', {encoding: 'binary'});

  // The screenshot arrives as a stream of binary chunks.
  renderStream.on('data', function(data) {
    file.write(data.toString('binary'), 'binary');
  });
});
You can then pass the image to Color Thief, which will extract the predominant colour data for you; see the project's samples page for examples.
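For example, something along these lines — assuming the Node port of Color Thief exposes getColor/getPalette roughly like the original browser library (check the package's own samples for the exact signature):

var ColorThief = require('color-thief');
var fs = require('fs');

var colorThief = new ColorThief();
var image = fs.readFileSync('google.png');  // the screenshot written above

// getColor should return the dominant colour as [r, g, b];
// getPalette the N most representative colours (API assumed from the browser version).
console.log(colorThief.getColor(image));
console.log(colorThief.getPalette(image, 5));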