Algorithm For Ranking Items

I have a list of 6500 items that I would like to trade or invest in. (Not for real money, but for a certain game.) Each item has 5 numbers that will be used to rank it among the others.

Total quantity of item traded per day: The higher this number, the better.

The Donchian Channel of the item over the last 5 days: The higher this number, the better.

The median spread of the price: The lower this number, the better.

The spread of the 20 day moving average for the item: The lower this number, the better.

The spread of the 5 day moving average for the item: The higher this number, the better.

All 5 numbers have the same 'weight'; in other words, they should all affect the final number equally.

At the moment, I just multiply all 5 numbers for each item, but it doesn't rank the items the way I would like them to be ranked. I want to combine all 5 numbers into a single weighted number that I can use to rank all 6500 items, but I'm unsure how to do this in a mathematically sound way.

Note: The total quantity of the item traded per day and the Donchian Channel are numbers that are much higher than the spreads, which are more like percentages. This is probably why multiplying them all together didn't work for me; the quantity traded per day and the Donchian Channel had a much bigger role in the final number.

The reason people are having trouble answering this question is that we have no way of comparing two different "attributes". If there were just two attributes, say quantity traded and median price spread, would (20 million, 50%) be worse or better than (100, 1%)? Only you can decide this.

Converting everything into numbers of the same size could help; this is known as "normalisation". A good way of doing this is the z-score, which Prasad mentions. This is a statistical concept that measures how far each value lies from the mean, in units of standard deviation. You need to make some assumptions about the statistical distributions of your numbers to use it.

Things like spreads are probably roughly normally distributed. For these, as Prasad says, take z(spread) = (spread - mean(spreads)) / standardDeviation(spreads).

Things like the quantity traded might follow a power-law distribution. For these you might want to take the log() before calculating the mean and sd. That is, the z-score is z(qty) = (log(qty) - mean(log(quantities))) / sd(log(quantities)).

Then just add up the z-scores for each attribute, subtracting rather than adding those for which lower is better.

To do this for each attribute you will need to have an idea of its distribution. You could guess, but the best way is to plot a graph and have a look. You might also want to plot graphs on log scales. Wikipedia has a long list of probability distributions to compare against.
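As a minimal sketch of the approach above, here is how the z-score ranking might look in Python for just two of the attributes. The item names and values are made-up sample data, the helper `z_scores` is a hypothetical name, and the choice to take the log of the quantity is only the guess about its distribution discussed above, not something the data confirms.

```python
import math
from statistics import mean, pstdev

def z_scores(values):
    """Return (v - mean) / sd for each value; all zeros if sd is 0."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]

# Hypothetical sample data: name -> (quantity traded per day, median spread)
items = {
    "ore":   (2_000_000, 0.05),
    "gem":   (1_000,     0.01),
    "cloth": (50_000,    0.20),
}

names = list(items)
# Quantity is assumed heavy-tailed, so take the log before standardising.
z_qty = z_scores([math.log(items[n][0]) for n in names])
# Spread is assumed roughly normal, so standardise it directly.
z_spread = z_scores([items[n][1] for n in names])

# Higher quantity is better (+), lower spread is better (-).
scores = {n: zq - zs for n, zq, zs in zip(names, z_qty, z_spread)}
ranking = sorted(names, key=scores.get, reverse=True)
print(ranking)
```

The same pattern extends to all five attributes: one list of z-scores per attribute, added with a plus sign where higher is better and a minus sign where lower is better.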

You can replace each attribute-vector x (of length N = 6500) by the z-score of the vector Z(x), where

Z(x) = (x - mean(x))/sd(x).

This would transform them into the same "scale", and then you can add up the Z-scores (with equal weights, negating the attributes for which lower is better) to get a final score, and rank the N = 6500 items by this total score. If you can find in your problem some other attribute-vector that would be an indicator of "goodness" (say the 10-day return of the security?), then you could fit a regression model of this predicted attribute against these z-scored variables, to figure out the best non-uniform weights.

Start each item with a score of 0. For each of the 5 numbers, sort the list by that number and add each item's ranking in that sorting to its score. Then, just sort the items by the combined score.
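A rough sketch of this rank-sum idea in Python follows. The function name `rank_sum`, the sample data, and the `higher_is_better` flags are illustrative assumptions. The answer above does not say which direction to sort in, so this sketch sorts each attribute worst-first, which means a larger combined score is better. Tied values simply receive consecutive ranks here.

```python
def rank_sum(items, higher_is_better):
    """items: dict of name -> tuple of attribute values.
    higher_is_better: one bool per attribute.
    Returns the names sorted best-first by the sum of per-attribute ranks."""
    totals = {name: 0 for name in items}
    for k, higher in enumerate(higher_is_better):
        # Sort worst-first so the best item gets the largest rank.
        ordered = sorted(items, key=lambda n: items[n][k], reverse=not higher)
        for rank, name in enumerate(ordered):
            totals[name] += rank
    return sorted(items, key=lambda n: totals[n], reverse=True)

# Hypothetical sample data: name -> (quantity traded per day, median spread)
items = {
    "ore":   (2_000_000, 0.05),
    "gem":   (1_000,     0.01),
    "cloth": (50_000,    0.20),
}
ranking = rank_sum(items, higher_is_better=(True, False))
print(ranking)
```

Because only the ordering within each attribute is used, this approach is unaffected by the difference in magnitude between quantities and spreads, at the cost of ignoring how large the gaps between items are.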

You would usually normalize your data entries to their respective range. Since there is no fixed range for them, you'll have to use a sliding range - or, to keep it simpler, normalize them to the daily ranges.

For each day, get all entries for a given type, get the highest and the lowest of them, determine the difference between them. Let Bottom=value of the lowest, Range=difference between highest and lowest. Then you calculate for each entry (value - Bottom)/Range, which will result in something between 0.0 and 1.0. These are the numbers you can continue to work with, then.

Pseudocode (brackets replaced by indentation to make it easier to read):

double maxvalues[5]; 
double minvalues[5];
// init arrays with any item
for(i=0; i<5; i++)
   maxvalues[i] = items[0][i]; 
   minvalues[i] = items[0][i]; 
// find minimum and maximum values
foreach (items as item)
   for(i=0; i<5; i++)
       if (minvalues[i] > item[i])
           minvalues[i] = item[i];
       if (maxvalues[i] < item[i])
           maxvalues[i] = item[i];

// now scale them - in this case, to the range of 0 to 1.
double scaledItems[sizeof(items)][5]; 
for(i=0; i<5; i++)
   double delta = maxvalues[i] - minvalues[i];
   for(j=sizeof(items)-1; j>=0; --j)
      // linear normalization; if all items share the same value, use 0 to avoid dividing by zero
      if (delta == 0)
         scaledItems[j][i] = 0;
      else
         scaledItems[j][i] = (items[j][i] - minvalues[i]) / delta; 

Something like that. It'll be more elegant with a good library (STL, boost, whatever you have on the implementation platform), and the normalization should be in a separate function, so you can replace it with other variations like log() as the need arises.
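To illustrate the point about keeping the normalization in a separate function, here is a small Python sketch in which each column gets its own normaliser. The names `min_max`, `log_min_max` and `scale_columns`, and the sample rows, are hypothetical; the log variant assumes every value in that column is positive.

```python
import math

def min_max(values):
    """Linearly scale values to [0, 1]; all zeros if every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def log_min_max(values):
    """Same scaling applied to log(value); assumes all values are positive."""
    return min_max([math.log(v) for v in values])

def scale_columns(rows, normalisers):
    """rows: list of equal-length tuples; normalisers: one function per column."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [norm(col) for norm, col in zip(normalisers, columns)]
    return [tuple(row) for row in zip(*scaled)]

# Hypothetical sample rows: (quantity traded per day, median spread)
rows = [(2_000_000, 0.05), (1_000, 0.01), (50_000, 0.20)]
scaled = scale_columns(rows, [log_min_max, min_max])
for row in scaled:
    print(row)
```

Swapping a column's normaliser is then a one-word change in the list passed to `scale_columns`, without touching the scaling loop.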

Total quantity of item traded per day: The higher this number, the better. (a)

The Donchian Channel of the item over the last 5 days: The higher this number, the better. (b)

The median spread of the price: The lower this number, the better. (c)

The spread of the 20 day moving average for the item: The lower this number, the better. (d)

The spread of the 5 day moving average for the item: The higher this number, the better. (e)

a + b - c - d + e = "score" (higher score = better score)