Calculate estimated mathematical expectation of ranking data?_问答_开发者

I have to sort a "most popular apps RSS" by app download 开发者_开发知识库count. Here is the problem:

Suppose there are 1,000 apps.

The RSS data lists top 100 apps for each app category.

The RSS data also provides top 100 app list regardless of category.

RSS is sorted by each app's download count, but exact download count number is unknown.

Each app has two known properties: category, and its position on the RSS ranking.

Now I want to sort all of the 1,000 apps by its estimated download count.

The sorting do not need to be very accurate, just statistically speaking most possible would be OK.

How could I implement this sorting algorithm? TIA.

you can process this way : (I assume each app belongs to only one category)

Let say you have the following ranking for each category C1 ..C10

C1             C2    ...             C10

app1-1       app2-1

app1-2       &pp2-2

..           ...

app1-100     app2-100              app10-100

and

the overall 100 top apps classment (for example):

C  app1-1  app1-2  app2 -1  ... app2-10

Now using these 2 tables, first you need to order C1 to C10 in the same order as app1-1 to app10-1 appear in list C, so you "know" (it's more like a guess) what Category is the more important in term of ranking.

Then use this information to sort the rest .

Now I'm gona use a simpler example to show how to order the rest of the elements.

let's take 3 categories and 12 apps .

C1      C2      C3

app1    app21   app31   

app2    app22   app32

app3    app23   app33

app4    app24   app34

and C = app1 app2 app21 app31

1.first mark all elemnt in C in the table :

app1    app21 ->app31   
 |   /
app2    app22   app32

app3    app23   app33

app4    app24   app34

2.second, sort the remaining elements

Since you don't have more information, a good approximation would be to look at each line from left to right (from larger ranked top list to smaller ranked top list) which gives :

app3 app22 app32 app4 app23 app33 app24 app34

then the overall classement would be :

app1 app2 app21 app31 app3 app22 app32 app4 app23 app33 app24 app34

I hope this example makes my ideas clear and that it helps.

I think this approach uses all the information you have in C1 ...C10 and C.

A simple way is to use the overall top-100 to determine which category to get the next app from.

In pesudo-code:

While (not finished) 
  i++
  category = Overall_list(i).getCategory()
  Overall_list.add(get next app from list for category)
end while

Any category(s) that have no entry in the overall top-100 will be added last.

Build a directed graph as follows:

Each app is a node.
If app X ranks above app Y on any of the lists, put an edge pointing from X to Y. Note: you really only need to add an edge if X is one rank higher than Y on any list.
It might be possible that some of the #1-ranked items on the category lists don't appear on the total ranking list. In this case, I would add edges pointing from the lowest-ranked item on the total list to each of these items to make the graph connected.

Then, do a topological sort on the constructed graph. The resulting ordering will be guaranteed to be compatible with each individual top 100 list.

This approach will work even if an app appears in more than one category list -- assuming that the category lists are mutually consistent (e.g.: ranked according to total downloads and not, say, category downloads). For example, if you ever have a case where X is ranked above Y on one list but Y is ranked above X on another list, then this won't really work (and I'm not sure what would).

Without more information (e.g.: some kind of probability model), I can't really interpret what "statistically speaking most possible" really means.