Given the following dataset for a single article on my site:
Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60
Article 2
2/1/2010 20000
2/2/2010 250开发者_运维问答00
2/3/2010 23000
where column 1 is the date and column 2 is the number of pageviews for an article. What is a basic velocity calculation that can be done to determine if this article is trending upwards or downwards for the most recent 3 days?
Caveats, the articles will not know the total number of pageviews only their own totals. Ideally with a number between 0 and 1. Any pointers to what this class of algorithms is called?
thanks!
update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.
Velocity is simply the change in a value (delta pageviews) over time:
For article 1 on 2/3/2010:
delta pageviews = 100 + 80 + 60
= 240 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 240 / 3
= 80 pageviews/day
For article 2 on 2/3/2010:
delta pageviews = 20000 + 25000 + 23000
= 68000 pageviews
delta time = 3 days
pageview velocity (over last three days) = [delta pageviews] / [delta time]
= 68,000 / 3
= 22,666 + 2/3 pageviews/day
Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):
relative pageview velocity of article 1 = velocity / MAX_VELOCITY
= 240 / (22,666 + 2/3)
~ 0.0105882353
~ 1.05882353%
relative pageview velocity of article 2 = velocity / MAX_VELOCITY
= (22,666 + 2/3)/(22,666 + 2/3)
= 1
= 100%
"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset actually already is a list of velocities (pageviews/day). Pageviews are non-decreasing values, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.
PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")
Explanation: Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:
PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
= -40 / 2
= -20 pageviews per day per day
Note the data for "2/2/2010" was not used. An alternate method is to calculate three PV_accelerations (using a date range that goes back only a single day) and averaging them. There is not enough data in your example to do this for three days, but here is how to do it for the last two days:
PV_acceleration("2/3/2010", "2/2/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration("2/2/2010", "2/1/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
= -20 / 1
= -20 pageviews per day per day
PV_acceleration_average("2/3/2010", "2/2/2010") = -20 + -20 / 2
= -20 pageviews per day per day
This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days, but it will make a difference for article 2.
Just a link to an article about the 'trending' algorithm reddit, SUs and HN use among others.
- http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed
精彩评论