Given df below, I want to get the time between requests, and then print a text histogram of the probability that consecutive requests arrive 1 second apart, 2 seconds apart, 3 seconds apart, and so on up to 10 seconds. I want to use all of the data when calculating the probabilities, but I only want to see the first 10 seconds of data.
I've tried to get help with this on the ML, but could not. I've received great help on here, so I hope I'm not abusing the help. This should be my last question. Thanks a lot.
df <- read.csv(textConnection('
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:28:47"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:15:41"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:52"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:52"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
'), stringsAsFactors=FALSE)
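And here's how I'm getting the diff, with the excellent help of Andrie (ddply comes from the plyr package, and REQUEST_DATE has to be converted to a date-time first):
df$REQUEST_DATE <- as.POSIXct(df$REQUEST_DATE, format="%m/%d/%Y %H:%M:%S")
df_diff <- ddply(df, .(SOURCE), summarize, TIME_DIFF=-unclass(diff(REQUEST_DATE)))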
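The diff step only works when REQUEST_DATE holds date-times rather than strings. Here is a minimal base-R sketch of the same step, assuming the timestamps are month/day/year and that UTC is an acceptable time zone (neither is stated in the question). The name gaps_by_source is made up for illustration. Asking for seconds explicitly matters because diff() on date-times picks its own unit, which may be minutes or hours when every gap is large.

```r
# Sample data from the question, parsed without factors.
df <- read.csv(text = '"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:28:47"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:15:41"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:52"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:52"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"', stringsAsFactors = FALSE)

# Assumption: month/day/year timestamps, interpreted as UTC.
df$REQUEST_DATE <- as.POSIXct(df$REQUEST_DATE,
                              format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Rows are newest-first within each source, so the differences are negated.
# units = "secs" forces seconds regardless of the unit diff() chose.
gaps_by_source <- lapply(split(df$REQUEST_DATE, df$SOURCE),
                         function(x) -as.numeric(diff(x), units = "secs"))

# Same shape as the ddply result: one row per gap.
df_diff <- data.frame(SOURCE = rep(names(gaps_by_source), lengths(gaps_by_source)),
                      TIME_DIFF = unlist(gaps_by_source, use.names = FALSE),
                      stringsAsFactors = FALSE)
print(df_diff)
```

For this sample, source A has gaps of 1, 785 and 1 seconds, and source D has gaps of 1, 597, 3, 69 and 996 seconds.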
So, I want something like the following (with made-up results):
A 1 55%
A 2 15%
A 3 10%
...
A 10 5%
D 1 10%
D 2 12%
D 3 15%
...
D 10 1%
A row such as D 5013 2%, for example, would get cut off, because I only want the top 10 for each source.
The "histogram as text" part is confusing me, but I am guessing you actually want to tabulate within one second breaks:
df_diff$tdiff_grp <- cut(df_diff$TIME_DIFF, 0:10, right=FALSE)
with(df_diff, tapply(tdiff_grp, SOURCE, table))
$A
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10)
     0      2      0      0      0      0      0      0      0      0
$D
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10)
     0      1      0      1      0      0      0      0      0      0
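One detail of the cut() call is worth knowing: any gap outside the 0 to 10 range becomes NA, and table() leaves NA values out by default. A small sketch using the three gaps for source A (1, 785 and 1 seconds, worked out by hand from the sample data) shows the effect. The variable names are illustrative only.

```r
# Gaps in seconds for source A, derived by hand from the sample data.
gaps <- c(1, 785, 1)

# Left-closed one-second bins: [0,1), [1,2), ..., [9,10).
grp <- cut(gaps, breaks = 0:10, right = FALSE)

# Default table(): the 785-second gap is NA and is silently dropped.
print(table(grp))

# useNA = "ifany" adds an <NA> column so the out-of-range gap stays visible.
print(table(grp, useNA = "ifany"))
```

This is why the counts for A add up to 2 even though A has three gaps.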
After you clarify what is actually desired, it would be a simple matter to use either prop.table or divide these by their sums (and then multiply by 100) to produce percentages.
EDIT: A simple function can return percentages:
> tbls <- with(df_diff, tapply(tdiff_grp, SOURCE,table))
> lapply(tbls, function(x) 100*x/sum(x) )
$A
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10)
     0    100      0      0      0      0      0      0      0      0
$D
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10)
     0     50      0     50      0      0      0      0      0      0
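Note that these percentages are relative to the gaps that fall inside the 0 to 10 second window only. If the probabilities should be based on all of the data, as the question asks, one possible variation is to divide by the total number of gaps per source instead. The sketch below is an assumption about what is wanted, not a definitive answer. It hard-codes the gaps derived by hand from the sample data, labels each bin by its lower bound, and prints one "SOURCE seconds percent" line per non-empty bin.

```r
# Gaps in seconds per source, derived by hand from the sample data.
gaps <- list(A = c(1, 785, 1), D = c(1, 597, 3, 69, 996))

# Count gaps per one-second bin, then divide by ALL gaps for that source,
# including those beyond 10 seconds that cut() turns into NA.
pct_all <- lapply(gaps, function(x) {
  counts <- table(cut(x, breaks = 0:10, right = FALSE))
  100 * counts / length(x)
})

# Long format: one row per source and bin, labelled by the bin's lower bound.
out <- do.call(rbind, lapply(names(pct_all), function(s) {
  data.frame(SOURCE = s, SECONDS = 0:9, PCT = as.vector(pct_all[[s]]),
             stringsAsFactors = FALSE)
}))

# Keep only the non-empty bins and print them as plain text.
out <- out[out$PCT > 0, ]
cat(sprintf("%s %d %.1f%%\n", out$SOURCE, out$SECONDS, out$PCT), sep = "")
# A 1 66.7%
# D 1 20.0%
# D 3 20.0%
```

With breaks = 0:10 a gap of exactly 10 seconds falls outside the last bin. Using breaks = 0:11 (and SECONDS = 0:10) would add a [10,11) bin if that case should be shown.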