I have a data frame in R:
buys ges dif bin
1 22.34 12 10.34 0
2 55.56 12 43.56 0
3 78.33 12 66.33 0
4 9.99 12 2.01 1
.. .. .. .. ..
dif is just abs(buys-ges) and bin is an ifelse formula that is 1 if dif is <= 10 and 0 otherwise. I'm trying to maximize the sum of the bin column by changing the ges column. The constraint is that ges is the same for all rows. I've tried a couple of packages but can't figure out how to set up the maximization. Thanks for any suggestions.
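Because ges is one shared value, the quantity being maximized is a function of a single number. A minimal sketch of that objective, assuming buys is a plain numeric vector (e.g. buys$buys from the data frame); sum_bin and tol are hypothetical names, not from the question:

```r
# Sum of the bin column for a single shared value of ges:
# bin is 1 when dif = abs(buys - ges) <= tol, and 0 otherwise
sum_bin <- function(ges, buys, tol = 10) {
  dif <- abs(buys - ges)
  sum(dif <= tol)
}

# The four rows shown above, all with ges = 12
sum_bin(12, c(22.34, 55.56, 78.33, 9.99))  # 1
```

Only the last row has dif <= 10 (2.01), which matches the bin column in the table.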
> a <- rnorm(1:100)
> buys <- data.frame(a*100)
> buys <- round(abs(buys), 2)
> summary(buys)
a...100 gs dif bin
Min. : 0.89 Min. :15 Min. : 1.76 Min. :0.00
1st Qu.: 38.29 1st Qu.:15 1st Qu.: 23.29 1st Qu.:0.00
Median : 72.89 Median :15 Median : 57.88 Median :0.00
Mean : 83.91 Mean :15 Mean : 70.52 Mean :0.13
3rd Qu.:123.50 3rd Qu.:15 3rd Qu.:108.50 3rd Qu.:0.00
Max. :269.11 Max. :15 Max. :254.11 Max. :1.00
> gs1 <- 5
> buys$gs <- gs1
> buys$dif <- abs(buys[,1] - buys$gs)
> buys$bin <- ifelse(buys$dif<=10,1,0)
> colnames(buys) <- c("buys","gs","dif","bin")
> head(buys)
buys gs dif bin
1 7.48 5 2.48 1
2 79.08 5 74.08 0
3 139.22 5 134.22 0
4 41.60 5 36.60 0
5 38.35 5 33.35 0
6 157.72 5 152.72 0
> sum(buys$bin)
[1] 10
> num_buys=function(x)
+ {
+ return(length(buys$buys[buys$buys>=x-10 | buys$buys<=x+10]))
+ }
> ans2 <- optimize(f=num_buys,interval=c(min(buys$buys),max(buys$buys)),maximum=TRUE)
> ans2
$maximum
[1] 269.1099
$objective
[1] 100
Since values of bin are either 0 or 1, for a given value of ges we're really just counting the number of elements in buys that are in the interval [ges-10, ges+10]. Visually, one could imagine "sliding" the interval [ges-10, ges+10] from ges=min(buys) to ges=max(buys) and taking the number of entries of buys that fall in the interval as the value of a function. In particular:
num_buys=function(x)
{
return(length(buys[buys>=x-10 & buys<=x+10]))
}
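With that, we can use optimize to find a maximum (optimize assumes the function is unimodal on the interval, so for data with several separate clusters it may return a local rather than the global maximum):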
optimize(f=num_buys,interval=c(min(buys),max(buys)),maximum=TRUE)
As an example:
> buys=rnorm(10000,mean=50,sd=10)
> summary(buys)
Min. 1st Qu. Median Mean 3rd Qu. Max.
11.38 43.22 50.01 50.06 56.93 92.76
> num_buys=function(x){return(length(buys[buys<=x+10 & buys>=x-10]))}
> optimize(f=num_buys,interval=c(min(buys),max(buys)),maximum=TRUE)
$maximum
[1] 50.16788
$objective
[1] 6808
So, in this case, the maximum value of sum(bin) is 6808, attained at ges=50.16788. Of course, this makes perfect sense: with sd=10, about 68% of the values should fall within 10 units of the mean of 50 (pnorm(1) - pnorm(-1) is about 0.683). :D
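Because num_buys is a step function and optimize may settle on a local maximum, an exact alternative is to check every candidate value of ges. This is a sketch under stated assumptions, not the answer's method: best_ges and width are hypothetical names, and buys is assumed to be a numeric vector. Any optimal window [ges-10, ges+10] can be slid to the right until its left edge sits on a data point without losing any points, so it suffices to try ges = buys[i] + 10 for each i.

```r
# Exact global maximum of sum(bin) over ges (hypothetical helper).
# Candidate windows are [s[i], s[i] + 2 * width], i.e. ges = s[i] + width.
best_ges <- function(buys, width = 10) {
  s <- sort(buys)
  # findInterval(v, s) counts the elements of s that are <= v, so this is
  # the number of points in [s[i], s[i] + 2 * width]; it is exact for the
  # first element of any run of tied values and an undercount for the rest,
  # which does not change the maximum
  counts <- findInterval(s + 2 * width, s) - seq_along(s) + 1
  i <- which.max(counts)
  list(ges = s[i] + width, count = counts[i])
}

# Usage with simulated data like the example above
buys <- rnorm(10000, mean = 50, sd = 10)
best_ges(buys)
```

The cost is one sort plus one binary search per point, so it stays fast for large vectors; if several values of ges tie, which.max returns the first (smallest) one.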