I want to estimate the quantile of some data. The data is so huge that it won't fit in memory, and new data keeps coming in. Does anyone know an algorithm to monitor the quantile(s) of the data observed so far with very limited memory and computation? I found the P2 algorithm useful, but it does not work very well for my data, which is extremely heavy-tailed.
Look into dividing the value space into bins, each of which counts the values falling in its range.
You can make the bins narrower around the point where you expect the target quantile to lie.
If you use enough bins, this should work quite well.
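A minimal sketch of the binning idea, assuming the bin edges are fixed up front and that values are spread uniformly inside each bin (so the estimate is interpolated linearly within the bin where the cumulative count crosses the target rank). `BinnedQuantile` and its methods are hypothetical names for illustration. Memory is one counter per bin, and each update is a binary search over the edges. The demo assumes positive data and uses log-spaced edges, which give every bin the same relative width; that is one way to keep resolution across a heavy tail.

```python
import bisect
import random


class BinnedQuantile:
    """Approximate streaming quantiles from fixed bin counts (hypothetical helper).

    `edges` must be strictly increasing. counts[0] holds values below edges[0],
    counts[-1] holds values >= edges[-1], and counts[i] (1 <= i < len(edges))
    holds values in [edges[i-1], edges[i]).
    """

    def __init__(self, edges):
        self.edges = list(edges)
        self.counts = [0] * (len(self.edges) + 1)
        self.n = 0

    def add(self, x):
        # bisect_right returns the index of the bin whose range contains x.
        self.counts[bisect.bisect_right(self.edges, x)] += 1
        self.n += 1

    def quantile(self, q):
        if self.n == 0:
            raise ValueError("no data observed yet")
        target = q * self.n
        cum = 0
        for i, c in enumerate(self.counts):
            if c and cum + c >= target:
                if i == 0:
                    return self.edges[0]  # underflow bin: no finer information
                if i == len(self.edges):
                    return self.edges[-1]  # overflow bin: no finer information
                lo, hi = self.edges[i - 1], self.edges[i]
                # Assume values are uniform inside the bin and interpolate.
                return lo + (hi - lo) * (target - cum) / c
            cum += c
        return self.edges[-1]


if __name__ == "__main__":
    # Log-spaced edges from 1e-3 to 1e6, 20 bins per decade (an assumption
    # about the data's range; adjust to your own data).
    edges = [10 ** (k / 20) for k in range(-60, 121)]
    est = BinnedQuantile(edges)
    rng = random.Random(0)
    for _ in range(100_000):
        est.add(rng.paretovariate(1.5))  # stand-in for the real data stream
    print(est.quantile(0.5), est.quantile(0.99))
```

To concentrate resolution near an expected quantile, as suggested above, you would instead pass denser edges around that value and sparser ones elsewhere. Values outside the outer edges are only counted, so the estimate degrades if the target quantile falls there.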