Arbitrary distribution -> Uniform distribution (Probability Integral Transform?)

Developer https://www.devze.com 2023-03-04 16:11 Source: Internet

I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has an arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that an equal (or nearly equal) number of data points falls within that range.

This will allow me to then analyze all of the data points within a specific range and to treat them as "similar situations to the input."

From what I understand, this means that I need to convert it from an arbitrary distribution to a uniform distribution. I have read (but barely understood) that what I am looking for is called the "probability integral transform."

Can anyone assist me with some code (Matlab preferred, but it doesn't really matter) to help me accomplish this?


Here's something I put together quickly. It's not polished and not perfect, but it does what you want to do.

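clear
% Example data: mixture of N(0,1) and N(5,2^2), 2e4 points in total
randList=[randn(1e4,1);2*randn(1e4,1)+5];
% Kernel-smoothed estimate of the CDF, evaluated at 5e3 points
[xCdf,xList]=ksdensity(randList,'npoints',5e3,'function','cdf');
% Interval around the value 5 that contains 10% of the probability mass
xRange=getInterval(5,xList,xCdf,0.1);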

and the function getInterval is

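function out=getInterval(yPoint,xList,xCdf,areaFraction)
    % CDF value at the chosen point
    yCdf=interp1(xList,xCdf,yPoint);
    % CDF window of width areaFraction centred on that value
    yCdfRange=[-areaFraction/2, areaFraction/2]+yCdf;

    % Invert the CDF to map the window back to data values
    out=interp1(xCdf,xList,yCdfRange);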

Explanation:

The CDF of the random distribution is shown below by the line in blue. You provide a point (here 5 in the input to getInterval) about which you want a range that gives you 10% of the area (input 0.1 to getInterval). The chosen point is marked by the red cross and the interval is marked by the lines in green. You can get the corresponding points from the original list that lie within this interval as

newList=randList(randList>=xRange(1) & randList<=xRange(2));

You'll find that, on average, the number of points in this example is ~2000, which is 10% of numel(randList):

numel(newList)

ans =

        2045

[Figure: estimated CDF (blue line), the chosen point (red cross), and the interval bounds (green lines).]

NOTE:

  • This was done quickly, and I haven't added checks for whether the chosen point is outside the data range or whether yCdfRange falls outside [0 1]; in either case interp1 will return NaN. This is fairly straightforward to implement, and I'll leave that to you.
  • Also, ksdensity is very CPU intensive. I wouldn't recommend increasing npoints to more than 1e4. I assume you're only working with a fixed list (i.e., you have a list of 5e5 points that you've obtained somehow and now you're just running tests/analyzing it). In that case, you can run ksdensity once and save the result.
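
For what it's worth, here is a rough sketch of how the two notes above could be handled. It is in Python rather than Matlab and is not the code from this answer: it swaps the kernel estimate for the empirical CDF of the sorted data (so the expensive step is a single sort), and it shifts the CDF window back inside [0, 1] when the chosen point sits near either tail. The names get_interval, sorted_data and area_fraction are made up for this illustration, and the test data simply mimics the mixture used above.

```python
import bisect
import random


def get_interval(sorted_data, point, area_fraction):
    """Return (lo, hi) around `point` holding about `area_fraction` of the data.

    `sorted_data` must be sorted ascending. Uses the empirical CDF, not ksdensity.
    """
    n = len(sorted_data)
    # Empirical CDF at `point`: fraction of samples <= point
    # (this is 0 or 1 when the point lies outside the data range).
    cdf_at_point = bisect.bisect_right(sorted_data, point) / n
    lo_p = cdf_at_point - area_fraction / 2
    hi_p = cdf_at_point + area_fraction / 2
    # Shift the window so that it stays inside [0, 1].
    if lo_p < 0:
        lo_p, hi_p = 0.0, min(1.0, area_fraction)
    elif hi_p > 1:
        lo_p, hi_p = max(0.0, 1.0 - area_fraction), 1.0
    # Convert CDF levels back to positions in the sorted sample.
    lo_idx = min(n - 1, round(lo_p * n))
    hi_idx = min(n - 1, max(lo_idx, round(hi_p * n) - 1))
    return sorted_data[lo_idx], sorted_data[hi_idx]


# Illustrative data only: same kind of mixture as randList above.
random.seed(1)
data = sorted(
    [random.gauss(0, 1) for _ in range(10_000)]
    + [random.gauss(5, 2) for _ in range(10_000)]
)

lo, hi = get_interval(data, 5, 0.1)
count = sum(1 for x in data if lo <= x <= hi)
print(lo, hi, count)  # count should be close to 10% of len(data)
```

Sorting once and reusing the sorted list plays the same role as running ksdensity once and saving the result. The trade-off is that the empirical CDF is a step function, so the interval endpoints are always actual data values rather than smoothed ones.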


I do not speak Matlab, but you need to find quantiles in your data. This is Mathematica code which would do this:

In[88]:= data = RandomVariate[SkewNormalDistribution[0, 1, 2], 10^4];

Compute quantile points:

In[91]:= q10 = Quantile[data, Range[0, 10]/10];

Now form pairs of consecutive quantiles:

In[92]:= intervals = Partition[q10, 2, 1];

In[93]:= intervals

Out[93]= {{-1.397, -0.136989}, {-0.136989, 0.123689}, {0.123689, 
  0.312232}, {0.312232, 0.478551}, {0.478551, 0.652482}, {0.652482, 
  0.829642}, {0.829642, 1.02801}, {1.02801, 1.27609}, {1.27609, 
  1.6237}, {1.6237, 4.04219}}

Verify that the splitting points separate data nearly evenly:

In[94]:= Table[Count[data, x_ /; i[[1]] <= x < i[[2]]], {i, intervals}]

Out[94]= {999, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000}
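
If Mathematica is not available, the same quantile idea can be sketched in Python with only the standard library. This is an illustration under a few assumptions, not a translation of the code above: the sample is a plain normal stand-in for the real data, statistics.quantiles uses a slightly different quantile definition than Mathematica's Quantile, and the last bin is closed on the right so the maximum is counted.

```python
import bisect
import random
import statistics

random.seed(0)
# Hypothetical stand-in for the real data set, sorted once up front.
data = sorted(random.gauss(0.0, 1.0) for _ in range(10_000))

n_bins = 10
# The 9 interior cut points that split the sample into 10 equal-count groups.
cuts = statistics.quantiles(data, n=n_bins, method="inclusive")
edges = [data[0]] + cuts + [data[-1]]

# Count the points in each half-open bin [left, right).
counts = []
for left, right in zip(edges, edges[1:]):
    lo = bisect.bisect_left(data, left)
    hi = bisect.bisect_left(data, right)
    counts.append(hi - lo)
# Close the last bin on the right so the maximum is included.
counts[-1] += len(data) - bisect.bisect_left(data, data[-1])

print(counts)

# Bin index (0-based) that a new input value would fall into.
value = 0.3
bin_index = bisect.bisect_right(cuts, value)
print(bin_index, edges[bin_index], edges[bin_index + 1])
```

Note that these bins are fixed in advance, so an input near a bin edge gets an interval that is lopsided around it; the CDF-window approach in the other answer centres the interval on the input instead.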