3-day rolling correlation calculation in MATLAB

Developer https://www.devze.com 2023-03-12 04:44 Source: web
I need to calculate a 3-day rolling correlation. A sample matrix is given below. My problem is that IDs may not be in the universe every day. For example, AAPL may always be in the universe, but a company such as CCL may be in my universe for just 2 days. I would appreciate a vectorized solution. I might have to use structs, accumarray, etc. here, as the size of the correlation matrix may vary.

% col1 = tradingDates, col2 = companyID_asInts, col3 = VALUE_forCorrelation

rawdata = [ ...

    734614 1 0.5; 
    734614 2 0.4; 
    734614 3 0.1; 

    734615 1 0.6; 
    734615 2 0.4; 
    734615 3 0.2; 
    734615 4 0.5; 
    734615 5 0.12;

    734618 1 0.11; 
    734618 2 0.9; 
    734618 3 0.2; 
    734618 4 0.1; 
    734618 5 0.33;
    734618 6 0.55; 

    734619 2 0.11; 
    734619 3 0.45; 
    734619 4 0.1; 
    734619 5 0.6; 
    734619 6 0.5;

    734620 5 0.1; 
    734620 6 0.3] ; 

'3-day correlation':

% 734614 & 734615 corr is ignored as this is a 3-day corr

% 734618_corr = corrcoef(IDs 1,2,3 values are used. IDs 4,5,6 are ignored) -> 3x3 input, 3x3 correlation matrix

% 734619_corr = corrcoef(IDs 2,3,4,5 values are used. IDs 1,6 are ignored) -> 3x4 input, 4x4 correlation matrix

% 734620_corr = corrcoef(IDs 5,6 values are used. IDs 1,2,3,4 are ignored) -> 3x2 input, 2x2 correlation matrix

The real data covers the Russell 1000 universe from 1995 to 2011 and has over 4.1 million rows. The desired correlation is over a 20-day period.


I wouldn't try to get a vectorized solution here: thanks to the MATLAB JIT compiler, loops can often be just as fast on recent versions of MATLAB.

Your matrix looks a lot like a sparse matrix: does it help to convert it into that form, so that you can use array indexing? This probably only works if the data in the third column can never be 0, because a sparse matrix does not store zeros and a zero value would look the same as a company that is missing that day. Otherwise you'll have to keep the current explicit list and use something like this:

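dates = unique(rawdata(:, 1));      % sorted list of trading dates
num_comps = max(rawdata(:, 2));     % company IDs are assumed to be 1..num_comps

for d = 1:length(dates) - 2
    days = dates(d:d + 2);          % the three dates in this window

    % Keep only the companies that appear on all three days.
    companies = true(1, num_comps);
    for curr_day = days'
        c = false(1, num_comps);
        c(rawdata(rawdata(:, 1) == curr_day, 2)) = true;
        companies = companies & c;
    end
    companies = find(companies);

    % Collect the values into a 3-by-N matrix, one column per company.
    data = zeros(3, length(companies));
    for curr_day = 1:3
        for company = 1:length(companies)
            data(curr_day, company) = ...
                rawdata(rawdata(:, 1) == days(curr_day) & ...
                        rawdata(:, 2) == companies(company), 3);
        end
    end

    corrcoef(data)                  % no semicolon, so the N-by-N result is displayed
end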
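
If the repeated scans of rawdata in the loop above turn out to be too slow on the full data set, one possible variation on the array-indexing idea is to pivot the rows once into a dense dates-by-companies matrix and mark missing entries with NaN instead of relying on sparse zeros. The following is only a sketch under some assumptions: each (date, ID) pair appears at most once, the dense matrix fits in memory (its size is the number of dates times the number of distinct IDs), and the names win, wide and result are made up for illustration.

```matlab
% Sample data from the question: [tradingDate, companyID, value]
rawdata = [ ...
    734614 1 0.5;  734614 2 0.4;  734614 3 0.1; ...
    734615 1 0.6;  734615 2 0.4;  734615 3 0.2;  734615 4 0.5;  734615 5 0.12; ...
    734618 1 0.11; 734618 2 0.9;  734618 3 0.2;  734618 4 0.1;  734618 5 0.33; 734618 6 0.55; ...
    734619 2 0.11; 734619 3 0.45; 734619 4 0.1;  734619 5 0.6;  734619 6 0.5; ...
    734620 5 0.1;  734620 6 0.3];

win = 3;   % window length in trading days (assumed parameter; 20 for the real data)

% Map dates and IDs to consecutive row and column indices.
[dates, ~, dateIdx] = unique(rawdata(:, 1));
[ids,   ~, idIdx]   = unique(rawdata(:, 2));

% Dense dates-by-companies matrix. NaN means "not in the universe that day",
% so a genuine value of 0 is not confused with a missing company.
wide = nan(numel(dates), numel(ids));
wide(sub2ind(size(wide), dateIdx, idIdx)) = rawdata(:, 3);

% One struct per window, because the correlation matrix size varies.
nWin = numel(dates) - win + 1;
result = struct('date', cell(nWin, 1), 'ids', [], 'corr', []);
for k = 1:nWin
    block = wide(k:k + win - 1, :);          % the win days ending at dates(k + win - 1)
    keep  = ~any(isnan(block), 1);           % companies present on every day of the window
    result(k).date = dates(k + win - 1);
    result(k).ids  = reshape(ids(keep), 1, []);
    result(k).corr = corrcoef(block(:, keep));
end
```

On the sample data this gives three windows, ending on 734618, 734619 and 734620, using IDs 1,2,3, then 2,3,4,5, then 5,6, which matches the expected selection in the question. The loop over windows is kept on purpose, since each window can have a different set of companies.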