I need to calculate a rolling 3-day correlation. A sample matrix is given below. My problem is that IDs may not be in the universe every day. For example, AAPL may always be in the universe, but a company such as CCL may be in my universe for just 2 days. I would appreciate a vectorized solution. I might have to use structs, accumarray, etc. here, because the size of the correlation matrix may vary from day to day.
% col1 = tradingDates, col2 = companyID_asInts, col3 = VALUE_forCorrelation
rawdata = [ ...
734614 1 0.5;
734614 2 0.4;
734614 3 0.1;
734615 1 0.6;
734615 2 0.4;
734615 3 0.2;
734615 4 0.5;
734615 5 0.12;
734618 1 0.11;
734618 2 0.9;
734618 3 0.2;
734618 4 0.1;
734618 5 0.33;
734618 6 0.55;
734619 2 0.11;
734619 3 0.45;
734619 4 0.1;
734619 5 0.6;
734619 6 0.5;
734620 5 0.1;
734620 6 0.3] ;
% Expected 3-day correlation:
% 734614 and 734615: no correlation, because fewer than 3 trading days are available
% 734618_corr = corrcoef(values of IDs 1,2,3; IDs 4,5,6 are ignored) -> 3x3 data matrix (3x3 correlation matrix)
% 734619_corr = corrcoef(values of IDs 2,3,4,5; IDs 1,6 are ignored) -> 3x4 data matrix (4x4 correlation matrix)
% 734620_corr = corrcoef(values of IDs 5,6; IDs 1,2,3,4 are ignored) -> 3x2 data matrix (2x2 correlation matrix)
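The ID sets listed above are the intersection of the IDs present on each date of the window. As a quick check of that rule, here is a minimal sketch that rebuilds only the date and ID columns of the sample (the values are not needed to pick the IDs). The names d, id, udates and common are hypothetical.

```matlab
% Date and ID columns of the sample rawdata (values are not needed for this check)
d = [repmat(734614, 3, 1); repmat(734615, 5, 1); repmat(734618, 6, 1); ...
     repmat(734619, 5, 1); repmat(734620, 2, 1)];
id = [1:3, 1:5, 1:6, 2:6, 5:6]';
win = 3;                           % window length in trading days
udates = unique(d);
common = cell(numel(udates), 1);   % common{k}: IDs present on every date of the window ending at udates(k)
for k = win:numel(udates)
    c = id(d == udates(k - win + 1));
    for j = k - win + 2:k
        c = intersect(c, id(d == udates(j)));
    end
    common{k} = c(:)';             % store as a row vector
end
disp(common{4})                    % window ending at 734619: should list IDs 2 3 4 5
```

The first two cells stay empty, matching the two dates that are skipped.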
The real data covers the Russell 1000 universe from 1995 to 2011 and has over 4.1 million rows. The desired correlation is over a rolling 20-day window.
I wouldn't try to get a vectorized solution here: thanks to the MATLAB JIT compiler, loops are often just as fast as vectorized code on recent versions of MATLAB.
Your matrix looks a lot like a sparse matrix: would it help to convert it into that form, so that you can use array indexing? That probably only works if the data in the third column can never be 0, because a sparse matrix cannot tell a stored 0 apart from a missing entry. Otherwise, you'll have to keep the current explicit list and use something like this:
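win = 3;                              % window length in trading days (20 for the real data)
dates = unique(rawdata(:, 1));
num_comps = max(rawdata(:, 2));
corrs = cell(length(dates), 1);       % corrs{k}: correlation matrix for the window ending at dates(k)
used = cell(length(dates), 1);        % used{k}: company IDs that went into corrs{k}
for d = 1:length(dates) - win + 1
    days = dates(d:d + win - 1);
    % keep only the companies that are present on every day of the window
    companies = true(1, num_comps);
    for curr_day = days'
        c = false(1, num_comps);
        c(rawdata(rawdata(:, 1) == curr_day, 2)) = true;
        companies = companies & c;
    end
    companies = find(companies);
    % one row per day, one column per company
    data = zeros(win, length(companies));
    for curr_day = 1:win
        for company = 1:length(companies)
            data(curr_day, company) = ...
                rawdata(rawdata(:, 1) == days(curr_day) & ...
                        rawdata(:, 2) == companies(company), 3);
        end
    end
    corrs{d + win - 1} = corrcoef(data);
    used{d + win - 1} = companies;
end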
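The loop above rescans all of rawdata for every single value, which will be slow on 4.1 million rows. If the zeros issue rules out a sparse matrix, the same array-indexing idea still works with a dense date-by-company matrix that marks "not in the universe" with NaN instead of 0. This is a swapped-in variant, not the sparse form suggested above: it builds the matrix once with accumarray, so each window becomes a plain row slice. It assumes each (date, ID) pair occurs at most once (accumarray sums duplicates by default), and the names M, corrs and used are hypothetical.

```matlab
% Sample data from the question: [date, companyID, value]
rawdata = [734614 1 0.5;  734614 2 0.4;  734614 3 0.1; ...
           734615 1 0.6;  734615 2 0.4;  734615 3 0.2;  734615 4 0.5;  734615 5 0.12; ...
           734618 1 0.11; 734618 2 0.9;  734618 3 0.2;  734618 4 0.1;  734618 5 0.33; 734618 6 0.55; ...
           734619 2 0.11; 734619 3 0.45; 734619 4 0.1;  734619 5 0.6;  734619 6 0.5; ...
           734620 5 0.1;  734620 6 0.3];
win = 3;   % window length in trading days (20 for the real data)

% Map dates and IDs to consecutive row and column indices
[dates, ~, di] = unique(rawdata(:, 1));
[ids, ~, ci] = unique(rawdata(:, 2));

% Dense date-by-company matrix; NaN means "not in the universe that day".
% Assumption: each (date, ID) pair appears at most once in rawdata.
M = accumarray([di, ci], rawdata(:, 3), [numel(dates), numel(ids)], [], NaN);

corrs = cell(numel(dates), 1);   % corrs{k}: correlation matrix for the window ending at dates(k)
used = cell(numel(dates), 1);    % used{k}: company IDs that went into corrs{k}
for k = win:numel(dates)
    block = M(k - win + 1:k, :);          % rows of the current window
    keep = all(~isnan(block), 1);         % companies present on every day of the window
    corrs{k} = corrcoef(block(:, keep));
    used{k} = ids(keep)';
end
```

Because NaN rather than 0 marks missing entries, genuine zero values are kept. M has one row per trading date and one column per distinct ID, so check that it fits in memory for the full Russell 1000 history before relying on this.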