How do I create a random contingency table in R?

开发者 (https://www.devze.com), 2023-03-26 12:33. Source: the web.
I would like to create random two-way contingency tables, given fixed row and column marginals. Supposing I have a table like this:

      A   C   G   T
  A  79   6  13  53
  C  16   7   6  17
  G   9   3   1   6
  T  58  28  18 114

with given row marginals:

  A   C   G   T 
151  46  19 218 

and column marginals:

  A   C   G   T 
162  44  38 190 

I'd like to create a random contingency table, for example:

   A  C  G  T
A 49 16 10 76
C 23  2  6 15
G 11  0  1  7
T 79 26 21 92

which preserves those marginals.

Since n is not too large in this case, I tried to approach this by "untabling" the marginal vectors, i.e. by converting the marginals into vectors of the form

A A A ...C C C ... G G G ... T T T 

and then permuting and tabling them.

My current method for "untabling" the marginals is highly unnatural and inefficient, and I was curious to know if there's a better way. Certain built-in functions must create random contingency tables, for instance chisq.test when simulate.p.value=TRUE. Is random contingency table construction also built in?

Thanks in advance for any suggestions.


I'm not entirely sure what you mean by 'untabling', and since you didn't actually specify the method you're currently using, I can't be sure that this isn't what you're currently doing.

But given marginals of (162, 44, 38, 190) you can 'recreate' the vector just by doing this:

rep(c('A','C','G','T'),times = c(162, 44, 38, 190))

which you can then permute as needed.
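As a minimal sketch of how the permuting and tabling could go, assuming the marginals from the question (the variable names here are illustrative, not from the original post): build one label vector from the row marginals and one from the column marginals, shuffle one of them, and cross-tabulate the pair.

```r
# Marginals taken from the question; both must sum to the same n (434 here).
row_totals <- c(A = 151, C = 46, G = 19, T = 218)
col_totals <- c(A = 162, C = 44, G = 38, T = 190)
stopifnot(sum(row_totals) == sum(col_totals))

bases <- names(row_totals)

# "Untable" each set of marginals into a label vector of length n.
# Factors keep all four levels in the table even if a cell count is zero.
row_labels <- factor(rep(bases, times = row_totals), levels = bases)
col_labels <- factor(rep(bases, times = col_totals), levels = bases)

# Shuffle the column labels and pair them with the (unshuffled) row labels.
set.seed(42)  # only for reproducibility of this sketch
shuffled_cols <- sample(col_labels)
random_tab <- table(row = row_labels, col = shuffled_cols)

random_tab
rowSums(random_tab)  # matches row_totals
colSums(random_tab)  # matches col_totals
```

Shuffling only changes which row label each column label is paired with, so the counts of each label, and therefore both sets of marginals, are left unchanged.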


I'm sorry, but @joran's answer is not correct. His code reproduces the column totals, but the OP asked for a simulation that respects both the row and the column totals. A solution to this problem was published by W. M. Patefield (1981), Algorithm AS 159: An efficient method of generating r x c tables with given row and column totals. Applied Statistics, 30, 91-97.

Patefield's algorithm is implemented in the base R function r2dtable() (in the stats package).
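A short usage sketch, assuming the marginals from the question. r2dtable(n, r, c) returns a list of n matrices; the dimnames are added here only for readability, and the variable names are illustrative.

```r
# Marginals taken from the question.
row_totals <- c(A = 151, C = 46, G = 19, T = 218)
col_totals <- c(A = 162, C = 44, G = 38, T = 190)

set.seed(1)  # only for reproducibility of this sketch

# Draw 3 random tables with the given row and column totals.
tabs <- r2dtable(3, row_totals, col_totals)

# Label the first table so it reads like the one in the question.
tab <- tabs[[1]]
dimnames(tab) <- list(names(row_totals), names(col_totals))

tab
rowSums(tab)  # matches row_totals
colSums(tab)  # matches col_totals
```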
