Binning different lengths in R

Developer https://www.devze.com 2023-03-25 00:56 Source: web
input1

dput(a1  100 200 +
a1  250 270 +
a1  333 340 -
a2  450 460 +)

input2

dput(a1  101 106 +
a1  112 117 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  260 262 +
a1  260 262 + 
a1  260 262 + 
a1  260 262 + 
a1  260 262 + 
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -)

output

c   s   e   st  1   2   3   4   5   6   7   8   9   10
a1  100 200 +   1   2   0   0   0   0   0   0   0   0
a1  250 270 +   0   0   0   9   5   0   0   0   0   0
a1  330 340 -   0   0   0   0   0   0   0   6   7   0
a2  450 460 +   0   0   0   0   0   0   0   0   0   0

I want to count the density of points (input2) within the ranges given in input1. That is, how many points fall in the a1 100-200 range? (i.e. 3.) I want to do the same for every row of input1 and then compare the rows with each other. The problem is that the ranges have different lengths (200-100=100, 270-250=20). To compare them I need to scale them to a common footing, so I came up with a window of 10 bins (see output): I count the input2 points in each of the 10 bins of every input1 range. Finally, I need to plot the bins on the x-axis and the counts on the y-axis: xyplot(x(bins),y1(a1:100:200:+)+y2(a1:250:270:+y3...+y4)

"+" means we take 100 as the start point and 200 as the end point when calculating the bins (100-110 is the first bin, and so on); "-" means exactly the opposite (190-200 is the first bin).

1-10 means bins 1 to 10.

You need to use columns 1 and 2, matched on the column 1 key, for the bins. Values that are not in range are removed.

c = character, s = start, e = end, st = strand; 1-10 are the bins of input1. Yes, you are right about the binning. For example, 250-270 should have a bin width of 2 (270-250=20, so for 10 bins it is 20/10=2).


The question is still not very well formed so I'm not sure I've quite understood what you want, but you probably want to use a combination of table and cut.
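As a rough illustration of how the two functions combine (the values below are made up for the example, not taken from your data): cut assigns each value to an interval, and table counts how many values landed in each interval.

```r
# Hypothetical toy values, chosen only to illustrate cut() + table().
x <- c(101, 112, 150, 199)

# 11 break points from 100 to 200 give 10 equal-width bins.
breaks <- seq(100, 200, length.out = 11)

# include.lowest = TRUE closes the first interval on the left, so a value
# equal to 100 would still be counted in the first bin.
counts <- table(cut(x, breaks, include.lowest = TRUE))
print(counts)
```

Empty bins are kept with a count of zero, because cut returns a factor that has one level per interval.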

Your sample data

# Columns are named start/end so that they match the code below.
input1 <- data.frame(
  type  = paste("a", rep(1:2, times = c(3, 1)), sep = ""),
  start = c(100, 250, 333, 450),
  end   = c(200, 270, 340, 460)
)

# 29 rows, matching the input2 listing in the question.
input2 <- data.frame(
  type  = rep.int("a1", 29),
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6)),
  end   = rep(c(106, 117, 259, 262, 333), times = c(1, 1, 9, 5, 13))
)

First you define bins based upon the values in input1.

cut_points <- with(input1, sort(c(start, end)))

Then split input2$start by type, cut it up by bins and find the count in each.

with(input2, tapply(start, type, function(x) table(cut(x, cut_points))))

Possibly repeat with the end column.

with(input2, tapply(end, type, function(x) table(cut(x, cut_points))))
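If what you are after is the 10-bin, strand-aware count for each input1 row, something along these lines might work. This is only a sketch under my reading of the question: the helper name bin_counts, the column names chr/start/end/strand, and the choice to bin the start positions of input2 are all my assumptions. The results need not agree with the output table in the question, which lists 330 rather than 333 for the third region.

```r
# Regions from input1, including the strand column (column names are assumed).
regions <- data.frame(
  chr    = c("a1", "a1", "a1", "a2"),
  start  = c(100, 250, 333, 450),
  end    = c(200, 270, 340, 460),
  strand = c("+", "+", "-", "+"),
  stringsAsFactors = FALSE
)

# Start positions from input2 (all on a1).
hits <- data.frame(
  chr   = "a1",
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the hits of one region in n_bins equal-width bins.
bin_counts <- function(chr, start, end, strand, pts, n_bins = 10) {
  # Keep only hits with the same key that lie inside the region.
  x <- pts$start[pts$chr == chr & pts$start >= start & pts$start <= end]
  breaks <- seq(start, end, length.out = n_bins + 1)
  counts <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))
  # On the "-" strand the bin nearest the end coordinate becomes bin 1.
  if (strand == "-") rev(counts) else counts
}

# One row per region, one column per bin.
result <- t(mapply(bin_counts, regions$chr, regions$start, regions$end,
                   regions$strand, MoreArgs = list(pts = hits)))
print(result)
```

Each row of result corresponds to a row of regions and each column to a bin, so something like matplot(t(result), type = "l") should draw one line per region with the bins on the x-axis.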