R: Calculate the relative distance

Developer (https://www.devze.com), 2023-03-07 00:55. Source: Internet

I have a dataframe like variable x.

x<-"start.x    stop.x strand.x   start.y    stop.y strand.y
1  16954189  16963562        -  16954189  16963562        -
2  16954189  16963562        - 150045170 150065177        -
3 150045170 150065177        -  16954189  16963562        -
4 150045170 150065177        - 150045170 150065177        -
5  97061519  97190927        -  97061519  97190927        -
6  97061519  97190927        - 135190856 135202610        +
7 135190856 135202610        +  97061519  97190927        -
8 135190856 135202610        + 135190856 135202610        +"

dat <- read.table(textConnection(x), header=TRUE)

Normally I calculate for each row the relative distance between start.x and start.y with the following code:

zz <- transform(dat, 
  distance_startsite = abs(as.numeric(start.x) - as.numeric(start.y)))

This time, however, we first need to look at strand.x and strand.y before calculating the distance:

  • If the strand.x is "-" the official start site is stop.x
  • If the strand.x is "+" the official start site is start.x
  • If the strand.y is "-" the official start site is stop.y
  • If the strand.y is "+" the official start site is start.y

For example, row 1 in table dat must calculate abs(as.numeric(stop.x) - as.numeric(stop.y)) instead of abs(as.numeric(start.x) - as.numeric(start.y)).

My question is, is there a way to calculate this for each row like zz?

Thanks

EDIT: my first thought was something like this:

for (i in 1:nrow(dd)){
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "-") {
  result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "-") {
  result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "+") {
  result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(start.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "+") {
  result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(start.y[i,]))} 
 }

But that doesn't work yet.


If you do this step by step and use some interim variables, you will save yourself a lot of trouble and your code will become much clearer.

Here is what I suggest:

  1. Add a column with the start and stop values (using your conditions)
  2. Calculate the absolute difference

Two further observations:

  • Your start and stop values are integer values, so you don't need to use as.numeric all the time
  • In your original question you have conflicting conditions for the start site, but no conditions for the stop site, so I took a guess as to what you really meant.

The code:

dat$start <- with(dat, ifelse(strand.x=="+", start.x, stop.x))
dat$stop  <- with(dat, ifelse(strand.y=="+", start.y, stop.y))
dat$dist  <- with(dat, abs(stop-start))

The results (the interim start and stop columns are not shown):

dat

    start.x    stop.x strand.x   start.y    stop.y strand.y      dist
1  16954189  16963562        -  16954189  16963562        -         0
2  16954189  16963562        - 150045170 150065177        - 133101615
3 150045170 150065177        -  16954189  16963562        - 133101615
4 150045170 150065177        - 150045170 150065177        -         0
5  97061519  97190927        -  97061519  97190927        -         0
6  97061519  97190927        - 135190856 135202610        +  37999929
7 135190856 135202610        +  97061519  97190927        -  37999929
8 135190856 135202610        + 135190856 135202610        +         0
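
As a quick sanity check of the approach above, here is a minimal, self-contained sketch. It uses only four of the eight rows from the question and wraps the strand rule in a small helper. The helper name start_site and the column names site.x and site.y are made up for illustration and do not appear in the original code.

```r
# Four rows taken from the question's data (rows 1, 2, 6 and 7).
dat <- read.table(text = "start.x stop.x strand.x start.y stop.y strand.y
16954189 16963562 - 16954189 16963562 -
16954189 16963562 - 150045170 150065177 -
97061519 97190927 - 135190856 135202610 +
135190856 135202610 + 97061519 97190927 -",
  header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: choose the strand-aware start site.
# "+" strand -> start coordinate, "-" strand -> stop coordinate.
start_site <- function(start, stop, strand) {
  ifelse(strand == "+", start, stop)
}

# ifelse() is vectorised, so each call handles all rows at once.
dat$site.x <- with(dat, start_site(start.x, stop.x, strand.x))
dat$site.y <- with(dat, start_site(start.y, stop.y, strand.y))
dat$dist   <- with(dat, abs(site.x - site.y))

dat$dist
```

The distances should match the corresponding rows of the table above (0, 133101615, 37999929 and 37999929). Using one helper for both the x and y side keeps the strand rule in a single place.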


I tend to agree with @Andrie, but if you really, really want a 'single line solution' (well, kind of):

zz <- transform(dat, distance_startsite = abs(ifelse(strand.x=="+", start.x, stop.x)-ifelse(strand.y=="+", start.y, stop.y)))
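
For reference, the loop attempted in the question fails for several reasons: it iterates over nrow(dd) although the data frame is called dat, it misspells strand.y as stand.y (so dat$stand.y is NULL), it indexes a column vector with [i,] instead of [i], and transform() cannot assign to an indexed name such as distance_startsite[i]. If an explicit loop is really wanted, a minimal sketch could look like the following. It uses only two rows of the question's data, and the interim names site.x and site.y are illustrative only.

```r
# Two rows from the question's data, enough to exercise both strand cases.
dat <- data.frame(
  start.x  = c(16954189L, 135190856L),
  stop.x   = c(16963562L, 135202610L),
  strand.x = c("-", "+"),
  start.y  = c(150045170L, 97061519L),
  stop.y   = c(150065177L, 97190927L),
  strand.y = c("-", "-"),
  stringsAsFactors = FALSE
)

# Preallocate the result column, then fill it row by row.
dat$distance_startsite <- NA_integer_
for (i in seq_len(nrow(dat))) {
  # Columns are plain vectors, so index them with [i], not [i, ].
  site.x <- if (dat$strand.x[i] == "+") dat$start.x[i] else dat$stop.x[i]
  site.y <- if (dat$strand.y[i] == "+") dat$start.y[i] else dat$stop.y[i]
  dat$distance_startsite[i] <- abs(site.x - site.y)
}

dat$distance_startsite
```

The vectorised ifelse() versions above give the same numbers with less code, so the loop is mainly useful for seeing what happens on each row.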