I have a set of motorsport laptime data (mld) of the form:
car lap laptime
1 1 1 138.523
2 1 2 122.373
3 1 3 121.395
4 1 4 137.871
and I want to produce something of the form:
lap car.1 car.1.delta
1 1 138 NA
2 2 122 -16
3 3 121 -1
4 4 137 16
I can use the R command diff(mld$laptime, lag=1) to produce the difference column, but how do I elegantly create the padded difference column in R?
Here are a couple of approaches:
1) zoo
If we represented this as a time series using zoo then the calculation would be particularly simple:
# test data with two cars
Lines <- "car lap laptime
1 1 138.523
1 2 122.373
1 3 121.395
1 4 137.871
2 1 138.523
2 2 122.373
2 3 121.395
2 4 137.871"
cat(Lines, "\n", file = "data.txt")
# read it into a zoo series, splitting it
# on car to give wide form (rather than long form)
library(zoo)
z <- read.zoo("data.txt", header = TRUE, split = 1, index = 2, FUN = as.numeric)
# now that it's in the right form, it's simple
zz <- cbind(z, diff(z))
The last statement gives:
> zz
1.z 2.z 1.diff(z) 2.diff(z)
1 138.523 138.523 NA NA
2 122.373 122.373 -16.150 -16.150
3 121.395 121.395 -0.978 -0.978
4 137.871 137.871 16.476 16.476
To plot zz, one column per panel, try this:
plot(zz, type = "o")
To plot only the differences we do not really need zz in the first place, as this will do:
plot(diff(z), type = "o")
(Add the screens = 1 argument to the plot command to plot everything on the same panel.)
2) ave. Here is a second solution that uses just plain R (except for the plotting) and keeps the output in long form; however, it is a bit more complex:
# assume same input as above
DF <- read.table("data.txt", header = TRUE)
DF$diff <- ave(DF$laptime, DF$car, FUN = function(x) c(NA, diff(x)))
The result is:
> DF
car lap laptime diff
1 1 1 138.523 NA
2 1 2 122.373 -16.150
3 1 3 121.395 -0.978
4 1 4 137.871 16.476
5 2 1 138.523 NA
6 2 2 122.373 -16.150
7 2 3 121.395 -0.978
8 2 4 137.871 16.476
To plot just the differences, one per panel, try this:
library(lattice)
xyplot(diff ~ lap | car, DF, type = "o")
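One caveat with the ave() approach, offered as a hedged sketch: ave() writes its results back in the original row order, but diff() inside each group just subtracts neighbouring rows, so if a car's rows are not already in lap order the deltas come out wrong. Sorting by car and lap first avoids this. The shuffled data frame DF2 is hypothetical, made up from the test data above for illustration.

```r
# Hypothetical input: the same test data as above, but with rows shuffled
DF2 <- data.frame(
  car     = c(1, 2, 1, 2, 1, 2, 1, 2),
  lap     = c(3, 1, 1, 4, 2, 2, 4, 3),
  laptime = c(121.395, 138.523, 138.523, 137.871,
              122.373, 122.373, 137.871, 121.395)
)

# Put the laps in order within each car before differencing;
# diff() only compares neighbouring rows, it never looks at 'lap'
DF2 <- DF2[order(DF2$car, DF2$lap), ]

# Same per-car padded difference as in approach 2
DF2$diff <- ave(DF2$laptime, DF2$car, FUN = function(x) c(NA, diff(x)))
DF2
```

After sorting, each car's first lap gets NA and the remaining rows match the DF output shown above.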
Update
Added info above on plotting since the title of the question mentions this.
I think this is enough:
mld$car.1.delta = c(NA, diff(mld$laptime, lag = 1))
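The same padding trick generalises to differences over more than one lap: diff(x, lag = k) drops the first k positions, so k leading NAs line the result back up with the original rows. pad_diff is a hypothetical helper written for illustration, not a function from base R or any package; this is a sketch, not the only way to do it.

```r
# Hypothetical helper: diff() padded with 'lag' leading NAs so the
# result has the same length as x
pad_diff <- function(x, lag = 1) {
  # if the lag is at least as long as x, there is nothing to difference
  if (lag >= length(x)) return(rep(NA_real_, length(x)))
  c(rep(NA_real_, lag), diff(x, lag = lag))
}

laptime <- c(138.523, 122.373, 121.395, 137.871)
pad_diff(laptime)           # NA -16.150 -0.978 16.476
pad_diff(laptime, lag = 2)  # NA NA -17.128 15.498
```

With the question's data this would be used as mld$car.1.delta <- pad_diff(mld$laptime).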
In your example you have truncated laptimes but rounded car.1.delta, so it really depends on how you want that to work, but the code below gives what you posted.
Wrap everything in with() to simplify, and create a new data.frame based on modifications of the existing columns. Prepend an NA to the diff to pad it out.
with(mld,
data.frame(
lap = lap,
car.1 = trunc(laptime),
car.1.delta = c(NA, round(diff(laptime)))
)
)
lap car.1 car.1.delta
1 1 138 NA
2 2 122 -16
3 3 121 -1
4 4 137 16
I wonder if you want to do this by car; if so, it will need a bit more handling, but since you've literally asked for column car.1, I think this works as far as that goes.
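If you do want this per car, one possible sketch combines the with()/data.frame formatting above with the ave() grouping from the other answer. It assumes long-form data with car, lap and laptime columns; the two-car mld data frame, the result name res and the output column names are hypothetical, made up for illustration.

```r
# Hypothetical long-form data: two cars with identical laptimes,
# mirroring the two-car test data in the zoo answer
mld <- data.frame(
  car     = rep(1:2, each = 4),
  lap     = rep(1:4, times = 2),
  laptime = rep(c(138.523, 122.373, 121.395, 137.871), times = 2)
)

# Make sure laps are consecutive within each car before differencing
mld <- mld[order(mld$car, mld$lap), ]

res <- with(mld,
  data.frame(
    car     = car,
    lap     = lap,
    # truncated lap time, as in the single-car version
    laptime = trunc(laptime),
    # rounded lap-to-lap change computed separately for each car;
    # each car's first lap has no predecessor, hence NA
    delta   = ave(laptime, car, FUN = function(x) c(NA, round(diff(x))))
  )
)
res
```

Each car restarts with NA on its first lap instead of being differenced against the previous car's last lap.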