In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following:
> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
I would like a function fill.NAs() that allows me to construct yy such that:
> yy
[1] NA 2 2 2 2 3 3 4 4 4
I need to repeat this operation for many (total ~1 TB) small data.frames (~30-50 MB), where a row is NA if all its entries are. What is a good way to approach the problem?
The ugly solution I cooked up uses this function:
last <- function (x){
x[length(x)]
}
fill.NAs <- function(isNA){
if (isNA[1] == 1) {
isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
# can't be forward filled
}
isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
isNA.pos[isNA.diff < 0] <- 0
isNA.neg[isNA.diff > 0] <- 0
which.isNA.neg <- which(as.logical(isNA.neg))
if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
which.isNA.pos <- which(as.logical(isNA.pos))
which.isNA <- which(as.logical(isNA))
if (length(which.isNA.neg)==length(which.isNA.pos)){
replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
} else {
replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
}
replacement
}
The function fill.NAs is used as follows:
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
which.isNA <- which(as.logical(isNA))
to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
y[to.replace] <- y[replacement]
}
Output
> y
[1] NA 2 2 2 2 3 3 3 4 4 4
... which seems to work. But, man, is it ugly! Any suggestions?
You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.
Here is the beginning of its usage example from the help page:
library(zoo)
az <- zoo(1:6)
bz <- zoo(c(2,NA,1,4,5,2))
na.locf(bz)
1 2 3 4 5 6
2 2 1 4 5 2
na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2
cz <- zoo(c(NA,9,3,2,3,2))
na.locf(cz)
2 3 4 5 6
9 3 2 3 2
Sorry for digging up an old question. I couldn't look up the function to do this job on the train, so I wrote one myself.
I was proud to find out that it's a tiny bit faster.
It's less flexible though.
But it plays nicely with ave, which is what I needed.
repeat.before = function(x) { # repeats the last non NA value. Keeps leading NA
ind = which(!is.na(x)) # get positions of nonmissing values
if(is.na(x[1])) # if it begins with a missing, add the
ind = c(1,ind) # first position to the indices
rep(x[ind], times = diff( # repeat the values at these indices
c(ind, length(x) + 1) )) # diffing the indices + length yields how often
} # they need to be repeated
x = c(NA,NA,'a',NA,NA,NA,NA,NA,NA,NA,NA,'b','c','d',NA,NA,NA,NA,NA,'e')
xx = rep(x, 1000000)
system.time({ yzoo = na.locf(xx,na.rm=F)})
## user system elapsed
## 2.754 0.667 3.406
system.time({ yrep = repeat.before(xx)})
## user system elapsed
## 0.597 0.199 0.793
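Since ave is mentioned above, here is a minimal sketch of a grouped forward fill in base R. The vectors y and g are made-up illustration data, not taken from the question; repeat.before is copied from above only so that the snippet runs on its own.

```r
# repeat.before as defined above (copied so this snippet is self-contained)
repeat.before <- function(x) {
  ind <- which(!is.na(x))                 # positions of non-missing values
  if (is.na(x[1])) ind <- c(1, ind)       # keep a leading NA as its own run
  rep(x[ind], times = diff(c(ind, length(x) + 1)))
}

# Hypothetical data: values plus a grouping vector
y <- c(1, NA, NA, NA, 5, NA)
g <- c("a", "a", "b", "b", "b", "b")

ungrouped <- repeat.before(y)                # fill runs across the group boundary
grouped   <- ave(y, g, FUN = repeat.before)  # fill restarts inside each group

ungrouped  # 1 1 1 1 5 5
grouped    # 1 1 NA NA 5 5
```

In the grouped result the first two entries of group "b" stay NA, because there is no earlier value inside that group to carry forward.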
Edit
As this became my most upvoted answer, I was often reminded that I don't use my own function, because I often need zoo's maxgap argument. Because zoo has some weird problems in edge cases when I use dplyr + dates that I couldn't debug, I came back to this today to improve my old function.
I benchmarked my improved function and all the other entries here. For the basic set of features, tidyr::fill is fastest while also not failing the edge cases. The Rcpp entry by @BrandonBertelsen is faster still, but it's inflexible regarding the input's type (he tested edge cases incorrectly due to a misunderstanding of all.equal).
If you need maxgap, my function below is faster than zoo (and doesn't have the weird problems with dates).
I put up the documentation of my tests.
new function
repeat_last = function(x, forward = TRUE, maxgap = Inf, na.rm = FALSE) {
if (!forward) x = rev(x) # reverse x twice if carrying backward
ind = which(!is.na(x)) # get positions of nonmissing values
if (is.na(x[1]) && !na.rm) # if it begins with NA
ind = c(1,ind) # add first pos
rep_times = diff( # diffing the indices + length yields how often
c(ind, length(x) + 1) ) # they need to be repeated
if (maxgap < Inf) {
exceed = rep_times - 1 > maxgap # exceeding maxgap
if (any(exceed)) { # any exceed?
ind = sort(c(ind[exceed] + 1, ind)) # add NA in gaps
rep_times = diff(c(ind, length(x) + 1) ) # diff again
}
}
x = rep(x[ind], times = rep_times) # repeat the values at these indices
if (!forward) x = rev(x) # second reversion
x
}
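As a quick illustration of what maxgap and forward do, here is a small usage sketch. The input vector x is made up for illustration; the function body is the repeat_last from above, copied so the snippet is self-contained.

```r
# repeat_last as defined above (copied so this snippet runs on its own)
repeat_last = function(x, forward = TRUE, maxgap = Inf, na.rm = FALSE) {
  if (!forward) x = rev(x)
  ind = which(!is.na(x))
  if (is.na(x[1]) && !na.rm) ind = c(1, ind)
  rep_times = diff(c(ind, length(x) + 1))
  if (maxgap < Inf) {
    exceed = rep_times - 1 > maxgap
    if (any(exceed)) {
      ind = sort(c(ind[exceed] + 1, ind))   # start a separate NA run in long gaps
      rep_times = diff(c(ind, length(x) + 1))
    }
  }
  x = rep(x[ind], times = rep_times)
  if (!forward) x = rev(x)
  x
}

# Hypothetical input: a gap of three NAs, then a gap of one NA
x <- c(1, NA, NA, NA, 2, NA)

repeat_last(x)                   # 1 1 1 1 2 2
repeat_last(x, maxgap = 2)       # 1 NA NA NA 2 2  (the gap of 3 exceeds maxgap)
repeat_last(x, forward = FALSE)  # 1 2 2 2 2 NA    (carried backward)
```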
I've also put the function in my formr package (Github only).
A data.table solution:
library(data.table)
dt <- data.table(y = c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA))
dt[, y_forward_fill := y[1], .(cumsum(!is.na(y)))]
dt
y y_forward_fill
1: NA NA
2: 2 2
3: 2 2
4: NA 2
5: NA 2
6: 3 3
7: NA 3
8: 4 4
9: NA 4
10: NA 4
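To see why grouping on cumsum(!is.na(y)) works, it may help to look at the group ids themselves. The sketch below uses only base R (ave stands in for data.table's by), so it illustrates the idea rather than the data.table code path; run_id and filled are names chosen here for illustration.

```r
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# The counter increases at every non-NA value, so each non-NA value and the
# NAs that follow it share one id. Leading NAs get id 0.
run_id <- cumsum(!is.na(y))
run_id  # 0 1 2 2 2 3 3 4 4 4

# Within each run, the first element is the value to carry forward
filled <- ave(y, run_id, FUN = function(v) v[1])
filled  # NA 2 2 2 2 3 3 4 4 4
```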
This approach could work with forward filling zeros as well:
dt <- data.table(y = c(0, 2, -2, 0, 0, 3, 0, -4, 0, 0))
dt[, y_forward_fill := y[1], .(cumsum(y != 0))]
dt
y y_forward_fill
1: 0 0
2: 2 2
3: -2 -2
4: 0 -2
5: 0 -2
6: 3 3
7: 0 3
8: -4 -4
9: 0 -4
10: 0 -4
This method becomes very useful on data at scale and where you would want to perform a forward fill by group(s), which is trivial with data.table. Just add the group(s) to the by clause prior to the cumsum logic.
dt <- data.table(group = sample(c('a', 'b'), 20, replace = TRUE), y = sample(c(1:4, rep(NA, 4)), 20 , replace = TRUE))
dt <- dt[order(group)]
dt[, y_forward_fill := y[1], .(group, cumsum(!is.na(y)))]
dt
group y y_forward_fill
1: a NA NA
2: a NA NA
3: a NA NA
4: a 2 2
5: a NA 2
6: a 1 1
7: a NA 1
8: a 3 3
9: a NA 3
10: a NA 3
11: a 4 4
12: a NA 4
13: a 1 1
14: a 4 4
15: a NA 4
16: a 3 3
17: b 4 4
18: b NA 4
19: b NA 4
20: b 2 2
You can use the data.table function nafill, available from data.table >= 1.12.3.
library(data.table)
nafill(y, type = "locf")
# [1] NA 2 2 2 2 3 3 4 4 4
If your vector is a column in a data.table, you can also update it by reference with setnafill:
d <- data.table(x = 1:10, y)
setnafill(d, type = "locf", cols = "y")
d
# x y
# 1: 1 NA
# 2: 2 2
# 3: 3 2
# 4: 4 2
# 5: 5 2
# 6: 6 3
# 7: 7 3
# 8: 8 4
# 9: 9 4
# 10: 10 4
If you have NA in several columns...
d <- data.table(x = c(1, NA, 2), y = c(2, 3, NA), z = c(4, NA, 5))
# x y z
# 1: 1 2 4
# 2: NA 3 NA
# 3: 2 NA 5
...you can fill them by reference in one go:
setnafill(d, type = "locf")
d
# x y z
# 1: 1 2 4
# 2: 1 3 4
# 3: 2 3 5
Note that:
Only double and integer data types are currently [data.table 1.12.6] supported.
The functionality will most likely soon be extended; see the open issue "nafill, setnafill for character, factor and other types", where you also find a temporary workaround.
The tidyr package (part of the tidyverse suite of packages) has a simple way to do that:
y = c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
# first, transform it into a data.frame
df = as.data.frame(y)
y
1 NA
2 2
3 2
4 NA
5 NA
6 3
7 NA
8 4
9 NA
10 NA
library(tidyr)
fill(df, y, .direction = 'down')
y
1 NA
2 2
3 2
4 2
5 2
6 3
7 3
8 4
9 4
10 4
Throwing my hat in:
library(Rcpp)
cppFunction('IntegerVector na_locf(IntegerVector x) {
int n = x.size();
for(int i = 0; i<n; i++) {
if((i > 0) && (x[i] == NA_INTEGER) && (x[i-1] != NA_INTEGER)) {
x[i] = x[i-1];
}
}
return x;
}')
Set up a basic sample and a benchmark:
x <- sample(c(1,2,3,4,NA))
bench_em <- function(x,count = 10) {
x <- sample(x,count,replace = TRUE)
print(microbenchmark(
na_locf(x),
replace_na_with_last(x),
na.lomf(x),
na.locf(x),
repeat.before(x)
), order = "mean", digits = 1)
}
And run some benchmarks:
bench_em(x,1e6)
Unit: microseconds
expr min lq mean median uq max neval
na_locf(x) 697 798 821 814 821 1e+03 100
na.lomf(x) 3511 4137 5002 4214 4330 1e+04 100
replace_na_with_last(x) 4482 5224 6473 5342 5801 2e+04 100
repeat.before(x) 4793 5044 6622 5097 5520 1e+04 100
na.locf(x) 12017 12658 17076 13545 19193 2e+05 100
Just in case:
all.equal(
na_locf(x),
replace_na_with_last(x),
na.lomf(x),
na.locf(x),
repeat.before(x)
)
[1] TRUE
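A note on the check above: all.equal() has the signature all.equal(target, current, ...), so only the first two arguments are compared with each other; further arguments are not treated as additional vectors. A minimal sketch of a pairwise check follows. The helper all_agree is a hypothetical name, and the two fill functions are copied from elsewhere in this thread so the snippet runs on its own.

```r
# Two implementations from this thread (copied for self-containment)
repeat.before <- function(x) {
  ind <- which(!is.na(x))
  if (is.na(x[1])) ind <- c(1, ind)
  rep(x[ind], times = diff(c(ind, length(x) + 1)))
}
replace_na_with_last <- function(x, a = !is.na(x)) {
  x[which(a)[c(1, 1:sum(a))][cumsum(a) + 1]]
}

# Hypothetical helper: compare every result against the first one
all_agree <- function(results) {
  all(vapply(results[-1],
             function(r) isTRUE(all.equal(results[[1]], r)),
             logical(1)))
}

x1 <- c(1, NA, 3, NA, NA)   # no leading NA
x2 <- c(NA, 2, NA)          # leading NA

agree_x1 <- all_agree(list(repeat.before(x1), replace_na_with_last(x1)))
agree_x2 <- all_agree(list(repeat.before(x2), replace_na_with_last(x2)))

agree_x1  # TRUE
agree_x2  # FALSE: replace_na_with_last back-fills the leading NA
```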
Update
For a numeric vector, the function is a bit different:
NumericVector na_locf_numeric(NumericVector x) {
int n = x.size();
LogicalVector ina = is_na(x);
for(int i = 1; i<n; i++) {
if((ina[i] == TRUE) & (ina[i-1] != TRUE)) {
x[i] = x[i-1];
}
}
return x;
}
When dealing with a large volume of data, we can use the data.table package to be more efficient.
require(data.table)
replaceNaWithLatest <- function(
dfIn,
nameColNa = names(dfIn)[1]
){
dtTest <- data.table(dfIn)
setnames(dtTest, nameColNa, "colNa")
dtTest[, segment := cumsum(!is.na(colNa))]
dtTest[, colNa := colNa[1], by = "segment"]
dtTest[, segment := NULL]
setnames(dtTest, "colNa", nameColNa)
return(dtTest)
}
This has worked for me:
replace_na_with_last<-function(x,a=!is.na(x)){
x[which(a)[c(1,1:sum(a))][cumsum(a)+1]]
}
> replace_na_with_last(c(1,NA,NA,NA,3,4,5,NA,5,5,5,NA,NA,NA))
[1] 1 1 1 1 3 4 5 5 5 5 5 5 5 5
> replace_na_with_last(c(NA,"aa",NA,"ccc",NA))
[1] "aa" "aa" "aa" "ccc" "ccc"
speed is reasonable too:
> system.time(replace_na_with_last(sample(c(1,2,3,NA),1e6,replace=TRUE)))
user system elapsed
0.072 0.000 0.071
Having a leading NA is a bit of a wrinkle, but I find a very readable (and vectorized) way of doing LOCF when the leading term is not missing is:
na.omit(y)[cumsum(!is.na(y))]
A slightly less readable modification works in general:
c(NA, na.omit(y))[cumsum(!is.na(y))+1]
gives the desired output:
c(NA, 2, 2, 2, 2, 3, 3, 4, 4, 4)
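Because this is pure indexing, the same idea should also work for non-numeric vectors. Here is a small sketch on a made-up character vector; lookup and idx are names introduced only for illustration.

```r
x <- c(NA, "a", NA, NA, "b", NA)

# Slot 1 of the lookup table is NA (for leading NAs); the non-NA values follow
lookup <- c(NA, na.omit(x))

# Number of non-NA values seen so far, shifted by one to skip the NA slot
idx <- cumsum(!is.na(x)) + 1
idx  # 1 2 2 2 3 3

filled <- lookup[idx]
filled  # NA "a" "a" "a" "b" "b"
```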
Try this function. It does not require the zoo package:
# last observation moved forward
# replaces all NA values with last non-NA values
na.lomf <- function(x) {
na.lomf.0 <- function(x) {
non.na.idx <- which(!is.na(x))
if (is.na(x[1L])) {
non.na.idx <- c(1L, non.na.idx)
}
rep.int(x[non.na.idx], diff(c(non.na.idx, length(x) + 1L)))
}
dim.len <- length(dim(x))
if (dim.len == 0L) {
na.lomf.0(x)
} else {
apply(x, dim.len, na.lomf.0)
}
}
Example:
> # vector
> na.lomf(c(1, NA,2, NA, NA))
[1] 1 1 2 2 2
>
> # matrix
> na.lomf(matrix(c(1, NA, NA, 2, NA, NA), ncol = 2))
[,1] [,2]
[1,] 1 2
[2,] 1 2
[3,] 1 2
There are a bunch of packages offering na.locf (NA Last Observation Carried Forward) functions:
xts - xts::na.locf
zoo - zoo::na.locf
imputeTS - imputeTS::na.locf (renamed imputeTS::na_locf in later versions)
spacetime - spacetime::na.locf
There are also other packages where this function is named differently.
Following up on Brandon Bertelsen's Rcpp contributions. For me, the NumericVector version didn't work: it only replaced the first NA in each run of consecutive NAs. This is because the ina vector is only evaluated once, at the beginning of the function.
Instead, one can take the exact same approach as for the IntegerVector function. The following worked for me:
library(Rcpp)
cppFunction('NumericVector na_locf_numeric(NumericVector x) {
R_xlen_t n = x.size();
for(R_xlen_t i = 0; i<n; i++) {
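// is_na() is true for NA and NaN; unlike !R_finite() it leaves Inf and -Inf alone
if(i > 0 && NumericVector::is_na(x[i]) && !NumericVector::is_na(x[i-1])) {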
x[i] = x[i-1];
}
}
return x;
}')
In case you need a CharacterVector version, the same basic approach also works:
cppFunction('CharacterVector na_locf_character(CharacterVector x) {
R_xlen_t n = x.size();
for(R_xlen_t i = 0; i<n; i++) {
if(i > 0 && x[i] == NA_STRING && x[i-1] != NA_STRING) {
x[i] = x[i-1];
}
}
return x;
}')
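The stale-mask problem described above can be reproduced in plain R, which may make it easier to see. This is only a sketch that mirrors the logic of the two C++ loops; fill_stale_mask and fill_live_check are hypothetical names, and the input x is made up.

```r
# Mirrors the original NumericVector version: NA mask computed once, up front
fill_stale_mask <- function(x) {
  ina <- is.na(x)
  for (i in seq_along(x)[-1]) {
    # ina[i - 1] still says "NA" even after x[i - 1] has been filled
    if (ina[i] && !ina[i - 1]) x[i] <- x[i - 1]
  }
  x
}

# Mirrors the corrected version: look at the current contents of x
fill_live_check <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i]) && !is.na(x[i - 1])) x[i] <- x[i - 1]
  }
  x
}

x <- c(1, NA, NA, NA, 5, NA, NA)
stale <- fill_stale_mask(x)
live  <- fill_live_check(x)

stale  # 1 1 NA NA 5 5 NA  (only the first NA of each run is filled)
live   # 1 1 1 1 5 5 5
```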
Here is a modification of @AdamO's solution. This one runs faster, because it bypasses the na.omit function. This will overwrite the NA values in vector y (except for leading NAs).
z <- !is.na(y) # indicates the positions of y whose values we do not want to overwrite
z <- z | !cumsum(z) # for leading NA's in y, z will be TRUE, otherwise it will be FALSE where y has a NA and TRUE where y does not have a NA
y <- y[z][cumsum(z)]
I want to add another solution, which uses the runner package from CRAN.
library(runner)
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
fill_run(y, FALSE)
[1] NA 2 2 2 2 3 3 4 4 4
The whole package is optimized and most of it is written in C++, so it is very efficient.
fill.NAs <- function(x) {is_na<-is.na(x); x[Reduce(function(i,j) if (is_na[j]) i else j, seq_len(length(x)), accumulate=T)]}
fill.NAs(c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA))
[1] NA 2 2 2 2 3 3 4 4 4
Reduce is a nice functional programming concept that may be useful for similar tasks. Unfortunately in R it is ~70 times slower than repeat.before in the above answer.
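To make the Reduce step less opaque, here is a short trace of what it accumulates: the index of the most recent non-NA element at each position. The variable names are chosen here for illustration.

```r
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
is_na <- is.na(y)

# i is the index carried so far, j is the current position:
# keep i if y[j] is NA, otherwise switch to j
idx <- Reduce(function(i, j) if (is_na[j]) i else j,
              seq_along(y), accumulate = TRUE)
idx  # 1 2 3 3 3 6 6 8 8 8

filled <- y[idx]
filled  # NA 2 2 2 2 3 3 4 4 4
```

The first index is always 1, which is why a leading NA stays NA.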
I personally use this function. I do not know how fast or slow it is. But it does its job without having to use libraries.
replace_na_with_previous<-function (vector) {
if (is.na(vector[1]))
vector[1] <- na.omit(vector)[1]
for (i in 1:length(vector)) {
if ((i - 1) > 0) {
if (is.na(vector[i]))
vector[i] <- vector[i - 1]
}
}
return(vector)
}
If you want to apply this function to a data frame called df, simply use
df[]<-lapply(df,replace_na_with_previous)
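A small sketch of the data frame usage, on a made-up two-column data frame. The function is the one from above, lightly condensed so the snippet is self-contained. Note that, as written, it also fills a leading NA with the first non-NA value, which differs from the output requested in the question.

```r
# replace_na_with_previous from above, lightly condensed
replace_na_with_previous <- function(vector) {
  if (is.na(vector[1])) vector[1] <- na.omit(vector)[1]  # back-fills a leading NA
  for (i in seq_along(vector)[-1]) {
    if (is.na(vector[i])) vector[i] <- vector[i - 1]
  }
  vector
}

# Hypothetical data frame with a numeric and a character column
df <- data.frame(a = c(1, NA, 3, NA),
                 b = c(NA, "x", NA, "y"),
                 stringsAsFactors = FALSE)

# df[] keeps the data.frame structure while replacing every column
df[] <- lapply(df, replace_na_with_previous)

df$a  # 1 1 3 3
df$b  # "x" "x" "x" "y"
```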
I'm posting this here as this might be helpful for others with problems similar to the asked question.
The most recent tidyverse solution, using the vctrs package, can be combined with mutate to create a new column:
library(dplyr)
library(magrittr)
library(vctrs)
as.data.frame(y) %>%
mutate(y_filled = vec_fill_missing(y, direction = c("down")) )
Returns
y y_filled
1 NA NA
2 2 2
3 2 2
4 NA 2
5 NA 2
6 3 3
7 NA 3
8 4 4
9 NA 4
10 NA 4
Changing the filling direction to 'up' results in:
y y_filled
1 NA 2
2 2 2
3 2 2
4 NA 3
5 NA 3
6 3 3
7 NA 4
8 4 4
9 NA NA
10 NA NA
You might also want to try "downup" or "updown".
Please note that this solution is still in the experimental life cycle stage, so the syntax might change.
I tried the below:
nullIdx <- as.array(which(is.na(masterData$RequiredColumn)))
masterData$RequiredColumn[nullIdx] = masterData$RequiredColumn[nullIdx-1]
nullIdx holds the indices at which masterData$RequiredColumn has an NA value. In the next line we replace each of those entries with the value at index idx - 1, i.e. the value immediately preceding it (which is the last good value as long as NAs do not occur consecutively).
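A quick check on a made-up vector suggests that a single shift only handles isolated NAs. One possible way around this, sketched below under that assumption, is to repeat the shift until nothing changes; fill_by_repeated_shift is a hypothetical helper, not part of the code above.

```r
# Single pass: the second NA in a run copies a value that was still NA
x <- c(1, NA, NA, 4)
idx <- which(is.na(x))
x[idx] <- x[idx - 1]
single_pass <- x
single_pass  # 1 1 NA 4

# Hypothetical helper: repeat the one-step shift until no NA can be filled
fill_by_repeated_shift <- function(x) {
  repeat {
    idx <- which(is.na(x))
    idx <- idx[idx > 1]              # position 1 has no predecessor
    idx <- idx[!is.na(x[idx - 1])]   # only NAs whose predecessor is known
    if (length(idx) == 0) break
    x[idx] <- x[idx - 1]
  }
  x
}

multi_pass <- fill_by_repeated_shift(c(NA, 1, NA, NA, 4, NA))
multi_pass  # NA 1 1 1 4 4
```

The loop needs as many passes as the longest run of NAs, so the cumsum-based answers are likely a better fit for long gaps.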
This worked for me, although I'm not sure whether it is more efficient than other suggestions.
rollForward <- function(x){
curr <- 0 # value used for leading NAs (they become 0, not NA)
for (i in seq_along(x)){
if (is.na(x[i])){
x[i] <- curr
}
else{
curr <- x[i]
}
}
return(x)
}
Late to the party, but here is a very concise and extensible answer for use with library(data.table), and therefore usable as dt[,SomeVariable:= FunctionBellow, by = list(group)].
library(imputeTS)
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
y
[1] NA 2 2 NA NA 3 NA 4 NA NA
imputeTS::na_locf(imputeTS::na_locf(y,option = "nocb"),option="locf")
[1] 2 2 2 3 3 3 4 4 4 4
An option in base R, derived from the answers of @Montgomery-Clift and @AdamO, for replacing NAs with the latest non-NA value could be:
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
i <- c(TRUE, !is.na(y[-1]))
y[i][cumsum(i)]
# [1] NA 2 2 2 2 3 3 4 4 4
When only a few NAs exist, they can be overwritten with the latest non-NA value instead of creating a new vector.
fillNaR <- function(y) {
i <- which(is.na(y[-1]))
j <- which(diff(c(-1L,i)) > 1)
k <- diff(c(j, length(i) + 1))
i <- rep(i[j], k)
`[<-`(y, i + sequence(k), y[i])
}
fillNaR(y)
# [1] NA 2 2 2 2 3 3 4 4 4
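The index arithmetic in fillNaR is compact, so here is a step-by-step trace on the example vector. It follows the same steps as the function, with more descriptive (made-up) variable names in place of i, j and k.

```r
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# NA positions in y[-1]; position p there is y[p + 1], so p itself is the
# index in y of the element just before that NA
na_pos <- which(is.na(y[-1]))                        # 3 4 6 8 9

# A jump of more than 1 between consecutive positions starts a new NA run
run_start <- which(diff(c(-1L, na_pos)) > 1)         # 1 3 4
run_len   <- diff(c(run_start, length(na_pos) + 1))  # 2 1 2

# Source index (element before each run), repeated once per NA in the run
src <- rep(na_pos[run_start], run_len)               # 3 3 6 8 8
# Destination indices: the NAs themselves
dst <- src + sequence(run_len)                       # 4 5 7 9 10

y[dst] <- y[src]
y  # NA 2 2 2 2 3 3 4 4 4
```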
When speed is important, a loop propagating the last non-NA value can be written using Rcpp. To be flexible on the input type, this can be done using a template.
Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
using namespace Rcpp;
template <int RTYPE>
Vector<RTYPE> FNA(const Vector<RTYPE> y) {
auto x = clone(y); //or overwrite original
LogicalVector isNA = is_na(x);
size_t i = 0;
while(i < x.size() && isNA[i]) ++i; // check the bound first to avoid reading past the end
for(++i; i < x.size(); ++i) if(isNA[i]) x[i] = x[i-1];
return x;
}
// [[Rcpp::export]]
RObject fillNaC(RObject x) {
RCPP_RETURN_VECTOR(FNA, x);
}
)")
fillNaC(y)
# [1] NA 2 2 2 2 3 3 4 4 4
Those functions can be used inside lapply to apply them to all columns of a data.frame.
DF[] <- lapply(DF, fillNaC)
Other Rcpp answers, specialized for a single data type, look like the following, but they also modify the input vector in place.
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
Rcpp::cppFunction("NumericVector fillNaCN(NumericVector x) {
for(auto i = x.begin()+1; i < x.end(); ++i) if(*i != *i) *i = *(i-1);
return x;
}")
fillNaCN(y)
# [1] NA 2 2 2 2 3 3 4 4 4
y
# [1] NA 2 2 2 2 3 3 4 4 4
Benchmark
fillNaR <- function(y) {
i <- which(is.na(y[-1]))
j <- which(diff(c(-1L,i)) > 1)
k <- diff(c(j, length(i) + 1))
i <- rep(i[j], k)
`[<-`(y, i + sequence(k), y[i])
}
Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
using namespace Rcpp;
template <int RTYPE>
Vector<RTYPE> FNA(const Vector<RTYPE> y) {
auto x = clone(y); //or overwrite original
LogicalVector isNA = is_na(x);
size_t i = 0;
while(i < x.size() && isNA[i]) ++i; // check the bound first to avoid reading past the end
for(++i; i < x.size(); ++i) if(isNA[i]) x[i] = x[i-1];
return x;
}
// [[Rcpp::export]]
RObject fillNaC(RObject x) {
RCPP_RETURN_VECTOR(FNA, x);
}
)")
repeat.before <- function(x) { # @Ruben
ind = which(!is.na(x))
if(is.na(x[1])) ind = c(1,ind)
rep(x[ind], times = diff(c(ind, length(x) + 1) ))
}
RB2 <- function(x) {
ind = which(c(TRUE, !is.na(x[-1])))
rep(x[ind], diff(c(ind, length(x) + 1)))
}
MC <- function(y) { # @Montgomery Clift
z <- !is.na(y)
z <- z | !cumsum(z)
y[z][cumsum(z)]
}
MC2 <- function(y) {
z <- c(TRUE, !is.na(y[-1]))
y[z][cumsum(z)]
}
fill.NAs <- function(x) { # @Valentas
is_na <- is.na(x)
x[Reduce(function(i,j) if (is_na[j]) i else j, seq_len(length(x)), accumulate=T)]}
M <- alist(
fillNaR = fillNaR(y),
fillNaC = fillNaC(y),
repeat.before = repeat.before(y),
RB2 = RB2(y),
MC = MC(y),
MC2 = MC2(y),
fill.NAs = fill.NAs(y),
tidyr = tidyr::fill(data.frame(y), y)$y,
zoo = zoo::na.locf(y, na.rm=FALSE),
data.table = data.table::nafill(y, type = "locf"),
data.table2 = with(data.table::data.table(y)[, y := y[1], .(cumsum(!is.na(y)))], y),
imputeTS = imputeTS::na_locf(y, na_remaining = "keep"),
runner = runner::fill_run(y, FALSE),
vctrs = vctrs::vec_fill_missing(y, direction = "down"),
ave = ave(y, cumsum(!is.na(y)), FUN = \(x) x[1])
)
Result
n <- 1e5
set.seed(42); y <- rnorm(n); is.na(y) <- sample(seq_along(y), n/100)
bench::mark(exprs = M) #1% NA
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
# 1 fillNaR 399.82µs 1.02ms 459. 3.56MB 31.9 230 16
# 2 fillNaC 672.85µs 883.74µs 976. 1.15MB 22.0 488 11
# 3 repeat.before 1.28ms 2.8ms 290. 7.57MB 58.0 145 29
# 4 RB2 1.93ms 3.66ms 229. 9.86MB 57.7 115 29
# 5 MC 1.01ms 1.98ms 289. 5.33MB 37.9 145 19
# 6 MC2 884.6µs 1.96ms 393. 6.09MB 53.5 198 27
# 7 fill.NAs 89.37ms 93.1ms 10.1 4.58MB 13.5 6 8
# 8 tidyr 8.42ms 11.3ms 86.3 1.55MB 5.89 44 3
# 9 zoo 1.83ms 3.19ms 216. 7.96MB 31.9 108 16
#10 data.table 73.91µs 259.71µs 2420. 797.38KB 36.0 1210 18
#11 data.table2 54.54ms 58.71ms 16.9 3.47MB 3.75 9 2
#12 imputeTS 623.69µs 1.07ms 494. 2.69MB 30.0 247 15
#13 runner 1.36ms 1.58ms 586. 783.79KB 10.0 293 5
#14 vctrs 149.98µs 317.14µs 1725. 1.53MB 54.0 863 27
#15 ave 137.87ms 149.25ms 6.53 14.77MB 8.17 4 5
set.seed(42); y <- rnorm(n); is.na(y) <- sample(seq_along(y), n/2)
bench::mark(exprs = M) #50% NA
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
# 1 fillNaR 2.15ms 3.13ms 217. 7.92MB 59.7 109 30
# 2 fillNaC 949.22µs 1.09ms 728. 1.15MB 28.0 364 14
# 3 repeat.before 1.36ms 1.89ms 287. 4.77MB 49.6 185 32
# 4 RB2 1.64ms 2.44ms 347. 7.06MB 39.9 174 20
# 5 MC 1.48ms 1.92ms 443. 4.77MB 34.0 222 17
# 6 MC2 1.09ms 1.72ms 479. 5.53MB 45.9 240 23
# 7 fill.NAs 93.17ms 104.28ms 9.58 4.58MB 9.58 5 5
# 8 tidyr 7.09ms 10.07ms 96.7 1.55MB 3.95 49 2
# 9 zoo 1.62ms 2.28ms 344. 5.53MB 29.8 173 15
#10 data.table 389.69µs 484.81µs 1225. 797.38KB 14.0 613 7
#11 data.table2 27.46ms 29.32ms 33.4 3.1MB 3.93 17 2
#12 imputeTS 1.71ms 2.1ms 413. 3.44MB 25.9 207 13
#13 runner 1.62ms 1.75ms 535. 783.79KB 7.98 268 4
#14 vctrs 144.92µs 293.44µs 2045. 1.53MB 48.0 1023 24
#15 ave 66.38ms 71.61ms 14.0 10.78MB 10.5 8 6
Depending on how many NAs are filled, either data.table::nafill or vctrs::vec_fill_missing is the fastest.