开发者

Replace NA values in a column with values from another df column if conditions are met

开发者 https://www.devze.com 2022-12-07 20:38 出处:网络
I need to copy values from a column in one dataframe to a column in another if conditions are met. Using the example below, if ID and BirthDate are the same in df1 and df2, AND Week2 is NA then copy t

I need to copy values from a column in one dataframe to a column in another if conditions are met.

Using the example below, if ID and BirthDate are the same in df1 and df2, AND Week2 is NA then copy the values from Week2a in df2.

ID <- c(1,2,3,4,5)
BirthDate <- c("2022-01-01", "2022-01-02", "2022-03-04", "2022-04-05", "2022-06-03")
week2 <- c("Y","Y","NA","NA","Y")

df1 <- data.frame(ID, BirthDate, week2)
df1
  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04    NA
4  4 2022-04-05    NA
5  5 2022-06-03     Y

ID <- c(1,2,3,4,5)
BirthDate <- c("2022-01-01", "2022-01-02", "2022-03-04", "2022-04-05", "2022-06-03")
week2a <- c("NA","NA","P","P","NA")
df2 <- data.frame(ID, BirthDate, week2a)
df2
  ID  BirthDate week2a
1  1 2022-01-01     NA
2  2 2022-01-02     NA
3  3 2022-03-04      P
4  4 2022-04-05      P
5  5 2022-06-03     NA

The desired output I would get is

开发者_Go百科
  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y

Any help is greatly appreciated!!


Using dplyr::rows_patch():

library(dplyr)

df1 %>%
  rows_patch(
    rename(df2, week2 = week2a),
    by = c("ID", "BirthDate")
  )
  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y

NB, I assumed your dataset has real NAs rather than character "NA"s like your example; if not you’d first have to convert them (e.g., using tidyr::na_if(week2, "NA")).


With your data you can do something like this:

df2[df2$week2a=="NA",]$week2a<-df1$week2[df1$week2!="NA"]

# > df2
#
#   ID  BirthDate week2a
#   1  1 2022-01-01      Y
#   2  2 2022-01-02      Y
#   3  3 2022-03-04      P
#   4  4 2022-04-05      P
#   5  5 2022-06-03      Y

If your data points are NA values then you can use is.na instead of the logical expressions


df2[is.na(df2$week2a),]$week2a<-df1$week2[!is.na(df1$week2)]


First join by ID, then replace character 'NA' with real NA (you could omit this line, if NA is not just character NA). Then use coalesce:

library(dplyr)

left_join(df1, df2) %>% 
  mutate(across(starts_with("week"), ~na_if(., "NA"))) %>% 
  mutate(week2 = coalesce(week2, week2a), .keep="unused")
    ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y
0

精彩评论

暂无评论...
验证码 换一张
取 消