Using ifelse or case_when on a data frame in R

Developer https://www.devze.com 2022-12-07 18:54 Source: Internet
I am sure the solution to my problem is simple, but I am new to coding and cannot find the answer online. I am working on a dataset of qualitative data that was collected and coded. The dataset includes variables named Code1, Code2, Code3, and Code4. Every respondent has at least one code and can have several. I am trying to add a variable that reflects the number of codes given to each participant. The data looks something like this, where the numerical values are the codes we assigned based on each response:

ID Code1 Code2 Code3 Code4
1.  5      NA    NA    NA 
2.  7       6    4     NA
3.  5      12    NA    NA

The variable I want to include would be the one named count and would look like this:

ID Code1 Code2 Code3 Code4 Count
1.  5      NA    NA    NA   1
2.  7       6    4     NA   3
3.  5      12    NA    NA   2

The first participant would have 1 under Count because they received only one code, participant 2 would have 3 because they have three codes, and participant 3 would have 2 because they were assigned only two codes.

I have tried using the ifelse function with is.na, since an NA signals that fewer codes were assigned, but I cannot get more than two outcomes from it: my count variable ends up with only two different values, and the count can go up to 4. I have also tried using case_when but get an error message saying Error: Case 7 (!is.na(Code1) ~ 1) must be a two-sided formula, not a logical vector.

Here is an example of what I have tried:

df$count = ifelse(is.na(df$Code2),1,2)

df$count = ifelse(is.na(df$Code3),2,3)

df$count = ifelse(is.na(df$Code4),3,4)

I have also tried:

df <- df %>%
  mutate(count = case_when(!is.na(Code1) ~ 1, 
                                 !is.na(Code2) ~ 2, 
                                 !is.na(Code3) ~ 3,
                                 !is.na(Code4) ~ 4,
                                xor(Code1,Code2)))

So, I cannot figure out what I am doing wrong and how I can get the count variable I need to work. Any suggestions?

Many thanks in advance!!


A dplyr approach using rowSums and across:

library(dplyr, warn.conflicts = FALSE)

dat <- dat |>
  mutate(count = rowSums(
    across(starts_with("Code"), ~ !is.na(.x))
  ))
dat
#>   ID Code1 Code2 Code3 Code4 count
#> 1  1     5    NA    NA    NA     1
#> 2  2     7     6     4    NA     3
#> 3  3     5    12    NA    NA     2

Or using base R:

dat$count <- rowSums(
  !is.na(dat[grep("^Code", names(dat), value = TRUE)])
)
dat
#>   ID Code1 Code2 Code3 Code4 count
#> 1  1     5    NA    NA    NA     1
#> 2  2     7     6     4    NA     3
#> 3  3     5    12    NA    NA     2

DATA

dat <- structure(list(ID = c(1, 2, 3), Code1 = c(5L, 7L, 5L), Code2 = c(
  NA,
  6L, 12L
), Code3 = c(NA, 4L, NA), Code4 = c(NA, NA, NA)), class = "data.frame", row.names = c(
  NA,
  -3L
))
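
As a side note on the two attempts in the question: the three ifelse calls each overwrite df$count, so only the last one survives, and case_when stops with that error because every argument must be a two-sided formula (condition ~ value), which the bare xor(Code1, Code2) is not. A minimal sketch of how both could be made to work is below. It assumes codes are always filled from left to right with no gaps, so the last non-missing column gives the count. The column names count_cw and count_ie are made up for illustration.

```r
library(dplyr)

# Example data matching the table in the question
df <- data.frame(
  ID    = 1:3,
  Code1 = c(5, 7, 5),
  Code2 = c(NA, 6, 12),
  Code3 = c(NA, 4, NA),
  Code4 = c(NA_real_, NA_real_, NA_real_)
)

df <- df %>%
  mutate(
    # case_when returns the value of the first condition that is TRUE,
    # so test the highest-numbered column first
    count_cw = case_when(
      !is.na(Code4) ~ 4,
      !is.na(Code3) ~ 3,
      !is.na(Code2) ~ 2,
      !is.na(Code1) ~ 1
    ),
    # the same logic as nested ifelse calls inside a single assignment
    count_ie = ifelse(is.na(Code2), 1,
               ifelse(is.na(Code3), 2,
               ifelse(is.na(Code4), 3, 4)))
  )

df$count_cw  # 1 3 2
df$count_ie  # 1 3 2
```

If a participant could have a gap (for example Code1 and Code3 filled but Code2 missing), this sketch would give the wrong count, and the rowSums approach is the safer choice.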


I think you are looking for something like this:

Recreating data (using tidyverse) - you can ignore this

library(dplyr)  # provides the %>% pipe used below

a = c(1, 5, NA, NA, NA)
b = c(2, 7, 6,  4,  NA)
c = c(3, 5, 12, NA, NA)

df <- cbind(a,b,c) %>%
  t() %>% 
  data.frame() %>% 
  setNames(c('id', 'code1', 'code2', 'code3', 'code4')) 

Solutions:

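#a: drop the id column first, then count the non-missing values in each row
df$count <- rowSums(!is.na(df[colnames(df) != 'id']))

#b: the same count row by row (an alternative to a, run one or the other)
df$count <- apply(df, 1, \(x) sum(!is.na(x) & colnames(df) != 'id'))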

  id code1 code2 code3 code4 count
a  1     5    NA    NA    NA     1
b  2     7     6     4    NA     3
c  3     5    12    NA    NA     2
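
One caveat when excluding the id column: a logical vector with one entry per column is recycled down the columns when it is combined with the matrix returned by !is.na(df), so it does not line up with the columns as one might expect. It can still give the right counts on some data by coincidence. The sketch below uses a made-up data frame (df_demo, not from the question) to compare masking the matrix with subsetting the columns first.

```r
# Hypothetical data: row 2 has codes only in code1 and code2
df_demo <- data.frame(
  id    = 1:3,
  code1 = c(5, 7, 5),
  code2 = c(NA, 6, 12),
  code3 = c(NA_real_, NA_real_, NA_real_),
  code4 = c(NA_real_, NA_real_, NA_real_)
)

not_id <- colnames(df_demo) != "id"

# Masking the matrix: not_id is recycled down the columns, so its FALSE
# lands on scattered cells instead of on the whole id column
masked <- rowSums(!is.na(df_demo) & not_id)

# Subsetting first: the id column is removed before anything is counted
subset_first <- rowSums(!is.na(df_demo[not_id]))

unname(masked)        # 1 3 2  (row 2 is over-counted)
unname(subset_first)  # 1 2 2
```

Selecting the code columns before calling is.na avoids relying on how the mask happens to line up with the data.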