At some point in my script I like to see the number of missing values in my data.frame and display them.
In my case I have:
out <- read.csv(file="...../OUT.csv", na.strings="NULL")
sum(is.na(out$codeHelper))
out[is.na(out$codeHelper),c(1,length(colnames(out)))]
It works perfectly fine.
However, the last command obviously gives me every row of the data.frame where is.na is TRUE, e.g.:
5561 Yemen (PDR) <NA>
5562 Yemen (PDR) <NA>
5563 Yemen (PDR) <NA>
5564 Yemen (PDR) <NA>
5565 Yemen (PDR) <NA>
5566 Yemen (PDR) <NA>
5567 Yemen (PDR) <NA>
5568 Yemen (PDR) <NA>
5601 Zaire (Democ Republic Congo) <NA>
5602 Zaire (Democ Republic Congo) <NA>
5603 Zaire (Democ Republic Congo) <NA>
5604 Zaire (Democ Republic Congo) <NA>
5605 Zaire (Democ Republic Congo) <NA>
With a big frame and a lot of NAs that looks pretty messy. All that matters to me is where the NAs occur, i.e. which country (in the second column) has a missing value in the third column.
So how can I display only a single row for each country?
It should look something like this:
1 Yemen (PDR) <NA>
2 Zaire (Democ Republic Congo) <NA>
3 USA <NA>
4 W. Samoa <NA>
unique(c(1,2,3,4,4))
will give you
1 2 3 4
so
unique(out[is.na(out$codeHelper),c(1,length(colnames(out)))])
should be what you're looking for?
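A minimal sketch of this on a toy data frame, since the real OUT.csv is not shown. The column name country and all the values are made up for illustration; only codeHelper comes from the question. It relies on unique() dropping duplicated rows when given a data frame.

```r
# Toy stand-in for `out`; `country` and the values are assumptions.
out <- data.frame(
  country    = c("Yemen (PDR)", "Yemen (PDR)",
                 "Zaire (Democ Republic Congo)", "USA", "USA"),
  codeHelper = c(NA, NA, NA, "US", NA),
  stringsAsFactors = FALSE
)

# Rows with a missing codeHelper, first and last column only.
missing_rows <- out[is.na(out$codeHelper), c(1, ncol(out))]

# unique() on a data frame removes rows that are exact duplicates.
one_per_country <- unique(missing_rows)

# Optional: renumber the rows 1, 2, 3, ... as in the desired output.
rownames(one_per_country) <- NULL
print(one_per_country)
```

Note that this only collapses rows that are identical in every selected column, which is the case here because the third column is always NA after filtering.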
Try something like this:
subset(dataframe.name, !duplicated(country.colname),
select=c(col1.name, col2.name, ...))
See also this related question: "How to remove partial duplicates from a data frame?"
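For the data in the question, the template above might be filled in roughly as follows. This is only a sketch: the column name country and the toy values are assumptions, and the NA filter is applied first so that duplicated() only looks at rows that are actually missing a codeHelper.

```r
# Toy stand-in for `out`; `country` and the values are assumptions.
out <- data.frame(
  country    = c("Yemen (PDR)", "Yemen (PDR)",
                 "Zaire (Democ Republic Congo)", "USA", "USA"),
  codeHelper = c(NA, NA, NA, "US", NA),
  stringsAsFactors = FALSE
)

# Step 1: keep only the rows where codeHelper is missing.
na_rows <- subset(out, is.na(codeHelper))

# Step 2: keep the first such row per country, and only the two columns of interest.
first_per_country <- subset(na_rows, !duplicated(country),
                            select = c(country, codeHelper))
print(first_per_country)
```

Filtering before de-duplicating matters for a country like "USA" in the toy data, whose first row is not NA: calling !duplicated(country) on the full frame would keep that first row, and the later NA row would be lost.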