I would like to use the subset
function in R to extract smaller groups of panel study time series data.
My data consists of a dataframe made up of six columns: district(8 districts), gender, age interval(4 groups), year, month and a count column.
Example:
District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern Female 2003 1 0 4
2 Eastern Female 2003 1 01-4 1
3 Eastern Female 2003 1 05-14 1
4 Eastern Female 2003 1 15+ 91
5 Eastern Female 2003 2 0 4
6 Eastern Female 2003 2 01-4 1
I would like to extract smaller subset for each district, Gender and age interval to get something like this:
District Gender Year Month AgeGroupNew TotalDeaths
Northern Male 2003 1 01-4 0
Northern Male 2003 2 01-4 1
Northern Male 2003 3 01-4 0
Northern Male 2003 4 01-4 3
Northern Male 2003 5 01-4 4
Northern Male 2003 6 01-4 6
Northern Male 2003 7 01-4 5
Northern Male 2003 8 01-4 0
Northern Male 2003 9 01-4 1
Northern Male 2003 10 01-4 2
Northern Male 2003 11 01-4 0
Northern Male 2003 12 01-4 1
Northern Male 2004 1 01-4 1
Northern Male 2004 2 01-4 0
Going to
Northern Male 2006 11 01-4 0
Northern Male 2006 12 01-4 0
So far I have been trying to use this, thanks to DWin pointing it out in a previous question.
subset(datNew, subset=(District=="Eastern" & Gender=="Female" & AgeGroupNew=="01-4"))
[1] District Gender Year Month AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)
But R keeps on giving me the output as above - which it shouldn't.
I have tried other combinations with success, but it seems using 'District' in the subset
causes this <0 rows> (or 0-length row.names)
.
This works:
> head(subset(datNew, Year=="2004" & Month=="8" & AgeGroupNew =="0"))
District Gender Year Month AgeGroupNew TotalDeaths
77 Eastern Female 2004 8 0 10
269 Eastern Male 2004 8 0 6
461 Khayelitsha Female 2004 8 0 13
653 Khayelitsha Male 2004 8 0 15
845 Klipfontein Female 2004 8 0 7
1037 Klipfontein Male 2004 8 0 6
but not
> head(subset(datNew, District=="Eastern" & Gender=="Female" & AgeGroupNew =="0"))
[1] District Gender Year Month AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)
Any reason why District is causing this? It's absolutely wrong that there are 0 rows with that combination of the subset - there's enough data to my knowledge.
I've tried experimenting - and from other posts, this is a baby step closer to what I want to achieve, but still not working:
> head(subset(datNew,datNew[[1]] %in% District[1] & Gender=="Female" & AgeGroupNew=="0"))
District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern Female 2003 1 0 4
5 Eastern Female 2003 2 0 4
9 Eastern Female 2003 3 0 5
13 Eastern Female 2003 4 0 12
17 Eastern Female 2003 5 0 7
21 Eastern Female 2003 6 0 13
With this I am unable to choose from the other Districts, such as "Southern", "Khayelitsha", etc. No matter what I change datNew[[1 or 2 or 3]]
and Di开发者_开发百科strict[[1 or 2 or 3]]
.
I don't really know what %in%
does above?
I am so stuck. Any help asseblief.
Prediction: Give us the results str(datNew$District[1]) and all will be revealed. I predict there is a non-printing character that will show up, perhaps a trailing space (or two).
So with the results of str(...) the correct code would be:
subset(datNew, District=="Eastern " & Gender=="Female" & AgeGroupNew =="0")
精彩评论