Where can I find information on the differences between calling on a column within a data.frame via:
df <- data.frame(x=1:20,y=letters[1:20],z=20:1)
df$x
df["x"]
They both return the "same" results, but not necessarily in the same format. Another thing that I've noticed is that df$x returns a vector, whereas df["x"] returns a data.frame.
EDIT: However, knowing which one to use in which situation has become a challenge. Is there a best practice here or does it really come down to knowing what the command or function requires? So far I've just been cycling through them if my function doesn't work at first (trial and error).
Another difference is how a missing column is handled: with your example dataframe, df$w and df[['w']] return NULL, whereas df['w'] gives an error.
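To check the missing-column behaviour yourself, here is a minimal sketch using the data frame from the question. As far as I can tell, in current versions of R only the single-bracket form raises an error, while $ and [[ both return NULL. The variable names (missing_dollar and so on) are just illustrative.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

# `$` and `[[` quietly return NULL for a column that does not exist
missing_dollar <- df$w
missing_double <- df[["w"]]

# `[` signals an error instead; tryCatch() captures the message as a string
missing_single <- tryCatch(df["w"], error = function(e) conditionMessage(e))

print(is.null(missing_dollar))
print(is.null(missing_double))
print(missing_single)
```

Because the NULL is silent, a typo in a column name after $ can go unnoticed until much later in a script.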
If I'm not mistaken, df$x is the same as df[['x']]. [[ is used to select any single element, whereas [ returns a list of the selected elements. See also the language reference. I usually see that [[ is used for lists, [ for arrays and $ for getting a single column or element. If you need a computed index (for example df[[name]] or df[,name], where name is a variable holding the column name), use the [ or [[ notation. The [ notation is also used when multiple columns are selected, for example df[,c('name1', 'name2')]. I don't think there is a best practice for this.
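A short sketch of what each form returns, using the data frame from the question. The result names (by_dollar, by_double, by_single, two_cols) are only for illustration.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

by_dollar <- df$x                 # the column itself: an integer vector
by_double <- df[["x"]]            # the same vector
by_single <- df["x"]              # a one-column data.frame
two_cols  <- df[, c("x", "z")]    # several columns: still a data.frame

print(class(by_dollar))
print(class(by_single))
print(identical(by_dollar, by_double))
print(dim(two_cols))
```

So if a function expects a vector, df$x or df[["x"]] is what you want; if it expects a data frame, use df["x"].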
In addition to the indexing page in the manual, you can find this succinct description on the help page ?"$":
Indexing by ‘[’ is similar to atomic vectors and selects a list of the specified element(s).
Both ‘[[’ and ‘$’ select a single element of the list. The main difference is that ‘$’ does not allow computed indices, whereas ‘[[’ does. ‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’. Also, the partial matching behavior of ‘[[’ can be controlled using the ‘exact’ argument.
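The partial-matching point in that help text is easy to miss, so here is a small sketch. The data frame scores and its column value are made up for the example, since the one-letter column names in the question cannot be abbreviated.

```r
# Hypothetical data frame with a column name long enough to abbreviate
scores <- data.frame(value = c(10, 20, 30))

partial_dollar <- scores$val                     # `$` partially matches "value"
exact_double   <- scores[["val"]]                # `[[` is exact by default: NULL
partial_double <- scores[["val", exact = FALSE]] # opt in to partial matching

print(partial_dollar)
print(is.null(exact_double))
print(identical(partial_dollar, partial_double))
```

This is one reason to prefer [[ in scripts: an abbreviated or misspelled name will not silently match a different column.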
The function calls are, of course, different. See get("[.data.frame") versus get("[[.data.frame") versus get("$").
In this instance, for most uses, I'd avoid sub-setting altogether, along with trying to remember what $, [ and [[ do with a data frame. I would just use with():
> df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)
> with(df, y)
[1] a b c d e f g h i j k l m n o p q r s t
Levels: a b c d e f g h i j k l m n o p q r s t
That is a lot clearer than any of the sub-setting methods in most cases (IMHO).
One thing I haven't seen explained explicitly is that [ and [[ can be used to select based on the value of a variable or expression, while $ cannot. I.e., you can do:
> example_frame <- data.frame(Var1 = c(1,2), Var2 = c('a', 'b'))
> x <- 'Var1'
> example_frame$x
NULL # Not what you wanted
> example_frame[x]
Var1
1 1
2 2
> example_frame[[x]]
[1] 1 2
> example_frame[[ paste(c("V","a","r",2), collapse='') ]]
[1] a b
Levels: a b
The differences between [ and [[ have been well covered by other posts and other questions.
If you use df[,"x"] instead of df["x"] you will get the same result as df$x. The comma means you are indexing by [row, column]: the row index is left empty, so all rows are kept, and "x" selects the column by name.
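A quick sketch of the matrix-style form, again with the data frame from the question. The drop = FALSE argument is a standard argument of [ for data frames that keeps the result as a data frame; the result names are illustrative.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

as_vector <- df[, "x"]                 # single column is dropped to a vector
as_frame  <- df[, "x", drop = FALSE]   # drop = FALSE keeps a data.frame

print(identical(as_vector, df$x))
print(class(as_frame))
```

The drop = FALSE variant is handy inside functions where the number of selected columns is not known in advance and you always want a data frame back.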
df$x and df[["x"]] do the same thing.
Let's assume that you have a data set named one, and that one of its variables is a factor variable, Region. Using one$Region will allow you to select that specific variable. Consider the following:
one <- read.csv("IED.csv")
one$Region
Running the following code also allows you to isolate that variable.
one[["Region"]]
Both commands produce the following output:
> one$Region
[1] RC SOUTH RC SOUTH RC SOUTH RC EAST RC EAST
[6] RC EAST RC EAST RC EAST RC EAST RC EAST
[11] RC SOUTH RC SOUTH RC EAST RC EAST RC EAST
[16] RC EAST RC EAST RC SOUTH RC SOUTH RC EAST
[21] RC SOUTH RC EAST RC CAPITAL RC EAST RC EAST
> one[["Region"]]
[1] RC SOUTH RC SOUTH RC SOUTH RC EAST RC EAST
[6] RC EAST RC EAST RC EAST RC EAST RC EAST
[11] RC SOUTH RC SOUTH RC EAST RC EAST RC EAST
[16] RC EAST RC EAST RC SOUTH RC SOUTH RC EAST
[21] RC SOUTH RC EAST RC CAPITAL RC EAST RC EAST
"They both return the "same" results, but not necessarily in the same format." - I didn't notice any differences. Each command produced the same output in the same format. Perhaps it's your data.
Hope that helps.
EDIT:
Misread the original question. df["x"] produces the following:
> one["Region"]
Region
1 RC SOUTH
2 RC SOUTH
3 RC SOUTH
4 RC EAST
5 RC EAST
6 RC EAST
7 RC EAST
8 RC EAST
9 RC EAST
10 RC EAST
Not sure why the difference occurs.