
In R, What is the difference between df["x"] and df$x

Developer https://www.devze.com 2023-01-10 00:50 Source: Internet

Where can I find information on the differences between calling on a column within a data.frame via:

df <- data.frame(x=1:20,y=letters[1:20],z=20:1)

df$x
df["x"]

They both return the "same" results, but not necessarily in the same format. Another thing that I've noticed is that df$x returns a vector, whereas df["x"] returns a data.frame.
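
One way to see the difference in format is to inspect the class of each result. This is a minimal sketch using the data frame from the question; the names col_vec and col_df are made up for illustration.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

col_vec <- df$x     # the column itself
col_df  <- df["x"]  # a data frame that contains only that column

class(col_vec)      # class of the extracted column
class(col_df)       # class of the one-column subset
is.list(col_df)     # a data frame is stored as a list of columns

# Pulling the single column back out of col_df gives the same vector
identical(col_vec, col_df[[1]])
```

So the values are the same in both cases; what differs is the container they come back in.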

EDIT: However, knowing which one to use in which situation has become a challenge. Is there a best practice here or does it really come down to knowing what the command or function requires? So far I've just been cycling through them if my function doesn't work at first (trial and error).


Another difference shows up when the column does not exist: with your example dataframe, df$w and df[['w']] return NULL, whereas df['w'] gives an error (undefined columns selected).
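
The behaviour for a missing column can be checked directly. The sketch below assumes a plain base R data.frame (tibbles and other data frame classes may behave differently); tryCatch is used only so the script keeps running after the error, and res is a name chosen for illustration.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

is.null(df$w)       # no column named w, so $ yields NULL
is.null(df[["w"]])  # [[ also yields NULL on a plain data.frame

# Single-bracket indexing stops with an error instead;
# capture the message rather than aborting the script
res <- tryCatch(df["w"], error = function(e) conditionMessage(e))
res
```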


If I'm not mistaken, df$x is the same as df[['x']]. [[ selects a single element, whereas [ returns a list of the selected elements. See also the language reference. I usually see [[ used for lists, [ for arrays, and $ for getting a single column or element. If the name must be computed from an expression (for example df[[name]] or df[,name]), use the [ or [[ notation. The [ notation is also used when multiple columns are selected, for example df[,c('name1', 'name2')]. I don't think there is a single best practice for this.


In addition to the indexing page in the manual, you can find this succinct description on the help page ?"$":

Indexing by ‘[’ is similar to atomic vectors and selects a list of the specified element(s).

Both ‘[[’ and ‘$’ select a single element of the list. The main difference is that ‘$’ does not allow computed indices, whereas ‘[[’ does. ‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’. Also, the partial matching behavior of ‘[[’ can be controlled using the ‘exact’ argument.
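
The partial matching mentioned in that help text is easiest to see with an abbreviated column name. This is a small sketch on a made-up data frame (df2 and its columns are assumptions for illustration, not part of the question).

```r
df2 <- data.frame(value = 1:3, label = c("a", "b", "c"))

df2$val                      # "val" partially matches "value"
df2[["val"]]                 # exact matching is the default, so this is NULL
df2[["val", exact = FALSE]]  # partial matching switched on, same as df2$val
```

Partial matching only succeeds when the abbreviation is unambiguous, which is one reason to prefer full names in scripts.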

The function calls are, of course, different. See get("[.data.frame") versus get("[[.data.frame") versus get("$")


In this instance, for most uses, I'd avoid subsetting altogether, along with having to remember what $, [ and [[ do with a data frame. I would just use with():

> df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)
> with(df, y)
 [1] a b c d e f g h i j k l m n o p q r s t
Levels: a b c d e f g h i j k l m n o p q r s t

That is a lot clearer than any of the sub-setting methods in most cases (IMHO).
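
with() is most useful when an expression refers to several columns at once. A minimal sketch with the same data frame; row_sums is a name chosen for illustration.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

# Inside with(), the columns are visible as ordinary variables
row_sums <- with(df, x + z)

# The same computation with explicit subsetting
identical(row_sums, df$x + df$z)
```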


One thing I haven't seen explained explicitly is that [ and [[ can be used to select based on the value of a variable or expression, while $ cannot. That is, you can do:

> example_frame <- data.frame(Var1 = c(1,2), Var2 = c('a', 'b'))
> x <- 'Var1'

> example_frame$x
NULL  # Not what you wanted

> example_frame[x]
  Var1
1    1
2    2

> example_frame[[x]]
[1] 1 2

> example_frame[[ paste(c("V","a","r",2), collapse='') ]]
[1] a b
Levels: a b

The differences between [ and [[ have been well covered by other posts and other questions.


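If you use df[,"x"] instead of df["x"] you will get the same result as df$x. The comma separates the row index from the column index: leaving the row index empty selects all rows, and because drop = TRUE by default, a single selected column is returned as a vector rather than a data.frame.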
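
The effect of the comma form can be checked as follows. This sketch assumes a plain base R data.frame; the drop argument belongs to the [ method for data frames.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

identical(df[, "x"], df$x)      # one column with a comma: same vector as df$x

class(df[, "x", drop = FALSE])  # drop = FALSE keeps the data frame structure
class(df[, c("x", "z")])        # several columns always stay a data frame
```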


df$x and df[["x"]] do the same thing.

Let's assume that you have a data set named one, and that one of its variables is a factor, Region. Using one$Region selects that variable. Consider the following:

one <- read.csv("IED.csv")
one$Region

Running the following code also isolates that variable.

one[["Region"]]

Both commands produce the following output:

> one$Region
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 


> one[["Region"]]
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 

"They both return the "same" results, but not necessarily in the same format." - I didn't notice any differences. Each command produced the same output in the same format. Perhaps it's your data.

Hope that helps.

EDIT:

Misread the original question. df["x"] produces the following:

> one["Region"]
             Region
1          RC SOUTH
2          RC SOUTH
3          RC SOUTH
4           RC EAST
5           RC EAST
6           RC EAST
7           RC EAST
8           RC EAST
9           RC EAST
10          RC EAST

Not sure why the difference occurs.
