Getting a row from a data frame as a vector in R

Developer https://www.devze.com 2023-04-10 00:19 Source: web
I know that to get a row from a data frame in R, we can do this:

data[row,] 

where row is an integer. But that returns an awkward-looking data structure in which every value is labeled with its column name. How can I get just the row as a plain list of values?


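Data frames created by importing data from an external source have their character data converted to factors by default in R versions before 4.0.0 (from 4.0.0 on, the default is stringsAsFactors = FALSE). If you do not want this conversion, set stringsAsFactors=FALSE.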

In this case to extract a row or a column as a vector you need to do something like this:

as.numeric(as.vector(DF[1,]))

or like this

as.character(as.vector(DF[1,]))
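
The factor default matters because a factor stores integer codes behind its labels. A minimal sketch of the pitfall, using a made-up factor f that is not part of the question's data: as.numeric() applied directly to a factor returns the internal level codes, so the labels have to be converted with as.character() first.

```r
# Hypothetical factor, standing in for a column that was read in as a factor
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # internal level codes
values <- as.numeric(as.character(f))  # the numbers the labels represent

print(codes)
print(values)
```

Here codes and values differ, which is the reason for checking whether a column is a factor before converting a row to numbers.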


You can't necessarily get it as a vector because each column might have a different mode. You might have numerics in one column and characters in the next.

If you know the mode of the whole row, or can convert to the same type, you can use the mode's conversion function (for example, as.numeric()) to convert to a vector. For example:

> state.x77[1,]
Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
   3615.00    3624.00       2.10      69.05      15.10      41.30      20.00 
      Area 
  50708.00 
> as.numeric(state.x77[1,])
[1]  3615.00  3624.00     2.10    69.05    15.10    41.30    20.00 50708.00

This would work even if some of the columns were integers, although they would be converted to numeric floating-point numbers.
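
Note that state.x77 is a matrix rather than a data frame. For an actual data frame whose columns are all numeric, a minimal sketch (the data frame df and its column names are invented for illustration) could use unlist() to strip the list structure:

```r
# Hypothetical data frame with one integer and one double column
df <- data.frame(count = c(3L, 7L), weight = c(1.5, 2.25))

# A one-row data frame is still a list of columns; unlist() flattens it
# into an atomic vector, and use.names = FALSE drops the column labels
row_vec <- unlist(df[1, ], use.names = FALSE)

print(row_vec)
print(class(row_vec))  # the integer value was promoted to double
```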


There is a problem with what you propose; namely that the components of data frames (what you call columns) can be of different data types. If you want a single row as a vector, that must contain only a single data type - they are atomic vectors!

Here is an example:

> set.seed(2)
> dat <- data.frame(A = 1:10, B = sample(LETTERS[1:4], 10, replace = TRUE))
> dat
    A B
1   1 A
2   2 C
3   3 C
4   4 A
5   5 D
6   6 D
7   7 A
8   8 D
9   9 B
10 10 C
> dat[1, ]
  A B
1 1 A

If we force R to drop the empty dimension (drop = TRUE), its only recourse is to convert the row to a list so that the disparate data types are preserved.

> dat[1, , drop = TRUE]
$A
[1] 1

$B
[1] A
Levels: A B C D

The only logical solution to this is to get the data frame into a common type by coercing it to a matrix. This is done via data.matrix(), for example:

> mat <- data.matrix(dat)
> mat[1,]
A B 
1 1

data.matrix() converts factors to their internal numeric codes. The above allows the first row to be extracted as a vector.

However, if you have character data in the data frame, the only recourse is to create a character matrix, which may or may not be useful. In that case data.matrix() can no longer be used; we need as.matrix() instead:

> dat$String <- LETTERS[1:10]
> str(dat)
'data.frame':   10 obs. of  3 variables:
 $ A     : int  1 2 3 4 5 6 7 8 9 10
 $ B     : Factor w/ 4 levels "A","B","C","D": 1 3 3 1 4 4 1 4 2 3
 $ String: chr  "A" "B" "C" "D" ...
> mat <- data.matrix(dat)
Warning message:
NAs introduced by coercion 
> mat
       A B String
 [1,]  1 1     NA
 [2,]  2 3     NA
 [3,]  3 3     NA
 [4,]  4 1     NA
 [5,]  5 4     NA
 [6,]  6 4     NA
 [7,]  7 1     NA
 [8,]  8 4     NA
 [9,]  9 2     NA
[10,] 10 3     NA
> mat <- as.matrix(dat)
> mat
      A    B   String
 [1,] " 1" "A" "A"   
 [2,] " 2" "C" "B"   
 [3,] " 3" "C" "C"   
 [4,] " 4" "A" "D"   
 [5,] " 5" "D" "E"   
 [6,] " 6" "D" "F"   
 [7,] " 7" "A" "G"   
 [8,] " 8" "D" "H"   
 [9,] " 9" "B" "I"   
[10,] "10" "C" "J"
> mat[1, ]
     A      B String 
  " 1"    "A"    "A" 
> class(mat[1, ])
[1] "character"


How about this?

library(tidyverse)
dat <- as_tibble(iris)
pulled_row <- dat %>% slice(3) %>% flatten_chr()

If you know all the values are the same type, use the matching flatten_xxx variant (for example, flatten_dbl() for doubles).

Otherwise, I think flatten_chr() is safer.


As user "Reinstate Monica" notes, this problem has two parts:

  1. A data frame will often have different data types in each column, and these need to be coerced to a common type such as character strings.
  2. Even after coercing the columns to character format, the data.frame "shell" needs to be stripped off to create a vector, via a command like unlist.

With a combination of dplyr and base R this can be done in two lines. First, mutate_all converts all columns to character format. Second, the unlist command extracts the vector from the data.frame structure.

My particular issue was that the second line of a csv included the actual column names. So, I wanted to extract the second row to a vector and use that to assign column names. The following worked to extract the row as a character vector:

library(dplyr)
data_col_names <- data[2, ] %>% 
  mutate_all(as.character) %>% 
  unlist(., use.names=FALSE)

# example of using extracted row to rename cols
names(data) <- data_col_names

# only for this example, you'd want to remove row 2
# data <- data[-2, ]

(Note: Using as.character() in place of unlist will work too but it's less intuitive to apply as.character twice.)
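
For comparison, the same two steps can be sketched in base R alone. This is an illustration under assumptions, not the author's method: the data frame raw, its contents, and the names header and clean are all invented, and the columns are deliberately created as factors to show that as.character() applied per column returns the labels.

```r
# Hypothetical input: row 2 holds the real column names; columns are factors
raw <- data.frame(
  V1 = c("export", "id", "1", "2"),
  V2 = c("v2", "name", "Ann", "Bob"),
  stringsAsFactors = TRUE
)

# Step 1 and 2 together: convert each one-element column of row 2 to its
# label, which yields a character vector; unname() drops the old column names
header <- unname(vapply(raw[2, ], as.character, character(1)))

# Discard the two leading non-data rows, then apply the extracted names
clean <- raw[-(1:2), , drop = FALSE]
names(clean) <- header
rownames(clean) <- NULL

print(header)
print(clean)
```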


The shortest variant I have found is

c(t(data[row,]))

However, if at least one column in data is a column of strings, it will return a character vector.
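
The c(t(...)) idiom works because t() first coerces the row to a matrix, and a matrix can hold only one type. A small sketch of both outcomes, using two made-up data frames (num_df and mix_df are illustrative names only):

```r
# Hypothetical data frames: one all-numeric, one with a string column
num_df <- data.frame(x = 1:2, y = c(0.5, 1.5))
mix_df <- data.frame(x = 1:2, label = c("a", "b"))

# t() coerces the one-row data frame to a matrix; c() drops the dimensions
num_row <- c(t(num_df[1, ]))
mix_row <- c(t(mix_df[1, ]))

print(class(num_row))  # numeric, since every column was numeric
print(class(mix_row))  # character, because of the string column
```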
