How to represent a list of points in R

Developer https://www.devze.com 2022-12-28 19:44 Source: Internet

I am working with a large list of points (each point has three dimensions x,y,z).

I am pretty new to R, so I would like to know the best way to represent that kind of information. As far as I know, an array allows me to represent any multidimensional data, so currently I am using:

> points<-array( c(1,2,0,1,3,0,2,4,0,2,5,0,2,7,0,3,8,0), dim=c(3,6) )
> points
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    2    3  -- x dim
[2,]    2    3    4    5    7    8  -- y dim
[3,]    0    0    0    0    0    0  -- z dim

The aim is to calculate the Euclidean distance between two sets of points such as:

points1<-array( c(1,2,0,1,3,0,2,4,0,2,5,0,2,7,0,3,8,0), dim=c(3,6) )
points2<-array( c(2,2,0,1,4,0,2,3,0,2,4,0,2,6,0,2,8,0), dim=c(3,6) )

(any hint in this sense would also be highly appreciated)

Calculating the Euclidean distance between two sets of points stored like this is easy:

sqrt(colSums((points1 - points2)^2))

That said, I'd second the recommendation to store the dimensions in the columns, so that each row is one point. In that case the code becomes:

sqrt(rowSums((points1 - points2)^2))
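
To make the two layouts concrete, here is a minimal sketch using the points from the question. The names p1, p2, d_cols and d_rows are only illustrative; the transposed copies assume you want one point per row with named x, y, z columns.

```r
# The six points from the question, stored one point per column (3 x 6).
points1 <- array(c(1,2,0, 1,3,0, 2,4,0, 2,5,0, 2,7,0, 3,8,0), dim = c(3, 6))
points2 <- array(c(2,2,0, 1,4,0, 2,3,0, 2,4,0, 2,6,0, 2,8,0), dim = c(3, 6))

# Column layout: sum the squared coordinate differences down each column.
d_cols <- sqrt(colSums((points1 - points2)^2))

# Row layout: transpose once so each row is a point, and name the coordinates.
p1 <- t(points1); colnames(p1) <- c("x", "y", "z")
p2 <- t(points2); colnames(p2) <- c("x", "y", "z")

# Same computation, now summing across each row.
d_rows <- sqrt(rowSums((p1 - p2)^2))

d_cols
# [1] 1 1 1 1 1 1
```

Both versions give the distance between the i-th point of the first set and the i-th point of the second set, so the two sets must contain the same number of points.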

You can get the distance matrix using the function dist. It computes the distances between the rows of a data matrix, so I transposed your points array:

dist(t(points),method = "euclidean")
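
Note that dist returns the pairwise distances within a single set of points. If you need every point of one set against every point of another, one possible approach is to stack both sets and keep only the cross block of the result. The helper cross_dist below is hypothetical (it is not part of base R), and it assumes both inputs store one point per column as in the question.

```r
points1 <- array(c(1,2,0, 1,3,0, 2,4,0, 2,5,0, 2,7,0, 3,8,0), dim = c(3, 6))
points2 <- array(c(2,2,0, 1,4,0, 2,3,0, 2,4,0, 2,6,0, 2,8,0), dim = c(3, 6))

# Pairwise distances within one set; as.matrix() expands the compact
# "dist" object into a full symmetric 6 x 6 matrix.
m <- as.matrix(dist(t(points1), method = "euclidean"))
m[1, 2]   # distance between (1,2,0) and (1,3,0), which is 1

# Hypothetical helper: distances from every column of a to every column of b.
cross_dist <- function(a, b) {
  n <- ncol(a)
  # Stack both sets as rows, compute all pairwise distances at once.
  all_d <- as.matrix(dist(rbind(t(a), t(b))))
  # Keep rows belonging to a and columns belonging to b.
  all_d[seq_len(n), n + seq_len(ncol(b)), drop = FALSE]
}

cd <- cross_dist(points1, points2)
dim(cd)   # 6 rows (points1) by 6 columns (points2)
```

This computes more distances than strictly needed (it also fills in the within-set blocks), so for a very large number of points a more direct computation may be preferable.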

A similar function for computing the distance matrix is Dist from the package amap, which provides even more distance measures: "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" and "kendall".

You probably want to see what the CRAN Task View for Spatial Data Analysis has to offer; there are a number of suitable packages.

I'd suggest working with your matrix transposed, or you'll probably end up calling the function t() more than you otherwise would.

Aside from that, this is probably the data structure you want. You could do it with a data frame of course, but I think you're better off not doing so in this situation.
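
As a rough illustration of the difference (the names pts, df and back are made up for this sketch), a numeric matrix supports the arithmetic shown in the other answers directly, while a data frame can always be converted back with as.matrix when needed:

```r
# Three points stored one per row, with named coordinate columns.
pts <- matrix(c(1,2,0, 1,3,0, 2,4,0), ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("x", "y", "z")))

# The same data as a data frame: convenient for mixing in non-numeric
# columns, but usually not needed when everything is a coordinate.
df <- as.data.frame(pts)

# Convert back to a numeric matrix before doing matrix-style arithmetic.
back <- as.matrix(df)

# Distance of each point from the origin, computed on the matrix.
sqrt(rowSums(pts^2))
```

If every column is numeric, as it is here, the matrix is the simpler choice; a data frame mainly pays off once you attach labels or other non-numeric attributes to each point.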