I have a data set that defines a set of points on a 2-dimen开发者_如何学运维sional Cartesian plane. Theoretically, those points should form a line, but that line may be perfectly horizontal, perfectly vertical, and anything in between.
I would like to design an algorithm that rates the 'straightness' of that line.
For example, the following data sets would be perfectly straight:
Y = 2/3x + 4
X | Y
---------
-3 | 2
0 | 4
3 | 6
Y = 4
X | Y
---------
1 | 4
2 | 4
3 | 4
X = -1
X | Y
---------
-1 | 7
-1 | 8
-1 | 9
While this one would not:
X | Y
---------
-3 | 2
0 | 5
3 | 6
I think it would work to minimize the sum of the squares of the distances of each point from to a line (usually called a regression line), then determine the average distance of each point to the line. Thus, a perfectly straight line would have an average distance of 0.
Because the data can represent a line that is vertical, as I understand it, the usual least-squares regression line won't work for this data set. A perpendicular least-squares regression line might work, but I've had little luck finding an implementation of one.
I am working in Excel 2010 VBA, but I should be able to translate any reasonable algorithm.
Thanks, PaulH
The reason things like RSQ and LinEst won't work for this is because I need a universal measurement that includes vertical lines. As a line's slope approaches infinity (vertical), their RSQ approaches 0 even if the line is perfectly straight or nearly so.
-PaulH
Sounds like you are looking for R2, the coefficient of determinism.
Basically, you take the residual sum of squares, divide by the sum of squares and subtract from 1.
Use a Linear Regression. The "straightness" of the line is the R^2 value.
A value of 0 for the R^2 value implies it is perfectly straight. Increasing values imply increasing error in the regression, and thus the line is less and less "straight"
Could you try to catch the case of the vertical line before moving the least squares regression? If all x-values are the same, then the line is perfectly straight, no need to calculate an r^2 value.
Rough idea: 1. translate all coordinates to absolute values 2. calculate tan of current x/y 3. calculate tan of difference in x/y between current x/y and next x/y 4. difference in tan can give running deviation
Yes, use ordinary least squares method. Just use the Slope and Intercept functions in a worksheet. I expect there is a simple way to call these from the VBA codebehind.
Here's the VBA info. for R-Squared: http://www.pcreview.co.uk/forums/thread-1009945.php
精彩评论