I have thousands of data sets of (p, w, x, y, z), and I'm pretty sure that they fit an equation of the form p = aw + bx + cy + dz, with p rounded up in some way.
I'd like to write a program to solve for the constants a, b, c, and d, given all the data sets for the variables and result that I have. Alternatively, if there is software that can already do this, that would be great. Any suggestions, or any Google keywords I can use to do further research?
Note that if the points fit the equation perfectly, you only need four linearly independent data points to determine the parameters, not thousands. The remaining points either fit the equation (and hence are redundant) or cannot be made to fit it (i.e. your problem has no exact solution).
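To illustrate the exact case, here is a minimal sketch that recovers four coefficients from four unrounded data points by Gauss-Jordan elimination over exact rationals. The function name `solve_exact`, the coefficients, and the data points are all made up for the example; it assumes the p values are not rounded, which is not the asker's situation.

```python
from fractions import Fraction

def solve_exact(rows, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rows)
    # Augmented matrix [rows | rhs] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("data points are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

# Hypothetical "true" coefficients (a, b, c, d) and four (w, x, y, z) points.
true_coeffs = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6)]
points = [(1, 2, 3, 4), (2, 1, 0, 5), (3, 3, 1, 1), (0, 4, 2, 7)]
p_values = [sum(c * v for c, v in zip(true_coeffs, pt)) for pt in points]

coeffs = solve_exact(points, p_values)
print(coeffs)
```

With rounded p values the recovered coefficients would differ depending on which four points you pick, which is why the rest of this answer turns to a best fit instead.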
If instead the fit that you're looking for is not necessarily perfect and you need the parameters a, b, c, d that give the optimal fit (i.e. minimize the sum of squared errors), then what you need is a linear regression.
Note that the equation defined by each of your data points can be written in the form
Ax = B
where A is a row vector of 4 values and x is a column vector of 4 values. In this notation,
- the vector A holds the information that you write as the tuple (a, b, c, d),
- the vector x holds the information that you write as the tuple (w, x, y, z),
- B is then a scalar (your p).
At this point you can google "linear regression" and apply what you find. :) Several software packages can do this, such as MATLAB and Octave, and probably even Excel. :)
What you are looking for is a simultaneous linear equation solver. I would recommend googling for an implementation in the language of your choice.
If I understood your question, in linear algebra, when you have a linear system consisting of n unknowns, you need at least n equations to solve it for every unknown.
If you have m equations and n unknowns and m < n, you'll have at least (n-m) free variables.
If m > n, you might have an impossible system, depending on the system itself.
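So, with thousands of (p, w, x, y, z) tuples and only four unknowns, different subsets of four equations can each give a different solution, and you can end up with thousands (or even thousands of thousands) of candidate solutions.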
Hope it helps.
If you have thousands of data points q_i = (p_i, w_i, x_i, y_i, z_i), i = 1..n, and only 4 unknowns (a, b, c, d), then you have an overconstrained linear system, which is unlikely to have an exact solution.
In general, you need to solve a system that looks like
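$$
\underbrace{\begin{bmatrix}
w_1 & x_1 & y_1 & z_1 \\
w_2 & x_2 & y_2 & z_2 \\
\vdots & \vdots & \vdots & \vdots \\
w_n & x_n & y_n & z_n
\end{bmatrix}}_{n \times 4}
\underbrace{\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}}_{4 \times 1}
=
\underbrace{\begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{bmatrix}}_{n \times 1}
$$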
In such a case you need to get a "best approximation" to the solution, because there generally won't be an exact one. An example of that would be a least squares approximation.
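A least squares fit of such an n x 4 system might look like the following sketch, using NumPy's `numpy.linalg.lstsq`. The data is synthetic and the coefficients are invented for the example. The extra column of ones is my own assumption, not something from the question: since p is rounded up, the rounding error is always positive, and a constant term is one simple way to absorb that bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" coefficients (a, b, c, d); unknown in a real problem.
true_coeffs = np.array([1 / 2, 2 / 3, 3 / 4, 5 / 6])

# Synthetic data: n rows of (w, x, y, z), with p rounded up to an integer.
n = 2000
A = rng.integers(1, 100, size=(n, 4)).astype(float)
p = np.ceil(A @ true_coeffs)

# Assumption: add a constant column to absorb the average rounding-up offset.
A1 = np.column_stack([A, np.ones(n)])

# Least squares solution of the overdetermined system A1 @ sol ~= p.
sol, residuals, rank, singular_values = np.linalg.lstsq(A1, p, rcond=None)
coeffs, offset = sol[:4], sol[4]

print("estimated a, b, c, d:", coeffs)
print("estimated rounding offset:", offset)
```

The estimates will not match the true values exactly because of the rounding, but with thousands of rows they should land close enough to guess the underlying fractions.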
Thanks for the responses. This gets me started.
The p values that I have are rounded. And I think that the a, b, c, and d values in the original formula that I am trying to recover are fractions with integral numerators and denominators. So what I'm thinking is that I need more than four sets of values to help filter out the rounding, and then as I close in on the probable values, I'll try to figure out the most likely fractional equivalent of the decimal values I get, with the lowest integral denominators. Which leads me to ask: is there an algorithm/program to convert decimals to the simplest possible fractions within a specified range of error?
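One possible approach to that last question, sketched in Python with the standard `fractions` module: try denominators in increasing order and return the first fraction that lands within the tolerance. The function name `simplest_fraction` and the `max_den` cutoff are made up for this example.

```python
from fractions import Fraction

def simplest_fraction(x, tol, max_den=10_000):
    """Return the fraction with the smallest denominator within tol of x."""
    target = Fraction(x)   # exact value of the float x
    tol = Fraction(tol)
    for den in range(1, max_den + 1):
        # For a given denominator, the nearest numerator is round(x * den).
        candidate = Fraction(round(target * den), den)
        if abs(candidate - target) <= tol:
            return candidate
    raise ValueError("no fraction within tolerance for the given max_den")

print(simplest_fraction(0.6667, 0.001))   # 2/3
print(simplest_fraction(0.8331, 0.001))   # 5/6

# The standard library can also do the reverse: the closest fraction
# whose denominator does not exceed a given bound.
print(Fraction(0.6667).limit_denominator(12))   # 2/3
```

The loop is a brute-force search, which is fine for small denominators. A continued-fraction or Stern-Brocot search would be faster if large denominators are expected.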
I would suggest solving this via linear regression.
Basically, the idea behind linear regression is that you have a matrix of independent variables (X) and you want to use them to predict a vector of dependent variables y.
There is a closed-form solution for this -- called the "normal equation."
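For reference, the normal equation gives the coefficient vector as theta = (X^T X)^(-1) X^T y. A minimal NumPy sketch follows. The function name `normal_equation` and the small data set are invented for illustration, and it solves the linear system X^T X theta = X^T y directly instead of forming the inverse, which is a choice I made for numerical reasons.

```python
import numpy as np

def normal_equation(X, y):
    """Least squares coefficients via the normal equation X^T X theta = X^T y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the system avoids computing the explicit inverse of X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: six rows of (w, x, y, z) and made-up coefficients.
X = np.array([
    [1, 2, 3, 4],
    [2, 1, 0, 5],
    [3, 3, 1, 1],
    [0, 4, 2, 7],
    [5, 1, 2, 2],
    [1, 1, 1, 1],
])
theta_true = np.array([0.5, 0.25, 2.0, 1.5])
y = X @ theta_true   # exact (unrounded) dependent values

theta = normal_equation(X, y)
print(theta)
```

This assumes the columns of X are linearly independent; otherwise X^T X is singular and `np.linalg.solve` raises an error.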
You can perform linear regression in any language. You can even do it in Excel.
Here is a tutorial that describes how to perform linear regression in Octave: http://www.lauradhamilton.com/tutorial-linear-regression-with-octave