I'd like to approximate the e^x function.

Is it possible to do so using a multiple-splines type approach? i.e. between x1 and x2,

y1 = a1*x + b1

then between x2 and x3,

y2 = a2*x + b2

etc.
This is for dedicated fpga hardware and not a general purpose CPU. As such I need to create the function myself. Accuracy is much less of a concern. Furthermore I can't really afford more than one multiplication circuit and/or multiple shifts/adders. Also I want something much smaller than a CORDIC function, in fact size is critical.
How about a strategy like this that uses the formula
e^x = 2^(x/ln(2))
- Precalculate
1/ln(2)
- Multiply this constant by your argument (1 multiplication)
- Use binary shifts to raise 2 to the integer portion of the power (assumes exp+mantissa format)
- Adjust based on the fractional power-of-2 remainder (likely a second multiplication)
I realize this is not a complete solution, but it does only require a single multiplication and reduces the remaining problem to approximating a fractional power of 2, which should be easier to implement in hardware.
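As a minimal sketch of the steps above, here is a software model in C, assuming Q16.16 fixed-point inputs and a crude 2^f ≈ 1 + f correction for the fractional part (the "second multiplication" in the last step would refine this; the constants are my own illustrative choices):

```c
#include <stdint.h>

#define Q 16
#define ONE (1u << Q)
#define INV_LN2 94548u  /* round(2^16 / ln 2), the precalculated constant */

/* Approximate e^x for x >= 0 in Q16.16 fixed point. */
static uint32_t exp_fixed(uint32_t x) {
    uint64_t t = (uint64_t)x * INV_LN2 >> Q;  /* t = x/ln(2): the one multiply */
    uint32_t k = (uint32_t)(t >> Q);          /* integer part: becomes a shift */
    uint32_t f = (uint32_t)(t & (ONE - 1));   /* fractional part of the power */
    uint32_t frac = ONE + f;                  /* 2^f ~= 1 + f: multiplier-free,
                                                 ~6% worst-case error */
    return frac << k;                         /* 2^k * 2^f */
}
```

The linear 2^f stand-in keeps the datapath to one multiplier plus a shifter and an adder, at the cost of accuracy near f ≈ 0.5.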
Also, if your application is specialized enough, you could try to re-derive all of the numerical code that will run on your hardware to be in a base-e number system and implement your floating point hardware to work in base e as well. Then no conversion is needed at all.
If x is an integer, you can just multiply e by itself over and over again.

If x is not an integer, you can calculate e^floor(x) using the above method and then multiply by a small correction term. This correction term can be easily calculated using a number of approximation methods. One such way is this:

e^f ≈ 1 + f(1 + f/2(1 + f/3(1 + f/4))), where f is the fractional part of x.

This comes from the (optimized) power series expansion of e^x, which is very accurate for small values of x. If you need more accuracy, just tack on more terms to the series.
This math.stackexchange question contains some additional clever answers.
EDIT: Note that there is a faster way of calculating e^n called exponentiation by squaring.
First off, what is motivating this approximation? In other words, what exactly is wrong with the straightforward exp(x)?
That said, a typical implementation of exp(x) is to

- Find an integer k and floating point number r such that x = k*log(2) + r and r is between -0.5*log(2) and 0.5*log(2).
- With this reduction, exp(x) is 2^k * exp(r).
- Calculating 2^k is a snap.
- The standard implementations of exp(x) use a Remez-type algorithm to come up with a minimax polynomial that approximates exp(r).
- You could do the same, but use a reduced-order polynomial.
Here's the kicker: No matter what you do, the odds are very high that your function will be much, much slower than just calling exp(). Most of the functionality of exp() is implemented in your computer's math coprocessor. Re-implementing that functionality in software, even with reduced precision, is going to be an order of magnitude slower than just using exp().
For hardware, I have an awesome solution for you IF you need it to be bit-level accurate. (Else just do an approximation like above). The identity is exp(x) = cosh(x) + sinh(x), the hyperbolic sine and cosine. The catch is that the hyperbolic sine and cosine can be computed using the CORDIC technique, and best of all, they are among the FAST CORDIC functions, meaning they look almost like multiply instead of almost like divide!
Which means, for about the area of an array multiplier, you can compute the exponential to arbitrary precision in just 2 cycles!
Look up the CORDIC method - it's AMAZING for hardware implementation.
One other hardware approach is using a small table in conjunction with a formula others have mentioned: exp(x + y) = exp(x) * exp(y). You can break the number up into small bit fields - say 4 or 8 bits at a time - and just look up the exponent for that bitfield. Probably only effective for narrow computations, but it's another approach.
http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/ using Schraudolph's method (http://nic.schraudolph.org/pubs/Schraudolph99.pdf) in Java:
public static double exp(double val) {
    // Schraudolph's trick: construct the IEEE-754 bit pattern of e^val directly.
    // 1512775 = 2^20/ln(2); 1072693248 = 1023 << 20 (exponent bias);
    // 60801 is the error-minimizing adjustment from the paper.
    final long tmp = (long) (1512775 * val) + (1072693248 - 60801);
    return Double.longBitsToDouble(tmp << 32);
}
and https://math.stackexchange.com/a/56064 (look for Pade approximant).
This is not the smooth spline interpolation you requested, but it's computationally efficient:
float expf_fast(float x) {
    /* Schraudolph-style bit trick in single precision: the scaling
       constant maps x into the float's exponent field, and the added
       bias constant is tuned to reduce the error. */
    union { float f; int i; } y;
    y.i = (int)(x * 0xB5645F + 0x3F7893F5);
    return (y.f);
}
Of course it is "possible". There are several issues.
What is your requirement for the accuracy?
Are you willing to use higher order splines?
How much memory are you willing to spend on this? A linear function over small enough intervals will approximate the exponential function to any degree of accuracy needed, but it may require a VERY small interval.
Edit:
Given the additional information provided, I ran a quick test. Range reduction can always be used on the exponential function. Thus, if I wish to compute exp(x) for ANY x, then I can rewrite the problem in the form...
y = exp(xi + xf) = exp(xi)*exp(xf)
where xi is the integer part of x, and xf is the fractional part. The integer part is simple. Compute xi in binary form, then repeated squarings and multiplications allow you to compute exp(xi) in relatively few operations. (Other tricks, using powers of 2 and other intervals can give you yet more speed for the speed hungry.)
All that remains is now to compute exp(xf). Can we use a spline with linear segments to compute exp(xf), over the interval [0,1] with only 4 linear segments, to an accuracy of 0.005?
This last question is resolved by a function that I wrote a few years ago, which will approximate a function with a spline of a given order, to within a fixed tolerance on the maximum error. This code required 8 segments over the interval [0,1] to achieve the required tolerance with a piecewise linear spline function. If I chose to reduce the interval further to [0, 0.5], I could achieve the prescribed tolerance with only 4 segments.

So the answer is simple. If you are willing to do the range reductions to reduce x to the interval [0, 0.5], then do the appropriate computations, then yes, you can achieve the requested accuracy with a linear spline in 4 segments.
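A minimal sketch of such a 4-segment spline on [0, 0.5], using simple chords between tabulated exp() values (an assumption on my part; the fitting tool mentioned above would place segments optimally, but plain chords already meet the 0.005 tolerance on this interval):

```c
#include <assert.h>
#include <math.h>

/* Piecewise-linear approximation of exp(x) on [0, 0.5], 4 segments. */
static double exp_spline(double x) {
    static const double node[5] = {    /* exp(i * 0.125), i = 0..4 */
        1.0, 1.13314845, 1.28402542, 1.45499141, 1.64872127
    };
    const double h = 0.125;            /* segment width: 0.5 / 4 */
    int i = (int)(x / h);
    if (i > 3) i = 3;                  /* clamp x == 0.5 into last segment */
    /* Linear interpolation along the chord of segment i. */
    return node[i] + (node[i+1] - node[i]) * (x - i*h) / h;
}
```

The worst chord error is at segment midpoints (about 0.003 on the last segment), safely inside the 0.005 budget; in hardware the five node values are just stored constants.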
In the end, you will always be better off using a hard coded exponential function though. All of the operations mentioned above will surely be slower than what your compiler will provide, IF exp(x) is available.
This is not appropriate for custom FPGA, but worth mentioning.
http://www.machinedlearnings.com/2011/06/fast-approximate-logarithm-exponential.html
And the source code:
https://code.google.com/archive/p/fastapprox/downloads
The "faster" implementation only involves 3 steps (multiply, add, convert float to int) and a final cast back to float. In my experience, it is 2% accurate, which may be enough if you don't care about the actual value but are using the value in a log-likelihood maximization iteration.
Wolfram presents a few good ways of approximating it in terms of series etc:
- Wolfram page for e^x

Wikipedia's page on Taylor series also shows an example of an expansion of e^x around 0:

e^x = 1 + x + x^2/2! + x^3/3! + ...
Or you could just do pow(M_E, x) in C. (Some platforms don't have M_E defined; on those, you may have to manually specify the value of e, which is approximately 2.71828182845904523536028747135266249775724709369995.)

(As David points out in the comments, exp(x) would be more efficient than pow(M_E, x). Again, brain not turned on yet.)
Do you have a use case where the calculation of e^x is a proven bottleneck? If not, you should be coding for readability first; only try these sorts of optimizations if the obvious approach is too slow.