I have implemented a multilayer perceptron to predict the sine of input vectors. Each vector consists of four values chosen at random from {-1, 0, 1}, plus a bias set to 1. The network should predict the sine of the sum of the vector's contents.
e.g. Input = <0, 1, -1, 0, 1>, Output = sin(0 + 1 + (-1) + 0 + 1)
The problem I am having is that the network never predicts a negative value, even though many of the vectors' sine values are negative. It predicts all positive or zero outputs perfectly. I am presuming there is a problem with updating the weights, which are updated after every epoch. Has anyone encountered this problem with NNs before? Any help at all would be great!
Note: the network has 5 inputs, 6 hidden units in 1 hidden layer, and 1 output. I am using a sigmoid function on the activations of the hidden and output layers, and have tried tons of learning rates (currently 0.1).
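For illustration, here is a minimal sketch of how the training examples described above could be generated (a simplified sketch, not my actual implementation; the function name is just illustrative):

#include <cmath>
#include <cstdlib>
#include <vector>

// Build one training example: four random values from {-1, 0, 1},
// a bias input fixed at 1, and the target sin(sum of the four values).
std::vector<double> makeExample(double &target)
{
    std::vector<double> input(5);
    double sum = 0.0;
    for (int i = 0; i < 4; ++i)
    {
        input[i] = static_cast<double>(rand() % 3 - 1); // -1, 0 or 1
        sum += input[i];
    }
    input[4] = 1.0;         // bias
    target = std::sin(sum); // note: the target can be negative
    return input;
}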
It has been a long time since I looked into multilayer perceptrons, so take this with a grain of salt.
I'd rescale your problem domain to [0, 1] instead of [-1, 1]. If you look at the graph of the logistic function, 1 / (1 + e^-x), you'll see it only generates values in [0, 1], so I do not expect it to produce negative results. I might be wrong, though.
EDIT:
You can actually extend the logistic function to your problem domain. Use the generalized logistic curve, setting its A and K parameters to the boundaries of your domain.
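For example, a minimal sketch of a generalized logistic activation (parameter names A and K as above; the function name is illustrative):

#include <cmath>

// Generalized logistic with lower asymptote A and upper asymptote K.
// With A = -1 and K = 1 the output covers the [-1, 1] target range.
double generalizedLogistic(double x, double A, double K)
{
    return A + (K - A) / (1.0 + std::exp(-x));
}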
Another option is the hyperbolic tangent, which goes from [-1,+1] and has no constants to set up.
There are many different kinds of activation functions, many of which are designed to output a value from 0 to 1. If you're using a function that only outputs between 0 and 1, try adjusting it so that it outputs between -1 and 1. If you were using FANN, I would tell you to use the FANN_SIGMOID_SYMMETRIC activation function.
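If you are rolling your own, one simple way to make that adjustment is to rescale the standard sigmoid so it is symmetric around zero (a sketch of the general idea, not FANN's exact implementation; the function name is illustrative):

#include <cmath>

// Standard sigmoid rescaled from (0, 1) to (-1, 1).
double symmetricSigmoid(double x)
{
    return 2.0 / (1.0 + std::exp(-x)) - 1.0;
}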
Although the question has already been answered, allow me to share my experience. I have been trying to approximate the sine function using a 1-4-1 neural network (one input, four hidden units, one output).
And similar to your case, I am not allowed to use any high-level API like TensorFlow. Moreover, I am bound to use C++ over Python 3! (BTW, I mostly prefer C++.) I used the sigmoid activation and its derivative, defined as:
#include <cmath>

double sigmoid(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Note: this takes the sigmoid's *output* as its argument,
// i.e. Sigmoid_derivative(sigmoid(x)) equals sigmoid'(x).
double Sigmoid_derivative(double x)
{
    return x * (1.0 - x);
}
And this is what I got after 10,000 epochs, training the network on 20 Training Examples.
As, you can see, the network didn't feel like the negative curve. So, I changed the activation function to Tanh.
// Beware: this redefines tanh from <cmath>; a different name would be safer.
double tanh(double x)
{
    return (std::exp(x) - std::exp(-x)) / (std::exp(x) + std::exp(-x));
}

// As above, this takes tanh's *output* as its argument,
// i.e. tanh_derivative(tanh(x)) equals tanh'(x).
double tanh_derivative(double x)
{
    return 1.0 - x * x;
}
And surprisingly, after half the epochs (i.e., 5,000), I got a far better curve.
And we all know that it will improve significantly with more hidden neurons, more epochs, and better (and more) training examples. Also, shuffling the data between epochs is important!
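As a rough sketch of the shuffling part (illustrative names, assuming the examples are stored in a vector and visited by index):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Visit the training examples in a fresh random order every epoch.
void trainShuffled(int numEpochs, int numExamples)
{
    std::vector<int> order(numExamples);
    std::iota(order.begin(), order.end(), 0);
    std::mt19937 rng(std::random_device{}());
    for (int epoch = 0; epoch < numEpochs; ++epoch)
    {
        std::shuffle(order.begin(), order.end(), rng);
        for (int idx : order)
        {
            // forward pass and weight update for example 'idx' go here
            (void)idx;
        }
    }
}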