Enhancing the Performance of the BackPropagation for Deep Neural Network

The standard Backpropagation Neural Network (BPNN) algorithm is widely used to solve many real-world problems, but it suffers from difficulties such as slow convergence and convergence to local minima. Many modifications have been proposed to improve its performance, such as careful selection of the initial weights and biases, the learning rate, the momentum, the network topology, and the activation function. This paper presents a further modified version of the Backpropagation algorithm: the error signal function is modified and applied to deep neural networks with more than one hidden layer. Experiments were carried out to compare and evaluate the convergence behavior of these training algorithms on two training problems: XOR and Iris plant classification. The results show that the proposed algorithm improves on the classical BP in terms of efficiency.

A variety of approaches have been proposed to accelerate the learning process and improve training efficiency [3, 4, 5, 6]. The OBP algorithm was designed to overcome some of the problems associated with standard BP training by applying a nonlinear function to the output units, allowing the network to escape from local minima with a high speed of convergence during training [7]. This paper presents an extended version of the Optical Backpropagation (OBP) algorithm. The proposed algorithm improves the performance of OBP on deep neural networks, and the experimental results show that it converges to a reasonable range of error after a small number of training epochs.

STANDARD BACKPROPAGATION (BP)
The Backpropagation (BP) algorithm is designed to minimize the mean square error between the actual output and the desired output. A given input pattern applied to the first layer of the neural network is propagated through each upper layer until an output is generated. This output is then compared to the known desired output and an error value is calculated. Based on this error, the connection weights are adjusted backward from the output layer to each unit in the network.
The algorithm for a 3-layer network with m input units, n hidden units, and p output units can be described as follows [2, 7].
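For concreteness, a minimal sketch of these update rules is given below, assuming sigmoid activations, a squared-error criterion, and illustrative parameter names (none of which appear verbatim in the paper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, y, W1, b1, W2, b2, lr=0.5):
    # Forward pass: input -> hidden -> output
    h = sigmoid(W1 @ x + b1)             # hidden activations, shape (n,)
    o = sigmoid(W2 @ h + b2)             # output activations, shape (p,)
    # Output-layer error signal: delta_pk = (y_pk - o_pk) * o_pk * (1 - o_pk)
    delta_o = (y - o) * o * (1 - o)
    # Hidden-layer error signal, propagated backward through W2
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    # Standard BP weight updates, backward from the output layer
    W2 += lr * np.outer(delta_o, h)
    b2 += lr * delta_o
    W1 += lr * np.outer(delta_h, x)
    b1 += lr * delta_h
    return 0.5 * np.sum((y - o) ** 2)    # squared error for this pattern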

Extended Optical BP (EOBP)
The difficulty encountered in the standard Backpropagation algorithm is that when the actual output value approaches either extreme, the derivative factor in equation (2.6) makes the error signal very small. This implies that an output unit can be maximally wrong without producing a strong error signal with which the connection strengths could be significantly adjusted [7]. The Optical Backpropagation algorithm (an enhanced Backpropagation) addresses this delay in convergence caused by the derivative of the activation function.
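This effect is easy to reproduce numerically: for a sigmoid output, the derivative factor o(1 − o) collapses toward zero as o approaches 0 or 1, so even a maximally wrong unit produces almost no error signal. A small illustrative check (not from the paper):

import numpy as np

for o in [0.5, 0.9, 0.99, 0.999]:
    target = 0.0                        # this unit should be fully off
    delta = (target - o) * o * (1 - o)  # standard BP error signal
    print(f"o={o:<6} raw error={target - o:+.3f}  delta={delta:+.6f}")
# As o -> 1 the raw error stays near -1, but delta shrinks toward 0,
# so the resulting weight update becomes vanishingly small.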
The convergence speed of the training process was improved significantly by OBP through maximizing the error signal, which was transmitted backward from the output layer to each unit in the intermediate layer.
The error at a single output unit in the adjusted OBP is defined in [6], [7], where the subscript "p" refers to the pth training vector and "k" refers to the kth output unit. In this case, Ypk is the desired output value and Opk is the actual output of the kth unit; δpk is then propagated backward to update the output-layer weights and the hidden-layer weights.
This error signal reduces the error of each output unit more quickly than the Backpropagation error signal, so the weights on certain units change by relatively large amounts from their starting values [7]. The error function defined in Optical Backpropagation is proportional to the square of the distance between the desired output and the actual output of the network for a particular input pattern.
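As a rough illustration of the amplification idea, the OBP-style error signal described in the cited literature [6, 7] multiplies the activation derivative by an exponential of the squared error rather than by the raw error; a hedged sketch of that form is shown below (this reflects the published OBP formulation only, not the new function of equations (3.4)-(3.6), which is not reproduced here):

import numpy as np

def obp_delta(y, o):
    # OBP-style amplified error signal, as commonly described in the
    # cited OBP papers [6, 7] (illustrative sketch, details may differ):
    #   Newdelta = +(1 + e^{(y-o)^2}) * o * (1 - o)   if (y - o) >= 0
    #   Newdelta = -(1 + e^{(y-o)^2}) * o * (1 - o)   if (y - o) <  0
    sign = np.where(y - o >= 0, 1.0, -1.0)
    return sign * (1.0 + np.exp((y - o) ** 2)) * o * (1 - o)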
As an alternative, any other error function whose derivative exists and can be calculated at the output layer can replace the traditional squared-error criterion [4]. In this paper, a new error function is adopted to replace the error function used in Optical Backpropagation. The equations of the new function are given in (3.4)-(3.6).

The EOBP Steps
The EOBP steps are described as follows (a compact code sketch of this loop is given after the list):
1. Apply the input example to the input units: xp = (xp1, xp2, ..., xpn).
2. Calculate the net-input values to the hidden layer units.
3. Calculate the outputs from the hidden layer.
4. Calculate the net-input values to the output layer units.
5. Calculate the outputs from the output units.
6. Calculate the error term for the output units, using the Newδ˚pk described in equations (3.4), (3.5) and (3.6).
7. Calculate the error term for the hidden units by applying Newδ˚pk.
8. Update the weights on the output layer.
9. Update the weights on the hidden layer.
10. Repeat steps 1 to 9 until the error (Ypk − Opk) is acceptably small for each training vector pair.
The proposed algorithm stops, as the standard BP does, when the squares of the differences between the actual and target values, summed over the output units for all patterns, are sufficiently small.
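A compact skeleton of this loop, with the output-layer error signal left as a pluggable function so that either the standard δ or the Newδ˚ of equations (3.4)-(3.6) can be substituted, might look as follows (function names, stopping threshold, and hyperparameters are illustrative):

import numpy as np

def train(patterns, targets, W1, b1, W2, b2, delta_fn,
          lr=0.5, tol=1e-5, max_epochs=10000):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for epoch in range(max_epochs):
        total_error = 0.0
        for x, y in zip(patterns, targets):
            # Steps 1-5: forward pass through hidden and output layers
            h = sigmoid(W1 @ x + b1)
            o = sigmoid(W2 @ h + b2)
            # Step 6: output-layer error term (standard delta or New delta)
            d_out = delta_fn(y, o)
            # Step 7: hidden-layer error term derived from the output deltas
            d_hid = (W2.T @ d_out) * h * (1 - h)
            # Steps 8-9: update output-layer and hidden-layer weights
            W2 += lr * np.outer(d_out, h)
            b2 += lr * d_out
            W1 += lr * np.outer(d_hid, x)
            b1 += lr * d_hid
            total_error += 0.5 * np.sum((y - o) ** 2)
        # Step 10: stop once the summed squared error is acceptably small
        if total_error < tol:
            return epoch + 1
    return max_epochs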
The new error function reacts rapidly to changes in the weight values, increasing the training speed and reducing the number of iterations without loss of learning ability.
The new algorithm was tested on different neural network architectures, with one or more hidden layers.
A deep neural network (DNN) is a feed-forward artificial neural network that has more than one layer of hidden units between its inputs and its outputs. Each hidden unit typically uses a logistic or hyperbolic tangent (tanh) function to map its total input from the layer below to an output that is sent to the layer above. The output units then convert their total input into the network output by using the "softmax" non-linearity.
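A forward pass through such a network, with tanh hidden layers and a softmax output layer, can be sketched as follows (layer sizes and function names are illustrative):

import numpy as np

def softmax(z):
    z = z - np.max(z)                   # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def dnn_forward(x, weights, biases):
    # All hidden layers use tanh; the final layer uses softmax.
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)          # hidden output, sent to the layer above
    return softmax(weights[-1] @ a + biases[-1])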
In DNNs with full connectivity between adjacent layers, the initial weights are given small random values to prevent all of the hidden units in a layer from getting exactly the same gradient [10].
Normalization of the initial weights is important because of the multiplicative effect through the layers: it maintains the variance of the activations and of the back-propagated gradients as one moves up or down the network [10].
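One common way to satisfy both requirements, small random values that also keep layer-to-layer variances roughly constant, is a normalized (Glorot-style) initialization; the sketch below assumes that this is the scheme referred to in [10]:

import numpy as np

def normalized_init(n_in, n_out, rng=np.random.default_rng(0)):
    # Normalized initialization: uniform in
    # [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))], chosen so that
    # activation and gradient variances stay roughly constant across layers.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))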
DNNs with many hidden layers and many units per layer are very flexible models. This makes them capable of modeling very complex and highly non-linear relationships between inputs and outputs, such as those found in acoustic modeling.
DNNs can be discriminatively trained with the EOBP, which measures the difference between the target outputs and the actual outputs produced for each training case.
At the beginning of training, the back-propagated gradients get smaller as they are propagated downwards. Using the new error signal, the errors of each output unit are reduced faster than with the old one, and the weights on certain units change by relatively large amounts from their starting values. Weights are either increased or decreased according to the sign of the term (Y − O).

XOR Problem (XOR 2-2-1)
The XOR problem is solved using a neural network that consists of two input units, two hidden units, and a single output unit, with biases for the hidden units and the output unit, and without direct connections from the input layer to the output layer. The architecture of this network is shown in figure 4.1. The results of the training processes using the EOBP and BP algorithms are presented in table 4.2 and figure 4.2.
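The training data and network shapes for this problem are small enough to state directly; a minimal sketch is given below (the initial weight range is an illustrative assumption, and the learning parameters of the experiment are not reproduced here):

import numpy as np

# XOR training pairs: two inputs, one target output
patterns = [np.array([0., 0.]), np.array([0., 1.]),
            np.array([1., 0.]), np.array([1., 1.])]
targets  = [np.array([0.]), np.array([1.]),
            np.array([1.]), np.array([0.])]

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 2)), rng.uniform(-0.5, 0.5, 2)  # 2 hidden units
W2, b2 = rng.uniform(-0.5, 0.5, (1, 2)), rng.uniform(-0.5, 0.5, 1)  # 1 output unit
# These shapes (2-2-1 with biases, no input-to-output shortcut) match the
# architecture described above; training proceeds as in the EOBP steps.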

Iris Plant Classification
The Iris plant classification problem is a well-known benchmark classification problem. Three iris classes are involved: Iris setosa, Iris versicolor, and Iris virginica. The classification is based on four leaf attributes, namely sepal length and width and petal length and width. These attributes, denoted by x1, x2, x3, and x4, were measured in centimeters and collected in the Iris database, which consists of 150 items with 50 data points for each species. Because neural networks work with numeric data, the categorical species information must be converted to numeric form. For neural network classification, the classes to be predicted are stored as 1-of-N encoded dependent-variable data located in the last three columns.

The experiments start by splitting the data set of 150 items into a training set of 120 items (80 percent) and a test set of 30 items (20 percent). A neural network is then created with four input nodes (one for each numeric input), seven hidden nodes, and three output nodes (one for each possible output class). The network's weights and bias values are initialized to small random values (between 0.0001 and 0.001). Since the weights and biases determine the output values for a given set of input values, the two algorithms (EOBP and BP) are used to search for weights and bias values that produce network outputs that most closely match the output values in the training data, and the two results are then compared to assess the performance of each algorithm. Table 4.3 shows the parameters used to solve the Iris plant classification problem with the BP and EOBP algorithms. The 4-7-3 neural network has 4*7 + 7*3 = 49 weights and 7 + 3 = 10 biases. The initial weights were selected randomly, and the same initial weights were used for the two algorithms. After the neural network has been trained, the final weights determined by the training process are displayed.

After training, the prediction accuracy on the training data set and on the test data set is computed and shown in table 4.5. The results show that both the training-set and test-set prediction accuracies for the EOBP are higher than those for the BP.
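The data preparation described above, 1-of-N encoding of the species label and an 80/20 split, can be sketched as follows (this sketch loads the standard 150-item Iris data through scikit-learn purely for convenience; the paper's own data handling is not specified):

import numpy as np
from sklearn.datasets import load_iris   # standard 150-item Iris data

iris = load_iris()
X = iris.data                            # x1..x4: sepal/petal length and width
# 1-of-N encode the species label into three output columns
Y = np.eye(3)[iris.target]               # setosa=(1,0,0), versicolor=(0,1,0), virginica=(0,0,1)

# 80/20 split: 120 training items, 30 test items
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:120], idx[120:]
X_train, Y_train = X[train_idx], Y[train_idx]
X_test,  Y_test  = X[test_idx],  Y[test_idx]

# 4-7-3 network: 4*7 + 7*3 = 49 weights and 7 + 3 = 10 biases,
# initialized to small random values before training with BP or EOBP.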

Iris Plant Classification (Deep Learning)
The next problem is the Iris plant classification problem with a different neural network architecture. The network consists of 4 units in the input layer, 3 units in the output layer, and two hidden layers with 4 units in the first hidden layer and 8 units in the second. The softmax function is used for the output layer and the hyperbolic tangent (tanh) function for the hidden layers. In this experiment, the EOBP was tested on this problem. The results show that EOBP needed 1000 epochs to train the network, which is fewer than the 2120 epochs needed to train the network using the standard BP.
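The layer structure of this deeper network can be written down directly; a sketch using the forward-pass helper shown earlier (the initial weight range is an illustrative assumption):

import numpy as np

# 4-4-8-3 architecture: 4 inputs, two tanh hidden layers (4 and 8 units),
# and a 3-unit softmax output layer, as described above.
layer_sizes = [4, 4, 8, 3]
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.01, 0.01, (n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases  = [np.zeros(n_out) for n_out in layer_sizes[1:]]
# Forward pass: np.tanh for the hidden layers, softmax for the output layer
# (see the dnn_forward sketch in the deep-network section above).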

Compare Two Results Using Different Neural Network Architecture
For a neural network with 3 units in the input layer, 3 hidden layers with 3, 4, and 7 nodes respectively, and 2 units in the output layer, the final weights from the input layer to the first hidden layer and from the first hidden layer to the second hidden layer, after training with the EOBP and the BP, are summarized in table 4.7.
Training is discontinued when the MSE falls below 0.00001. The initial weights were selected randomly, the same initial weights were used for the two algorithms, the learning rate was set to 0.5, and the momentum value was set to 0.1. Figure 4.4 shows that the differences between the final weights from the input layer to the first hidden layer and from the first hidden layer to the second hidden layer obtained with the EOBP and the BP are very small. The remarkable result from this figure is the number of epochs required by the two algorithms, which shows that the EOBP gives a better result than the standard BP with fewer iterations.
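A momentum-augmented weight update of the kind used in these experiments can be sketched as follows (a minimal illustration, assuming a conventional momentum term; the function name and signature are illustrative):

import numpy as np

def update_with_momentum(W, grad, prev_dW, lr=0.5, alpha=0.1):
    # The previous update is carried forward with factor alpha (momentum),
    # matching the settings above: learning rate 0.5, momentum 0.1,
    # stopping once the MSE falls below 1e-5.
    dW = lr * grad + alpha * prev_dW
    return W + dW, dW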
In addition, this network was tested with both algorithms using different learning rates; Table 4.9 shows the results. The EOBP converges much faster than the BP in all training runs, across the different learning rates.

CONCLUSION
The Backpropagation Neural Network (BPNN) is a supervised learning neural network model applied in many different applications. Although it is widely implemented in practical ANN applications and performs relatively well, it suffers from slow convergence and convergence to local minima, and there are still areas where improvements can be made. This makes artificial neural networks challenging to apply to large problems. This paper introduced an extended version of the Optical Backpropagation algorithm, EOBP, for training deep neural networks in order to improve the learning speed.
The EOBP is an enhanced version of the Optical Backpropagation algorithm. The study shows that EOBP speeds up the learning process by using a modified error function, which improves the whole training process by requiring fewer epochs to converge.
The characteristics of the EOBP are:
- It is able to reach a very small MSE.
- It works with deep neural networks with more than one hidden layer.
- It works with multilayer neural networks with one hidden layer.
- It works with small values of the learning rate.
- It works with biases.
- It works with a small range for the initial weights.
- It can perform well for small training samples.
The effectiveness of the EOBP algorithm has been compared with the standard BP algorithm and verified by means of simulation on two real problems. The experimental results show that the EOBP algorithm converges to a reasonable range of error after a small number of training epochs, which supports its use as an alternative to standard BP for training deep neural networks.