Add ReLU layer implementation

diff --git a/README.md b/README.md
index fa1c48496f16f54aaba8b1d40f637dc5db10e0b4..b309254a3a2a67284c24db6f1af08e704bc8ffaf 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,8 @@ Basics:
   - [MNIST database of handwritten digits](http://yann.lecun.com/exdb/mnist/)
   - [Neuron](https://en.wikipedia.org/wiki/Artificial_neuron)
   - [Perceptron](https://en.wikipedia.org/wiki/Perceptron)
+ - [Backpropagation](https://en.wikipedia.org/wiki/Backpropagation)
+ - [Understanding & Creating Neural Networks with Computational Graphs from Scratch](https://www.kdnuggets.com/2019/08/numpy-neural-networks-computational-graphs.html)
   - [3Blue1Brown video series](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
  
  Too high-level for first-time learning, but apparently very abstract and powerful for real-life:
@@ -37,3 +39,85 @@ plt.close()
   - Read the MNIST database into numpy arrays with `./read_display_mnist.py`. Plot the first ten images and show their labels, to make sure the data makes sense:
  
     ![visualize training data](screenshots/mnist-visualize-training-data.png)
+
+ - Define the structure of the neural network: two hidden layers with parametrizable sizes. Initialize weights and biases randomly. This gives totally random classifications of course, but at least makes sure that the data structures and computations work:
+
+```
+$ ./train.py
+output vector of first image: [    0.         52766.88424917     0.             0.
+ 14840.28619491 14164.62850135     0.          7011.882333
+     0.         46979.62976127]
+classification of first image: 1 with confidence 52766.88424917019; real label 5
+correctly recognized images after initialization: 10.076666666666668%
+```
+
+ - Add the backpropagation algorithm and run a first training round. This is slow, as expected:
+ ```
+ $ time ./train.py
+output vector of first image: [    0.         52766.88424917     0.             0.
+ 14840.28619491 14164.62850135     0.          7011.882333
+     0.         46979.62976127]
+classification of first image: 1 with confidence 52766.88424917019; real label 5
+correctly recognized images after initialization: 10.076666666666668%
+round #0 of learning...
+./train.py:18: RuntimeWarning: overflow encountered in exp
+  return 1 / (1 + np.exp(-x))
+correctly recognized images: 14.211666666666666%
+
+real   0m37.927s
+user   1m19.103s
+sys    1m10.169s
+```
+
+ - This is way too slow. I found an [interesting approach](https://www.kdnuggets.com/2019/08/numpy-neural-networks-computational-graphs.html) that harnesses the power of numpy by doing the computations for many images in parallel, instead of spending a lot of time in Python iterating over tens of thousands of examples. Now the accuracy computation takes negligible time instead of 6 seconds, and each round of training takes less than a second:
+```
+$ time ./train.py
+output vector of first image: [0.51452796 0.49736819 0.51415083 0.50027547 0.48447025 0.49759904
+ 0.52621162 0.48671402 0.517606   0.50214569]
+classification of first image: 6 with confidence 0.526211616929459; real label 7
+correctly recognized images after initialization: 7.75%
+cost after training round 0: 1.0462266880961681
+[...]
+cost after training round 99: 0.4499245817840479
+correctly recognized images after training: 11.35%
+
+real   1m51.520s
+user   4m23.863s
+sys    2m31.686s
+```
+
+- Recognition quality is still poor after 100 iterations, apparently because the network structure is inappropriate. With only 16 neurons in the first hidden layer, the network cannot "see enough details" in the input. So let's use 128 neurons in the first hidden layer and drop the second layer (it only seems to make things worse for me). Naturally a single training round takes much longer now, but voilà, after only 20 learning iterations the result is already quite respectable:
+```
+$ time ./train.py
+correctly recognized images after initialization: 9.8%
+cost after training round 0: 0.44518003660592853
+[...]
+cost after training round 19: 0.10783488337150668
+correctly recognized images after training: 89.09%
+
+real   0m47.603s
+user   2m12.141s
+```
+
+ - And after 100 iterations, the accuracy improves even more, and the classification of the first test image looks reasonable:
+```
+cost after training round 99: 0.043068345296584126
+correctly recognized images after training: 94.17%
+
+output vector of first image: [1.11064478e-02 5.59058012e-03 5.40483856e-02 7.93664914e-02
+ 2.22662031e-03 3.50355065e-03 2.57506703e-04 9.60761429e-01
+ 2.68869803e-03 5.26559410e-03]
+
+
+real   4m10.904s
+user   11m21.203s
+```
+
+- Replace the [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) activation function with [ReLU](https://en.wikipedia.org/wiki/Rectifier_%28neural_networks%29). This has some interesting effects: a learning rate of 1 leads to "overshooting", where the cost function actually _increases_ several times during the learning steps, and the overall result is worse. Letting the learning rate fall linearly over the training rounds helps, but in the end the result is still worse than with Sigmoid:
+```
+cost after training round 99: 0.07241763398153217
+correctly recognized images after training: 92.46%
+output vector of first image: [0.         0.         0.         0.         0.         0.
+ 0.         0.89541759 0.         0.        ]
+classification of first image: 7 with confidence 0.8954175907939048; real label 7
+```
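
The ReLU step described in the last README entry can be sketched in NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the code from `train.py`: the function names `relu`, `relu_prime`, and `linear_learning_rate` are hypothetical, and the `start`/`end` learning rates are made-up values (the README only says the rate falls linearly during the training rounds).

```python
import numpy as np


def relu(x):
    """Rectified linear unit: element-wise max(0, x)."""
    return np.maximum(0, x)


def relu_prime(x):
    """Derivative of ReLU: 1 where x > 0, else 0 (0 is chosen at x == 0)."""
    return (x > 0).astype(float)


def linear_learning_rate(round_idx, num_rounds, start=1.0, end=0.1):
    """Learning rate falling linearly from `start` (round 0) to `end` (last round).

    `start` and `end` are illustrative assumptions, not values from the commit.
    """
    if num_rounds <= 1:
        return start
    fraction = round_idx / (num_rounds - 1)
    return start + (end - start) * fraction


if __name__ == "__main__":
    z = np.array([-2.0, 0.0, 3.5])
    print("relu:", relu(z))
    print("relu_prime:", relu_prime(z))
    print("rates:", [linear_learning_rate(i, 5) for i in range(5)])
```

In a training loop, one would call `linear_learning_rate(i, num_rounds)` once per round and scale the gradient step by the result. Unlike Sigmoid, ReLU never evaluates `np.exp`, so the overflow warning seen in the earlier log cannot come from the activation itself, but its unbounded output makes large learning rates more likely to overshoot.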