# Resources

Basics:
 - [Learn numpy](https://numpy.org/learn/)
 - [MNIST database of handwritten digits](http://yann.lecun.com/exdb/mnist/)
 - [Neuron](https://en.wikipedia.org/wiki/Artificial_neuron)
 - [Perceptron](https://en.wikipedia.org/wiki/Perceptron)
 - [Backpropagation](https://en.wikipedia.org/wiki/Backpropagation)
 - [Understanding & Creating Neural Networks with Computational Graphs from Scratch](https://www.kdnuggets.com/2019/08/numpy-neural-networks-computational-graphs.html)
 - [3Blue1Brown video series](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)

Too high-level for first-time learning, but apparently a very powerful abstraction for real-life use:
 - [Keras](https://keras.io/)
 - [Tutorial on recognizing handwriting with Keras/TensorFlow](https://data-flair.training/blogs/python-deep-learning-project-handwritten-digit-recognition/)

# Dependencies

    sudo dnf install -y python3-numpy python3-matplotlib

# Steps

 - Do the [NumPy quickstart tutorial](https://numpy.org/devdocs/user/quickstart.html); example:

```py
import numpy as np
import matplotlib.pyplot as plt

# 100x100 gradient from 0 (black) to 1 (white)
grad = np.linspace(0, 1, 10000).reshape(100, 100)
plt.imshow(grad, cmap='gray')
plt.show()

# squared sine wave, wrapped into 100 rows of 100 pixels
plt.imshow(np.sin(np.linspace(0, 10000, 10000)).reshape(100, 100) ** 2, cmap='gray')
# non-blocking does not work with QT_QPA_PLATFORM=wayland
plt.show(block=False)
plt.close()
```

 - Get the handwritten digits training data with `./download-mnist.sh`.

 - Read the MNIST database into NumPy arrays with `./read_display_mnist.py`. Plot the first ten images and show their labels to make sure that the data makes sense:

   ![visualize training data](screenshots/mnist-visualize-training-data.png)

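For reference, the MNIST files use the simple IDX binary format: a big-endian header (two zero bytes, a data type code, the number of dimensions, then one 32-bit size per dimension) followed by the raw bytes. The following is only a sketch of how such a buffer could be parsed, not the actual `read_display_mnist.py`; the function name `parse_idx` is an assumption, and the example runs on a tiny synthetic buffer instead of the real data.

```python
import struct
import numpy as np

def parse_idx(data: bytes) -> np.ndarray:
    """Parse an uncompressed IDX buffer of unsigned bytes into a numpy array."""
    # header: two zero bytes, data type code (0x08 = unsigned byte), number of dimensions
    zeros, dtype_code, ndim = struct.unpack('>HBB', data[:4])
    if zeros != 0 or dtype_code != 0x08:
        raise ValueError('not an IDX buffer with unsigned byte data')
    # one big-endian 32-bit size per dimension
    shape = struct.unpack('>' + 'I' * ndim, data[4:4 + 4 * ndim])
    return np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim).reshape(shape)

# tiny synthetic example: two "images" of 2x3 pixels
header = struct.pack('>HBBIII', 0, 0x08, 3, 2, 2, 3)
images = parse_idx(header + bytes(range(12)))
print(images.shape)
```

With the real data, the bytes would come from reading one of the downloaded files in binary mode; if the download script keeps them gzip-compressed (an assumption), `gzip.open(path, 'rb').read()` would supply the buffer instead.
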
 - Define the structure of the neural network: two hidden layers with configurable sizes. Initialize weights and biases randomly. This gives random classifications, of course, but at least it makes sure that the data structures and computations work:

```
$ ./train.py
output vector of first image: [    0.         52766.88424917     0.             0.
 14840.28619491 14164.62850135     0.          7011.882333
     0.         46979.62976127]
classification of first image: 1 with confidence 52766.88424917019; real label 5
correctly recognized images after initialization: 10.076666666666668%
```

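A minimal sketch of what such a randomly initialized network could look like. This is not the actual `train.py`: the function names, the sigmoid activation and the size of the second hidden layer are assumptions, and a random vector stands in for a real MNIST image.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(sizes):
    """Random weights and biases for consecutive layer sizes, e.g. [784, 16, 16, 10]."""
    weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [rng.normal(size=n_out) for n_out in sizes[1:]]
    return weights, biases

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feed_forward(image, weights, biases):
    """Propagate one flattened image through all layers and return the output vector."""
    activation = image
    for w, b in zip(weights, biases):
        activation = sigmoid(w @ activation + b)
    return activation

# 784 inputs (28x28 pixels), two hidden layers (sizes assumed), 10 output digits
weights, biases = init_network([784, 16, 16, 10])
image = rng.random(784)  # stand-in for a flattened MNIST image scaled to [0, 1]
output = feed_forward(image, weights, biases)
print('classification:', np.argmax(output), 'with confidence', output.max())
```

The index of the largest output value is taken as the recognized digit, which with random weights is right about one time in ten.
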
 - Add the backpropagation algorithm and run a first training round. This is slow, as expected:

```
$ time ./train.py
output vector of first image: [    0.         52766.88424917     0.             0.
 14840.28619491 14164.62850135     0.          7011.882333
     0.         46979.62976127]
classification of first image: 1 with confidence 52766.88424917019; real label 5
correctly recognized images after initialization: 10.076666666666668%
round #0 of learning...
./train.py:18: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-x))
correctly recognized images: 14.211666666666666%

real    0m37.927s
user    1m19.103s
sys     1m10.169s
```

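The `RuntimeWarning` in this output appears when `x` is a large negative number, so that `np.exp(-x)` overflows to infinity. The result `1 / (1 + inf) = 0` is still correct, so the warning is harmless here. One possible way to avoid it, shown as a sketch (the name `stable_sigmoid` is made up and this is not what `train.py` does), is to only ever call `exp()` on non-positive values:

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never calls exp() on a positive argument, so it cannot overflow."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in [0, 1]
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x)); both are the same function
    return np.where(x >= 0, 1 / (1 + e), e / (1 + e))

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(stable_sigmoid(x))
```
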
 - This is way too slow. I found an [interesting approach](https://www.kdnuggets.com/2019/08/numpy-neural-networks-computational-graphs.html) that harnesses the power of NumPy by doing the computations for many images in parallel, instead of spending a lot of time in Python iterating over tens of thousands of examples. Now the accuracy computation takes negligible time instead of 6 seconds, and each round of training takes less than a second:

```
$ time ./train.py
output vector of first image: [0.51452796 0.49736819 0.51415083 0.50027547 0.48447025 0.49759904
 0.52621162 0.48671402 0.517606   0.50214569]
classification of first image: 6 with confidence 0.526211616929459; real label 7
correctly recognized images after initialization: 7.75%
cost after training round 0: 1.0462266880961681
[...]
cost after training round 99: 0.4499245817840479
correctly recognized images after training: 11.35%

real    1m51.520s
user    4m23.863s
sys     2m31.686s
```

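The idea behind the parallel computation, as a sketch: store every image as one column of a matrix, so that a single matrix product per layer processes all images at once. The shapes, variable names and random stand-in data below are assumptions for illustration, not the actual `train.py`.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# stand-in data: 1000 flattened 28x28 images as columns, plus random labels
images = rng.random((784, 1000))
labels = rng.integers(0, 10, size=1000)
w1, b1 = rng.normal(size=(16, 784)), rng.normal(size=(16, 1))
w2, b2 = rng.normal(size=(10, 16)), rng.normal(size=(10, 1))

# one matrix product per layer handles all images;
# the (n, 1) bias columns broadcast across the 1000 image columns
hidden = sigmoid(w1 @ images + b1)
outputs = sigmoid(w2 @ hidden + b2)  # shape (10, 1000)

# accuracy without a Python loop: the most active output neuron per column is the prediction
predictions = np.argmax(outputs, axis=0)
accuracy = np.mean(predictions == labels)
print(f'correctly recognized images: {accuracy * 100}%')
```

The results are the same as when feeding the images through the network one by one; only the looping moves from Python into NumPy.
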
 - Poor recognition quality after 100 iterations, as the network structure is apparently inappropriate. With only 16 neurons in the first hidden layer, the network cannot "see enough details" in the input. So let's use 128 neurons in the first hidden layer, and drop the second layer (it only seems to make things worse for me). Naturally, a single training round takes much longer now, but voilà, after only 20 learning iterations the result is already quite respectable:

```
$ time ./train.py
correctly recognized images after initialization: 9.8%
cost after training round 0: 0.44518003660592853
[...]
cost after training round 19: 0.10783488337150668
correctly recognized images after training: 89.09%

real    0m47.603s
user    2m12.141s
```

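A sketch of what one training round for such a network with a single hidden layer of 128 neurons could look like. The quadratic cost function, the learning rate of 0.5, the initialization scale and all names are assumptions, since none of them are stated here, and the example trains on a small random data set instead of MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_round(x, y, params, rate):
    """One batch gradient descent step; x is (inputs, m), y is one-hot (10, m). Returns the cost."""
    w1, b1, w2, b2 = params
    m = x.shape[1]
    # forward pass for all m examples at once
    a1 = sigmoid(w1 @ x + b1)
    a2 = sigmoid(w2 @ a1 + b2)
    cost = np.sum((a2 - y) ** 2) / (2 * m)
    # backward pass: chain rule, using sigmoid'(z) = a * (1 - a)
    d2 = (a2 - y) * a2 * (1 - a2)
    d1 = (w2.T @ d2) * a1 * (1 - a1)
    # update the arrays in place with gradients averaged over the batch
    w2 -= rate * (d2 @ a1.T) / m
    b2 -= rate * d2.sum(axis=1, keepdims=True) / m
    w1 -= rate * (d1 @ x.T) / m
    b1 -= rate * d1.sum(axis=1, keepdims=True) / m
    return cost

# stand-in data: 200 random "images" with random one-hot labels
x = rng.random((784, 200))
y = np.eye(10)[:, rng.integers(0, 10, size=200)]
params = [rng.normal(scale=0.1, size=(128, 784)), np.zeros((128, 1)),
          rng.normal(scale=0.1, size=(10, 128)), np.zeros((10, 1))]
costs = [train_round(x, y, params, rate=0.5) for _ in range(20)]
print('cost after first round:', costs[0], 'after last round:', costs[-1])
```
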
 - And after 100 iterations, the accuracy improves even more, and the classification of the first test image looks reasonable:

```
cost after training round 99: 0.043068345296584126
correctly recognized images after training: 94.17%

output vector of first image: [1.11064478e-02 5.59058012e-03 5.40483856e-02 7.93664914e-02
 2.22662031e-03 3.50355065e-03 2.57506703e-04 9.60761429e-01
 2.68869803e-03 5.26559410e-03]


real    4m10.904s
user    11m21.203s
```

 - Replace the [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) activation function with [ReLU](https://en.wikipedia.org/wiki/Rectifier_%28neural_networks%29). This has some interesting effects: a learning rate of 1 leads to "overshooting", the cost function actually _increases_ several times during the learning steps, and the overall result is worse. Letting the learning rate fall linearly over the training rounds helps, but in the end the result is still worse:

```
cost after training round 99: 0.07241763398153217
correctly recognized images after training: 92.46%
output vector of first image: [0.         0.         0.         0.         0.         0.
 0.         0.89541759 0.         0.        ]
classification of first image: 7 with confidence 0.8954175907939048; real label 7
```
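
For illustration, ReLU with its derivative and a linearly falling learning rate could be sketched like this. The function names and the start and end rates are assumptions; the actual values used for the run above are not given.

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged, cut negative values off at 0."""
    return np.maximum(0, x)

def relu_derivative(x):
    """Slope of ReLU for backpropagation: 1 where the neuron is active, 0 elsewhere."""
    return (x > 0).astype(float)

def linear_rate(round_no, rounds, start=1.0, end=0.1):
    """Learning rate falling linearly from start (first round) to end (last round)."""
    return start + (end - start) * round_no / (rounds - 1)

print(relu(np.array([-2.0, 0.0, 3.0])))
print([linear_rate(r, 100) for r in (0, 50, 99)])
```

Large steps early on move the weights quickly, while the smaller steps towards the end reduce the risk of jumping over a minimum of the cost function.
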