+
+- Recognition quality is poor after 100 iterations, apparently because the network structure is inappropriate. With only 16 neurons in the first hidden layer, the network is not able to "see enough details" in the input. So let's use 128 neurons in the first hidden layer and drop the second hidden layer (it only seems to make things worse for me). Naturally, a single training round takes much longer now, but voilà, after only 20 learning iterations the result is already quite respectable:
+```
+$ time ./train.py
+correctly recognized images after initialization: 9.8%
+cost after training round 0: 0.44518003660592853
+[...]
+cost after training round 19: 0.10783488337150668
+correctly recognized images after training: 89.09%
+
+real 0m47.603s
+user 2m12.141s
+```
+
+ - After 100 iterations the accuracy improves further, and the classification of the first test image looks reasonable:
+```
+cost after training round 99: 0.043068345296584126
+correctly recognized images after training: 94.17%
+
+output vector of first image: [1.11064478e-02 5.59058012e-03 5.40483856e-02 7.93664914e-02
+ 2.22662031e-03 3.50355065e-03 2.57506703e-04 9.60761429e-01
+ 2.68869803e-03 5.26559410e-03]
+
+
+real 4m10.904s
+user 11m21.203s
+```
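
To make "looks reasonable" a bit more concrete, here is a minimal sketch of how the output vector above can be read. It is not part of `train.py`; the variable names are made up for illustration, and it assumes that entry `i` of the output vector is the network's score for class `i`.

```python
# Output vector of the first test image, copied from the run above.
output = [
    1.11064478e-02, 5.59058012e-03, 5.40483856e-02, 7.93664914e-02,
    2.22662031e-03, 3.50355065e-03, 2.57506703e-04, 9.60761429e-01,
    2.68869803e-03, 5.26559410e-03,
]

# Rank the class indices by their score, highest first.
# Assumption: index i in the vector corresponds to class i.
ranked = sorted(range(len(output)), key=lambda i: output[i], reverse=True)

predicted_class = ranked[0]  # index of the largest entry
runner_up = ranked[1]        # index of the second-largest entry

print("predicted class:", predicted_class, "score:", output[predicted_class])
print("runner-up class:", runner_up, "score:", output[runner_up])
```

The largest entry (about 0.96) sits at index 7, and the second-largest (about 0.08) at index 3, so the network is fairly confident about a single class for this image.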