Add ReLU layer implementation. This requires normalizing the input data to [0, 1], otherwise the activations grow wildly out of range. However, normalizing the input range makes Sigmoid perform worse, so it is not enabled by default. Even with normalization, ReLU still performs slightly worse than Sigmoid, though.
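A minimal sketch of the two pieces this commit describes, assuming NumPy and raw 8-bit pixel inputs; the function names `relu` and `normalize` are illustrative, not taken from the actual codebase:

```python
import numpy as np

def relu(z):
    """ReLU transfer function: element-wise max(0, z)."""
    return np.maximum(0.0, z)

def normalize(x):
    """Scale raw pixel values (assumed 0..255) into [0, 1] so that
    ReLU activations do not blow up from layer to layer."""
    return x / 255.0

raw = np.array([0, 64, 128, 255], dtype=np.float64)
x = normalize(raw)    # values now in [0, 1]
z = relu(x - 0.5)     # negative inputs are clamped to 0
```

Unlike Sigmoid, ReLU is unbounded above, which is why unnormalized inputs can push activations out of range.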
Process many images in parallel. Provide one object per NN layer and implement each layer's functionality separately, as in https://www.kdnuggets.com/2019/08/numpy-neural-networks-computational-graphs.html Each layer takes not just a single image vector but a whole batch of 10,000 at once, which massively speeds up the computation -- much less time is spent in Python-level iteration.
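The batched layer-object design could look roughly like this; `DenseLayer` and its sizes are hypothetical names modeled on the layer-per-object structure from the linked article, not the project's actual classes:

```python
import numpy as np

class DenseLayer:
    """One fully connected layer holding its own weights and bias."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.01
        self.b = np.zeros(n_out)

    def forward(self, X):
        # X has shape (batch, n_in): all images at once, so the loop
        # over samples runs inside NumPy's C code, not in Python.
        return np.maximum(0.0, X @ self.W + self.b)

rng = np.random.default_rng(0)
layer = DenseLayer(784, 128, rng)
batch = rng.random((10_000, 784))   # 10,000 image vectors in one array
out = layer.forward(batch)          # shape (10_000, 128)
</imports>

One matrix multiplication replaces 10,000 per-image calls, which is where the speedup comes from.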
Initial neural network with forward feeding. Two hidden layers with parametrizable sizes. Two possible transfer functions, defaulting to ReLU for now. Weights and biases are initialized randomly. This of course gives totally random classifications, but at least makes sure that the data structures and computations work. Also add a function to classify the test images and count the correct ones. Without training, about 10% of the samples are expected to be right by pure chance.
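The forward-only network with two parametrizable hidden layers and the correct-count check might be sketched as follows; the class `Network` and method names are assumptions for illustration, and the 784/10 dimensions assume MNIST-style data:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

class Network:
    """Forward-feeding net: input -> hidden1 -> hidden2 -> output.
    Hidden-layer sizes are parameters; weights and biases start
    random, so classifications are random until training exists."""
    def __init__(self, n_in, n_h1, n_h2, n_out, transfer=relu, seed=0):
        rng = np.random.default_rng(seed)
        self.transfer = transfer
        sizes = [(n_in, n_h1), (n_h1, n_h2), (n_h2, n_out)]
        self.params = [(rng.standard_normal((i, o)) * 0.01, np.zeros(o))
                       for i, o in sizes]

    def forward(self, X):
        for W, b in self.params:
            X = self.transfer(X @ W + b)
        return X

    def count_correct(self, X, labels):
        """Predicted class = index of the largest output neuron.
        Untrained, roughly 10% match by chance with 10 classes."""
        return int((self.forward(X).argmax(axis=1) == labels).sum())

net = Network(784, 30, 30, 10)
X = np.random.default_rng(1).random((100, 784))
labels = np.random.default_rng(2).integers(0, 10, 100)
n_right = net.count_correct(X, labels)
```

Applying the transfer function on the output layer too keeps the sketch simple; a trained version would likely use softmax there instead.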