This requires normalizing the input data to [0,1], otherwise the activations
get wildly out of range. But normalizing the input range makes Sigmoid perform
worse, so don't do this by default.
Even with normalization, reLU still performs slightly worse than
Sigmoid, though.
real 4m10.904s
user 11m21.203s
```
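The normalization step can be sketched like this (a minimal example, assuming the images arrive as uint8 pixel values in [0, 255], as with MNIST; the sample values are made up):

```python
import numpy as np

# Hypothetical example: scale raw uint8 pixel values into [0, 1]
# so the reLU activations do not run wildly out of range.
images = np.array([[0, 128, 255],
                   [64, 32, 16]], dtype=np.uint8)

# cast before dividing, otherwise uint8 integer division truncates everything to 0
normalized = images.astype(np.float64) / 255.0

print(normalized.min(), normalized.max())  # both values now lie in [0, 1]
```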
+
+- Replace [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) activation function with [reLU](https://en.wikipedia.org/wiki/Rectifier_%28neural_networks%29). This has some interesting effects: a learning rate of 1 leads to "overshooting", the cost function actually _increases_ several times during the learning steps, and the overall result is worse. Letting the learning rate fall linearly over the training rounds helps. But in the end, the result is still worse:
+```
+cost after training round 99: 0.07241763398153217
+correctly recognized images after training: 92.46%
+output vector of first image: [0. 0. 0. 0. 0. 0.
+ 0. 0.89541759 0. 0. ]
+classification of first image: 7 with confidence 0.8954175907939048; real label 7
+```
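A linearly falling learning rate, as mentioned above, can be sketched like this (the start/end rates and the round count here are assumptions for illustration, not the values from the actual training run):

```python
# Hypothetical sketch: the learning rate falls linearly from lr_start
# on the first round to lr_end on the last round.
def linear_lr(round_i, n_rounds=100, lr_start=1.0, lr_end=0.01):
    frac = round_i / (n_rounds - 1)  # 0.0 on the first round, 1.0 on the last
    return lr_start + (lr_end - lr_start) * frac

print(linear_lr(0))   # first round: full learning rate
print(linear_lr(99))  # last round: much smaller steps, less overshooting
```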
return upstream_grad * self.A * (1 - self.A)
+class reLULayer:
+    def __init__(self, shape):
+        self.shape = shape
+
+    def forward(self, Z):
+        assert Z.shape == self.shape
+        self.A = np.maximum(Z, 0)
+        return self.A
+
+    def backward(self, upstream_grad, learning_rate=0.1):
+        # couple upstream gradient with local gradient, the result will be sent back to the Linear layer
+        # note the second heaviside argument must be 0: self.A = max(Z, 0) is never
+        # negative, so np.heaviside(self.A, 1) would pass the gradient through
+        # everywhere; with 0, inactive units (Z <= 0) correctly get zero gradient
+        return upstream_grad * np.heaviside(self.A, 0)
+
+
def label_vectors(labels, n):
y = np.zeros((n, labels.size))
for i, l in enumerate(labels):