In neural networks, non-linear activation functions define layer outputs, and choosing one is a hyperparameter decision. Three of the most common in use today are sigmoid, tanh, and ReLU.


Rectified Linear Unit

The rectified linear unit (ReLU). Its output is 0 when x is negative, and x itself otherwise.

$$ \mathrm{relu}(x) = \max(0, x)$$
Figure 1: The ReLU function.

Here's the Python implementation.
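A minimal sketch, assuming NumPy is available for element-wise operations:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative inputs become 0, everything else passes through
    return np.maximum(0, x)
```

Because `np.maximum` is element-wise, the same function works on scalars and arrays alike.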


Sigmoid

The sigmoid function, named for the characteristic S shape of its graph.

$$ \mathrm{sigmoid}(x) = { 1 \over 1 + e^{-x}}$$
Figure 2: The sigmoid function.

Here it is in Python.
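A direct translation of the formula, again assuming NumPy:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))
```

Note that for large negative inputs `np.exp(-x)` can overflow a float64 and emit a warning, though the result still converges to 0; numerically hardened versions split the computation by sign.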

Hyperbolic Tangent

The hyperbolic tangent function. It can be regarded as a scaled and shifted version of sigmoid, since tanh(x) = 2 · sigmoid(2x) − 1, which recenters the output range from (0, 1) to (−1, 1).

$$ \tanh(x) = {e^x - e^{-x} \over e^x + e^{-x}}$$
Figure 3: The tanh function.

Here's the Python implementation.
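A direct translation of the formula, assuming NumPy (in practice `np.tanh` is the numerically stable built-in):

```python
import numpy as np

def tanh(x):
    # Literal translation of (e^x - e^-x) / (e^x + e^-x)
    ex, enx = np.exp(x), np.exp(-x)
    return (ex - enx) / (ex + enx)
```

For large |x| the literal form can overflow where `np.tanh` would not, so the built-in is preferable outside of illustration.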