Guide to Non-Linear Activation Functions in Deep Learning

Non-linear activation functions that you need to know

Pralabh Saxena
Heartbeat



Activation functions are equations or mathematical formulas that help us determine the output of a neural network. The main purpose of an activation function is to derive an output value based on the input value that is fed to a neuron.

In deep learning, activation functions add non-linearity to neural networks. Non-linearity refers to a situation where the relationship between the input data and the output data cannot be represented by a simple linear equation.
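To illustrate the point, here is a minimal sketch (not from the original article) showing that two linear layers stacked without an activation in between collapse into a single linear transformation; the values are chosen arbitrarily:

# Linear layers without an activation collapse into one linear layer

import tensorflow as tf

# A single input sample with 2 features
x = tf.constant([[1.0, 2.0]])

# Weight matrices of two "linear layers" (no activation in between)
W1 = tf.constant([[1.0, -1.0], [0.5, 2.0]])
W2 = tf.constant([[2.0], [1.0]])

# Applying the two layers in sequence...
two_layers = tf.matmul(tf.matmul(x, W1), W2)

# ...gives exactly the same result as one layer with combined weights W1 @ W2
one_layer = tf.matmul(x, tf.matmul(W1, W2))

tf.print('Two linear layers: ', two_layers)
tf.print('One combined layer:', one_layer)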

Now that we know why activation functions are important for neural networks, let’s get to know about various types of non-linear activation functions that are widely used in neural networks.

Let’s get started!

Types of non-linear activation functions

Non-linear activation functions are mathematical functions that produce output values that are not directly proportional to their respective input values. In other words, the output of a non-linear activation function is not a simple linear transformation of the input.

The non-linear activation functions are often used in neural networks as they allow the neural network to learn more complex relationships between the input value and the output value.

Some examples of non-linear activation functions are as follows:

Sigmoid activation function

The sigmoid function is a widely used activation function for binary classification. The graph of the sigmoid function is an S-shaped curve and the range of this activation function is between 0 and 1.

We can also use the sigmoid activation function for multi-label classification, a scenario where a sample belongs to multiple classes (such as classifying a movie's genres from its poster, since a movie can have several genres).

The sigmoid activation function is mainly used at the final layer of a neural network, where it helps convert the model's output into a probability value.

Image Source: Wikipedia

The equation for this activation function can be represented as:

f(x) = 1 / (1 + e^(-x))

where x is the input value to the activation function (the neuron's weighted sum) and e is Euler's number.

This activation function is widely used in machine learning models where we need to predict a probability as the output, because the probability of any event always lies between 0 and 1.

Example:

# Sigmoid activation function

import tensorflow as tf

# A constant vector
vec = tf.constant([1.0, -0.5, 3.4, -2.1, 0.0, -6.5], dtype=tf.float32)

# Applying the sigmoid function
out_vec = tf.nn.sigmoid(vec, name='sigmoid')

tf.print('Input: ', vec)
tf.print('Output:', out_vec)

Output:

Here, you can see that the values in the output vector lie between 0 and 1.
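As mentioned above, the sigmoid function is also a natural fit for the final layer of a multi-label classifier. The following is a minimal sketch (the layer sizes and the 5-genre setup are hypothetical, not from the article) in which each output gets its own independent probability between 0 and 1:

# Sigmoid output layer for multi-label classification (hypothetical sizes)

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),            # 100 hypothetical input features
    tf.keras.layers.Dense(32, activation='relu'),   # hidden layer
    tf.keras.layers.Dense(5, activation='sigmoid')  # one independent probability per genre
])

# binary_crossentropy treats each output independently, which suits multi-label tasks
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()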


TanH activation function

The TanH function, also referred to as the hyperbolic tangent function, is another commonly used activation function in neural networks.

The range of this activation function is between -1 and 1. The output of this activation function is 0 when the input is 0. For large positive input values the output will be close to 1, whereas for large negative input values the output will be close to -1.

The equation for this activation function can be represented as:

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

where x is the input value to the activation function (the neuron's weighted sum) and e is Euler's number.

The properties of the TanH function are very similar to the sigmoid activation function, apart from its range (-1 to 1).

What makes the TanH activation function different from the Sigmoid activation function is its gradient. The gradient of the TanH function is greater compared to the gradient of the sigmoid activation function when the input values are centered around 0 (zero).

Therefore, we should consider using the TanH activation function over the sigmoid function when we want stronger gradients and faster learning in our neural network.
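As a quick check (not part of the original article), we can compare the two gradients at x = 0 with tf.GradientTape: the derivative of TanH at 0 is 1, while the derivative of the sigmoid at 0 is only 0.25.

# Comparing the gradients of tanh and sigmoid at x = 0

import tensorflow as tf

x = tf.constant(0.0)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)                  # watch the constant so gradients are recorded
    y_tanh = tf.nn.tanh(x)
    y_sigmoid = tf.nn.sigmoid(x)

tf.print('Gradient of tanh at 0:   ', tape.gradient(y_tanh, x))     # 1.0
tf.print('Gradient of sigmoid at 0:', tape.gradient(y_sigmoid, x))  # 0.25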

Example:

# TanH activation function

import tensorflow as tf

# A constant vector
vec = tf.constant([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=tf.float32)

# Applying the tanh function
out_vec = tf.nn.tanh(vec, name='tanh')

tf.print('Input: ', vec)
tf.print('Output:', out_vec)

Output:

Here, you can see that the values in the output vector lie between -1 and 1.

ReLU activation function

ReLU stands for Rectified Linear Unit. It is the most commonly used activation function in the hidden layers of a neural network.

The range of this activation function is 0 to infinity.

Image source: Kaggle

The formula/equation for this activation function can be represented as:

f(x) = max(0,x)

If the input value x to this activation function is negative, then 0 is returned as the output; in other words, the neuron does not get activated. Since only certain neurons get activated at any given time, this activation function is computationally efficient.

The advantage of the ReLU activation function over the sigmoid and TanH functions is that it mostly overcomes the vanishing gradient problem, which is often faced by the sigmoid and TanH functions.

One disadvantage of this activation function is that for x < 0 (negative input values), the gradient is zero. If the weights in the network consistently produce negative inputs for a neuron with a ReLU activation function, that neuron stops contributing to the neural network (the so-called "dying ReLU" problem).
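A small sketch (not from the article) makes this visible: for negative inputs the ReLU output is 0 and its gradient is 0, so no learning signal flows back through that neuron.

# ReLU gradient is zero for negative inputs

import tensorflow as tf

x = tf.constant([-2.0, 3.0])

with tf.GradientTape() as tape:
    tape.watch(x)           # watch the constant so gradients are recorded
    y = tf.nn.relu(x)

tf.print('Input:   ', x)
tf.print('ReLU(x): ', y)
tf.print('Gradient:', tape.gradient(y, x))  # [0, 1]: no gradient for the negative input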

Example:

# ReLU activation function

import tensorflow as tf

# A constant vector
vec = tf.constant([-10, -5, 0.0, 5, 10], dtype=tf.float32)

# Applying the ReLU function
out_vec = tf.nn.relu(vec, name='relu')

tf.print('Input: ', vec)
tf.print('Output:', out_vec)

Output:

Softmax activation function

In neural networks, the Softmax function is used for multi-class classification. It calculates a probability distribution over n different possible classes. This activation function transforms the raw output values of the network (the weighted sums, or logits) into a probability distribution that sums to 1.
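The equation for this activation function can be represented as:

softmax(x_i) = e^(x_i) / (e^(x_1) + e^(x_2) + ... + e^(x_n))

where x_1, ..., x_n are the raw output values (logits) of the n classes and e is Euler's number.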

This activation function is used at the output layer of a neural network. The difference between the Softmax and sigmoid activation functions is that the Softmax outputs across all classes sum to 1, while the sigmoid activation function maps each class output independently to a value between 0 and 1.
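To see the difference concretely, here is a small sketch (not from the article) applying both functions to the same vector: the sigmoid outputs are each between 0 and 1 but do not sum to 1, while the softmax outputs do.

# Sigmoid vs. softmax on the same vector

import tensorflow as tf

vec = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)

sig = tf.nn.sigmoid(vec)
soft = tf.nn.softmax(vec)

tf.print('Sigmoid:', sig, ' sum =', tf.reduce_sum(sig))    # sum is greater than 1
tf.print('Softmax:', soft, ' sum =', tf.reduce_sum(soft))  # sum is exactly 1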

Example:

# Softmax activation function

import tensorflow as tf

# A constant vector
vec = tf.constant([-10, -5, 0.0, 5, 10], dtype=tf.float32)

# Applying the softmax function
out_vec = tf.nn.softmax(vec, name='softmax')

tf.print('Input: ', vec)
tf.print('Output:', out_vec)
tf.print('Sum:   ', tf.reduce_sum(out_vec))

Output:

Here, you can see that the sum of the output probability vector is equal to 1.

The Softmax activation function calculates a probability for each target class, and the class with the highest probability is then taken as the predicted output class for the given input.
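Continuing the example above, tf.argmax can be used to pick the index of the highest probability as the predicted class (a small follow-up sketch, not from the article):

# Picking the predicted class from the softmax probabilities

import tensorflow as tf

vec = tf.constant([-10, -5, 0.0, 5, 10], dtype=tf.float32)
probs = tf.nn.softmax(vec)

tf.print('Probabilities:  ', probs)
tf.print('Predicted class:', tf.argmax(probs))  # index 4, the position of the largest input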

Conclusion

That’s all for this article. We discussed various non-linear activation functions along with their mathematical formulas and typical use cases. It is essential to know which activation function should be used in a particular scenario.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
