Hyperparameter Tuning

Sachin Dev
5 min read · Feb 3, 2023

In this article, we will discuss hyperparameters, why they matter, and hyperparameter tuning. Before discussing hyperparameter tuning, we should first understand what hyperparameters are.

Source: https://developer.nvidia.com/blog/sigopt-deep-learning-hyperparameter-optimization/

Hyperparameters and Their Importance

If I had to define hyperparameters in a single sentence, I would say: hyperparameters decide how well a machine learning model performs.

  • Hyperparameters play a crucial role in determining the performance of a machine learning model.
  • Hyperparameters control aspects of the model’s behavior and training process, such as the learning rate, the number of iterations, and the regularization strength, which can significantly impact the final performance of the model on a given task.
  • Optimal hyperparameter values help the model generalize well to new data and avoid overfitting and underfitting.

Hyperparameter Tuning

Hyperparameter tuning is the process of choosing the optimal values for a model’s hyperparameters to improve its performance on a given task. The goal is to find the combination of hyperparameters that results in the best performance for the model.

Example: Think of the ML model as a robot that you want to teach how to do a specific task, like recognizing animals. Suppose this robot has different buttons and switches that control how it performs the task, like how it learns or how many times it repeats the task to make sure it gets it right. These buttons and switches are like hyperparameters.
By changing the values of these buttons and switches, you can make the robot do the task better or worse. This is called Hyperparameter Tuning.

Difference between Parameters and Hyperparameter

When I first read about hyperparameters, I had trouble differentiating between parameters and hyperparameters, but they are not the same thing. So, it’s important to know the difference between the two.

  • Parameters are values that are learned by an ML model during the training process, while Hyperparameters are set prior to training and remain constant during the training process.
  • Parameters are updated by the learning algorithm during training, based on the training data and optimization algorithm, while hyperparameters are set by the practitioner and are not learned from the data.
  • Parameters represent the parameters of the learned model, such as the weights and biases of a neural network, while hyperparameters control the behavior of the training process such as learning rate, etc.

Let’s say we are training a robot to play catch:

  • The Parameters are learned as the robot trains and repeats the task of catching the ball. For example, it might learn that moving its arm a certain way makes it easier to catch the ball, and over time it adjusts its arm movement to get better at the task.
  • The buttons and switches that control the robot’s movements and learning process are like Hyperparameters.

Hyperparameter Tuning Methods

GridSearchCV
I will explain “GridSearchCV” in two steps: first Grid Search, then CV.

A) Grid Search:
Let’s say we have 3 hyperparameters containing 5 values each such as:
max_depth = 2,3,4,5,6
min_sample_leaf = 5,10,15,20,25
min_sample_split = 10,15,20,25,30
Here, the possible number of combinations = 5*5*5 = 125
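The count above can be checked with a short sketch in plain Python, where `itertools.product` enumerates every combination of the three value lists:

```python
from itertools import product

# The three hyperparameter value lists from the example above
max_depth = [2, 3, 4, 5, 6]
min_samples_leaf = [5, 10, 15, 20, 25]
min_samples_split = [10, 15, 20, 25, 30]

# Every combination Grid Search would evaluate
combinations = list(product(max_depth, min_samples_leaf, min_samples_split))
print(len(combinations))  # 5 * 5 * 5 = 125
```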

Grid Search means checking every possible combination and picking the best one for the model. It is a method for finding the best set of hyperparameters for an ML model.

Let’s understand it in a simple way using an example:
Think of it like trying on different sets of clothes to see what looks best. Imagine you have 3 types of clothes — shirts, pants, and shoes. To find the best outfit you could try on every possible combination of clothes, one by one. This would be like doing a grid search, where you check every possible combination of hyperparameters.

Here, you would try on 5 shirts × 5 pants × 5 shoes = 125 combinations of clothes. The combination that gives you the best look is the one you would choose to wear. Similarly, the combination of hyperparameters that gives the best model performance is the one you would choose to use.

B) Cross-Validation (CV):
CV in “GridSearchCV” stands for Cross-Validation. It is a technique for evaluating an ML model by repeatedly splitting the data into training and validation sets and testing the model on data it did not see during training.

It divides the data into multiple folds, trains the model on all but one fold, validates it on the held-out fold, and repeats this until every fold has served as the validation set once. The performance metric is then averaged across the folds. Doing this for each combination of hyperparameters gives a more reliable estimate of the model’s performance and helps avoid overfitting.

Example: if we have a total of 10 data points and the CV value = 5, then each fold contains 10/5 = 2 data points, and in each round those 2 points act as the validation set.
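This split can be seen with scikit-learn’s `KFold` (a sketch, assuming scikit-learn, since that is the library `GridSearchCV` comes from):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 data points
kf = KFold(n_splits=5)            # CV value = 5

# Each of the 5 rounds holds out 10/5 = 2 points for validation
for train_idx, val_idx in kf.split(X):
    print(len(train_idx), len(val_idx))  # 8 2
```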

So, GridSearchCV evaluates every combination of hyperparameters using cross-validation and selects the combination with the best average validation score.

Example: Let’s say you want to buy some clothes, and you have 3 items in mind: shirts, pants, and shoes. Each item has different sizes, colors and styles to choose from. GridSearchCV is like trying out different combinations of these items to see which one looks best to you.

  • The first step in GridSearchCV is to create a grid of all the combinations of sizes, colors, and styles you want to try. This grid is like a table that lists all the possible combinations of shirts, pants, and shoes you could wear.
  • Next, you divide your friends into different groups and each group acts as a validation set to evaluate the combination of clothes you are trying. You try on different combinations of clothes for each group, and ask your friends to rate the combination based on how good it looks on you.
  • Finally, you take the average of the ratings from all the groups, and the combination of clothes with the highest average rating is the one you choose to buy.

By trying on different combinations of clothes with different groups of friends, you get a better idea of how good each combination looks, and can choose the one that looks the best on average. This is similar to how “GridSearchCV” uses cross-validation to evaluate different combinations of hyperparameters and choose the one that performs the best on average.
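Putting the two steps together, here is a minimal `GridSearchCV` sketch using scikit-learn, with the grid from the earlier example and a decision tree on the built-in iris dataset (the dataset and classifier are illustrative choices, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The 5 x 5 x 5 = 125 combinations from the Grid Search example
param_grid = {
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [5, 10, 15, 20, 25],
    "min_samples_split": [10, 15, 20, 25, 30],
}

# cv=5 means every combination is scored with 5-fold cross-validation,
# so 125 * 5 = 625 model fits in total
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)  # the winning combination
print(grid.best_score_)   # its average validation score
```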

RandomizedSearchCV
In RandomizedSearchCV, we take random combinations of hyperparameters and build a model for each. Here’s how it works:

  • In RandomizedSearchCV, random combinations of hyperparameter values are used to train and evaluate the model. The model’s performance is then compared across the sampled combinations, and the best one is selected.
  • It is computationally less expensive than GridSearchCV, but also less likely to find the exact optimal set of hyperparameters, since it does not try every combination.
  • The number of random combinations is controlled by a user-defined parameter, typically referred to as the number of iterations.
  • In each iteration, a random combination of hyperparameters is selected, the model is trained with those hyperparameters, and its performance is evaluated.
  • The process is repeated for the specified number of iterations, and the best combination is selected based on the model’s performance.
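In scikit-learn this is implemented as `RandomizedSearchCV`; the sketch below samples 10 of the 125 combinations via `n_iter` (same illustrative dataset and classifier as before):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [5, 10, 15, 20, 25],
    "min_samples_split": [10, 15, 20, 25, 30],
}

# n_iter controls how many random combinations are tried: 10 of 125 here
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=10,
    cv=5,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)  # best of the 10 sampled combinations
```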

Conclusion

Hyperparameter tuning is an important step in building an ML model, and GridSearchCV and RandomizedSearchCV are two popular methods for it. Both methods have their advantages and disadvantages, and the choice between them depends on the specific task. It is important to carefully consider the hyperparameters and their impact on the performance of the model.

Thanks for reading this article! Leave a comment below if you have any questions. You can follow me on Linkedin and GitHub.
