Support Vector Machine: A Comprehensive Guide — Part 2

Sachin Dev
5 min read · May 22, 2023

In my last article, we discussed SVMs, the geometric intuition behind them, and soft and hard margins. Today we will continue with SVMs and try to understand the mathematics behind them, the cost function, the Support Vector Regressor (SVR), and SVM kernels.

Source: Alteryx

SVM Mathematical Intuition

In the image below, we have a best-fit line or plane separating the positive and negative points. There is also a vector w perpendicular to the plane. The positive points lie on the same side of the plane as w, so their distance from the plane is positive. The negative points lie on the opposite side of the plane from w, so their distance is negative.

SVM
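Written as a formula (a small sketch of this idea, using w for the normal vector and b for the plane's offset), the signed-distance rule corresponds to the decision function:

f(x) = w \cdot x + b, \qquad f(x) > 0 \text{ for positive points}, \quad f(x) < 0 \text{ for negative points}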

In the next image, we again have the best-fit line or plane, but along with it we also have two marginal planes passing through the nearest point of each category; the equations of these planes are also shown.

SVM

The distance of a positive point from the marginal plane is positive (+1) and the distance of a negative point from the marginal plane is negative (-1). In SVM we want to increase the distance between these two marginal planes. To find the distance (d) between the two marginal planes, we can subtract the equation of one marginal plane from the other:

Increasing distance between marginal planes

To express this in terms of the unit vector of w, we divide both sides by ||w||, which gives the distance between the marginal planes:

Distance between marginal planes
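Spelled out (a sketch of the standard derivation, where x_+ and x_- are the nearest positive and negative points lying on the two marginal planes):

w \cdot x_+ + b = +1
w \cdot x_- + b = -1
\Rightarrow \; w \cdot (x_+ - x_-) = 2
\Rightarrow \; \frac{w}{\|w\|} \cdot (x_+ - x_-) = \frac{2}{\|w\|} = d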

Cost Function

The aim of the cost function is to maximize the distance between the marginal planes, i.e., 2/||w||, by changing the values of w and b. The cost function in SVM for classification is designed to strike a balance between maximizing the margin and minimizing misclassifications.

Along with the cost function, we will add a constraint such that

Constraint

Here yᵢ is the true output, and this condition holds for all correctly classified points. Equivalently, we can write:

Constraint
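In the usual notation (a sketch of the standard form, with yᵢ ∈ {+1, -1}), the two cases combine into a single constraint:

y_i (w \cdot x_i + b) \ge 1 \quad \text{for all } i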

Now let’s try to understand the above equation:

  • When y = +1, the distance (w·x + b) will also be positive (let's say +1). So y * dist → +1 (+ve value) * +1 (+ve value) = +ve value.
  • When y = -1, the distance will also be negative (let's say -1). So y * dist → -1 (-ve value) * -1 (-ve value) = +ve value.

So, equivalently, we can either maximize the distance between the marginal planes by changing w and b, or minimize the corresponding loss function by changing w and b.

Loss Function

The cost function incorporates a hinge loss function, which penalizes misclassifications and points that fall within the margin. It ensures that only misclassified or margin-violating points contribute to the loss. Additionally, a regularization term, controlled by the parameter C, is included to balance the trade-off between achieving a wider margin and allowing for some misclassifications. By minimizing the sum of hinge loss and the regularization term, SVM finds the optimal hyperplane that maximizes the margin while minimizing misclassifications.

Cost Function
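Combining the margin term with the hinge loss, the soft-margin objective is commonly written as follows (a sketch of the standard formulation; the notation in the image above may differ slightly):

\min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\bigl(0, \; 1 - y_i (w \cdot x_i + b)\bigr)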

Support Vector Regressor (SVR)

In addition to classification, SVMs can be extended to regression tasks using the Support Vector Regressor (SVR). The SVR aims to find a hyperplane that best fits the data while limiting the margin error.

Margin Error: The distance between the best-fit line and the marginal plane; in other words, we allow up to this much (ε) error while constructing the marginal planes.

SVR: Marginal Error
Source: Tom Sharp

Cost Function

The cost function for SVR involves the use of a loss function that captures the deviation between the predicted and actual target values, subject to an ε-insensitive tube. Points falling within this tube are not penalized, while those outside the tube incur a penalty proportional to their distance from the tube boundary. The goal of SVR is to minimize the sum of the loss function and the regularization term, similar to SVM classification.

SVR Cost Function
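To see the ε-insensitive tube in practice, here is a minimal sketch using scikit-learn's SVR; the synthetic sine-curve data and the particular values of C and epsilon are assumptions chosen purely for illustration:

import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# epsilon sets the half-width of the insensitive tube (no penalty inside it);
# C controls the penalty on points that fall outside the tube.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X, y)

print("R^2 on training data:", svr.score(X, y))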

SVM Kernels

With the help of SVC, we create a best-fit line or plane. Along with that, we create marginal planes passing through the support vectors. With this kind of best-fit line and marginal planes, we can handle linearly separable data. This is called a linear SVC (an SVC with a linear kernel).

What happens if we have non-linearly separable data?
As the data is non-linearly separable, we won't be able to create a good best-fit line and marginal planes. Even if the model does create a best-fit line and marginal planes, the accuracy will be really bad.

For this kind of problem, we can use SVM kernels. The main aim of an SVM kernel is to apply a transformation to the dataset itself, mainly by increasing the dimensionality of the data so that it becomes linearly separable in the higher-dimensional space.

SVM Kernel
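To make this concrete, here is a minimal sketch with scikit-learn; the make_circles dataset is just an assumed example of non-linearly separable data, used to compare a linear kernel with an RBF kernel:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Non-linearly separable data: one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

# A linear kernel struggles here; the RBF kernel implicitly maps the data
# into a higher-dimensional space where it becomes linearly separable.
linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf").fit(X, y)

print("Linear kernel accuracy:", linear_clf.score(X, y))
print("RBF kernel accuracy:", rbf_clf.score(X, y))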

Example: We have 1-D data as shown in the figure.

1-D Data

To separate these two categories we would need two decision boundaries, but a linear model can create only one best-fit line. So we can use an SVM kernel here to transform the data into a higher dimension.

Transformed Data into 2-D Data
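A rough sketch of this 1-D example in code (the class threshold and the x → (x, x²) feature map are assumptions made for illustration): lifting each point to two dimensions lets a single line separate the categories.

import numpy as np
from sklearn.svm import LinearSVC

# 1-D data: points in the middle band are one class, the outer points the other
x = np.linspace(-3, 3, 60)
y = np.where(np.abs(x) < 1.5, 1, 0)  # not separable by a single threshold on x

# Lift to 2-D: each point x becomes (x, x^2)
X_2d = np.column_stack([x, x ** 2])

clf_1d = LinearSVC(C=1.0).fit(x.reshape(-1, 1), y)
clf_2d = LinearSVC(C=1.0).fit(X_2d, y)

print("Accuracy on original 1-D data:", clf_1d.score(x.reshape(-1, 1), y))
print("Accuracy on lifted 2-D data:", clf_2d.score(X_2d, y))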

Conclusion

Support Vector Machines (SVMs) offer a powerful framework for classification and regression tasks. By maximizing the margin and considering misclassification penalties, SVMs create robust decision boundaries that generalize well to unseen data. By leveraging the power of SVMs, we can enhance our ability to tackle complex machine-learning challenges across various domains.

Thanks for reading this article! Leave a comment below if you have any questions. You can follow me on LinkedIn and GitHub.

