Support Vector Machine: A Comprehensive Guide — Part 2
In my last article, we discussed SVMs, the geometric intuition behind them, and soft and hard margins. Today we will continue with SVMs and try to understand the mathematics behind them: the cost function, the Support Vector Regressor (SVR), and SVM kernels.
SVM Mathematical Intuition
In the image below, we have a best-fit line or plane separating positive and negative points. There is also a vector w perpendicular to the plane. Vector w and the positive points are on the same side of the plane, so the distance between them is positive. On the other hand, the negative points and vector w are on opposite sides, so the distance between them is negative.
In the next image, we again have a best-fit line or plane, but along with it we also have two marginal planes passing through the nearest point of each category; the equation of each plane is also shown.
The distance of a positive point from its marginal plane is positive (+1) and the distance of a negative point from its marginal plane is negative (-1). In SVM we want to maximize the distance between these two marginal planes. To find the distance (d) between the two marginal planes, we can subtract their equations: subtracting w·x₋ + b = -1 from w·x₊ + b = +1 gives w·(x₊ − x₋) = 2.
To get the unit vector of w, we can divide both sides by ||w||, which gives the margin d = 2/||w||.
Cost Function
The aim of the cost function is to maximize the distance between marginal planes i.e. 2/||w|| by changing the value of w and b. The cost function in SVM for classification is designed to strike a balance between maximizing the margin and minimizing misclassifications.
Along with the cost function, we will add a constraint such that yᵢ(w·xᵢ + b) ≥ 1.
Here yᵢ is the true output (+1 or -1), and this condition holds for every correctly classified point. In other words, for any correctly classified point, yᵢ(w·xᵢ + b) is positive.
Now let’s try to understand the above equation:
- When y = +1, the distance (w·x + b) is also positive (say +1). So y·dist → (+1) × (+1) = positive value.
- When y = -1, the distance is also negative (say -1). So y·dist → (-1) × (-1) = positive value.
So maximizing the distance between the marginal planes by changing w and b is equivalent to minimizing the loss function by changing w and b.
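The sign argument above can be checked with a small sketch. The weights, bias, and points below are invented for illustration; the point is only that y multiplied by the decision value (w·x + b) is positive exactly when the point is correctly classified.

```python
import numpy as np

# Illustrative values (not from the article): a weight vector and bias.
w, b = np.array([2.0, -1.0]), 0.5

def margin_score(x, y):
    """Signed margin y * (w.x + b); > 0 means the point is correctly classified."""
    return y * (np.dot(w, x) + b)

x_pos = np.array([1.0, 0.0])   # decision value w.x + b = 2.5 (positive side)
x_neg = np.array([-1.0, 1.0])  # decision value w.x + b = -2.5 (negative side)

print(margin_score(x_pos, +1))  # 2.5  -> positive, correctly classified
print(margin_score(x_neg, -1))  # 2.5  -> positive, correctly classified
print(margin_score(x_pos, -1))  # -2.5 -> negative, misclassified
```

Both correctly classified points give a positive score regardless of their class, which is exactly why the constraint can be written with a single inequality for both classes.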
The cost function incorporates a hinge loss function, which penalizes misclassifications and points that fall within the margin. It ensures that only misclassified or margin-violating points contribute to the loss. Additionally, a regularization term, controlled by the parameter C, is included to balance the trade-off between achieving a wider margin and allowing for some misclassifications. By minimizing the sum of hinge loss and the regularization term, SVM finds the optimal hyperplane that maximizes the margin while minimizing misclassifications.
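The objective described above can be sketched in a few lines. This is a minimal illustration, not a trainable implementation: the data, weights, and bias are made up, and only the cost value is computed.

```python
import numpy as np

def svm_cost(w, b, X, y, C=1.0):
    """Soft-margin SVM objective: 0.5*||w||^2 + C * sum of hinge losses.

    The hinge loss max(0, 1 - y*(w.x + b)) is zero for points beyond
    the margin, so only misclassified or margin-violating points contribute.
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Invented example: two points well beyond the margin, one inside it.
X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.2, 0.1]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.5, 0.5])

print(svm_cost(w, 0.0, X, y))  # only the third point adds hinge loss
```

Increasing C punishes margin violations more heavily (narrower margin, fewer misclassifications); decreasing C favors a wider margin at the cost of more violations.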
Support Vector Regressor(SVR)
In addition to classification, SVMs can be extended to regression tasks using the Support Vector Regressor (SVR). The SVR aims to find a hyperplane that best fits the data while limiting the margin error.
Marginal Error: the distance between the best-fit line and the marginal plane; in other words, we allow up to this much (ε) error while constructing the marginal planes.
Cost Function
The cost function for SVR involves the use of a loss function that captures the deviation between the predicted and actual target values, subject to an ε-insensitive tube. Points falling within this tube are not penalized, while those outside the tube incur a penalty proportional to their distance from the tube boundary. The goal of SVR is to minimize the sum of the loss function and the regularization term, similar to SVM classification.
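The ε-insensitive loss described above can be sketched directly. The targets and predictions below are invented; the point is that deviations inside the tube cost nothing, while deviations outside it are penalized linearly.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss: zero inside the tube of half-width eps,
    linear in the distance from the tube boundary outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 3.0])  # first and last fall inside the tube

print(eps_insensitive_loss(y_true, y_pred))  # only the middle point is penalized
```

Only the middle prediction, which misses its target by 0.5, incurs a loss (0.5 − 0.1 = 0.4); the other two deviations are within ε and cost nothing.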
SVM Kernels
With the help of SVC, we create a best-fit line or plane. Along with that, we create marginal planes passing through the support vectors. With this kind of best-fit line and marginal planes, we can solve for linearly separable data. This is called a linear SVC.
What will happen if we have non-linearly separable data?
Now, as the data is non-linearly separable, we won't be able to create a good best-fit line and marginal planes. Even if we do create them, the accuracy will be very poor.
For this kind of problem, we can use SVM kernels. The main aim of an SVM kernel is to apply a transformation to the dataset itself, mainly by increasing the dimensionality of the data.
Example: We have 1-D data as shown in the figure.
In order to separate these two categories we would need two decision boundaries, but a linear model can create only one best-fit line. So we can use SVM kernels here to transform the data into a higher dimension.
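The 1-D example above can be sketched concretely. The points below are invented: the two classes need two thresholds on the line, but after lifting each point x to (x, x²) a single linear boundary separates them.

```python
import numpy as np

# Invented 1-D data: the positive class sits on the outside,
# the negative class in the middle, so no single threshold works.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Lift to 2-D with the mapping x -> (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the lifted space, one horizontal line (x^2 = 2) separates the classes.
pred = np.where(phi[:, 1] > 2.0, 1, -1)
print(np.array_equal(pred, y))  # True
```

In practice the kernel trick lets an SVM work as if this higher-dimensional mapping had been applied, without ever computing the lifted coordinates explicitly.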
Conclusion
Support Vector Machines (SVMs) offer a powerful framework for classification and regression tasks. By maximizing the margin and considering misclassification penalties, SVMs create robust decision boundaries that generalize well to unseen data. By leveraging the power of SVMs, we can enhance our ability to tackle complex machine-learning challenges across various domains.
Thanks for reading this article! Leave a comment below if you have any questions. You can follow me on LinkedIn and GitHub.