The effectiveness of clustering in IIoT

Gustavo Chinchayan Bernal
Apr 10, 2023

How this machine learning model has become a sustainable and reliable solution for edge devices in an industrial network

An Introduction

Clustering (cluster analysis, CA) and classification are two important tasks that occur in our daily lives. As human beings, we tend to cluster and classify objects or ideas hundreds of times a day. With the emergence of data science and AI, clustering has allowed us to uncover structure in data sets that is not easily detectable by the human eye, which makes it very important for exploratory data analysis. CA is widely used in various fields and industries, such as marketing (customer segmentation), biology and genetics (grouping genes or proteins by disease subtypes), image and video analysis (grouping similar images or videos together based on their visual features), natural language processing, and anomaly detection (detecting anomalous behavior in areas such as cybersecurity, fraud detection, and industrial control systems).

Three-feature visual representation of the K-means algorithm. Source: Marubon-DS

Unsupervised Learning

In the data science context, clustering is an unsupervised machine learning technique, which means that it does not require predefined labeled inputs or outcomes to learn from. Instead, the goal of clustering is to identify groups, or clusters, in the data based on distance metrics or similarities. Essentially, the clustering algorithm groups data points together without any prior knowledge or guidance, discovering hidden patterns or unusual data groupings without the need for human intervention.
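As a minimal sketch of this idea, the snippet below runs K-means from scikit-learn on synthetic, unlabeled points. The data set, the choice of three clusters, and the variable names are assumptions made for illustration; they are not taken from a real IIoT deployment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three groups in two dimensions (illustrative only).
# The true group labels returned by make_blobs are deliberately discarded.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# K-means only sees the coordinates and groups points by Euclidean distance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_               # cluster index assigned to each point
centroids = kmeans.cluster_centers_   # one centroid per discovered cluster

print(np.bincount(labels))   # number of points in each cluster
print(centroids.round(2))    # coordinates of the cluster centers
```

Note that the number of clusters is supplied by the user here; the algorithm discovers where the groups are, not how many there should be.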

An unsupervised machine learning model clustering data into groups

Components

The key components of an effective clustering technique require the following:

Visual representation of the PCA algorithm. In this instance, there is a positive relationship between components 1 and 2
X: array or vector X, Y: array or vector Y, x: values of the horizontal axis in the coordinate plane, y: values of the vertical axis in the coordinate plane, n: number of observations

Industrial Internet of Things (IIoT)

The Constraints

Within the area of Industry 4.0, many industrial companies face technical constraints that can affect their operations and revenue. These include network connectivity (specifically in areas where networks may be limited or unreliable), bandwidth (due to the very large amount of data generated by IoT devices), security (vulnerability to cyberattacks that can lead to unauthorized access to private data), energy (ensuring that edge devices have an energy-efficient design that minimizes their energy consumption and prolongs their lifespan), and data management (handling information adequately so that analysis can contribute to data-driven decision making).

IT security. Photo by Pixabay from Pexels

The Solutions

Clustering can address the constraints mentioned above to a large extent, and it is lightweight enough to run on hundreds to thousands of edge devices. Supervised machine learning models (such as SVM or gradient boosting) and deep learning models (such as CNNs or RNNs) can promise superior performance compared to clustering models; however, this can come at a greater cost, with only marginal rewards for the environment, the end user, and the product owner of such technology. For each constraint mentioned above:

  • Connectivity: Clustering can enable local data processing and analysis on edge devices, which reduces the amount of data transmitted over the network and also reduces the reliance on a central server for data processing (similar to a federated learning approach)
  • Bandwidth: To reduce bandwidth requirements in IIoT systems, clustering can compress data so that only specific clusters of interest are transmitted. This makes data transmission more efficient and reduces the risk of network latency. Clustering-based approaches can also rely on local data caching, which reduces the need for continuous data transmission, improving network efficiency and reducing energy consumption (Zhao et al., 2016)
  • Data Management: When clustering occurs locally, edge devices in the network can perform near-real-time data analysis to support data-driven decisions
  • Energy: Clustering methods have been shown to be more energy efficient when it comes to data transmission and processing (Loganathan & Arumugam, 2021). Edge devices also avoid the constant need for data transmission, which contributes to their energy efficiency. In contrast, deep learning models with complex architectures (many parameters and long training processes) typically require more computational power to run. Clustering, on the other hand, does not require labeled training data and, most importantly, meets the resource-constrained demands of many industrial ecosystems
  • Security: Clustering can improve the privacy of data in IIoT systems by allowing local data processing. Recent research has shown that devices running hybrid clustering algorithms can facilitate the broader deployment of trustworthy and smart nodes at the network edge without the need for central servers (Lapegna et al., 2023). Clustering locally avoids the transmission of sensitive data over the network
  • Other benefits: Clustering can also be deployed as a machine learning model to perform anomaly detection and predictive analytics. The purpose is to predict the cluster assignment for new data points based on the patterns learned from the training data. In the context of anomaly detection, clustering can be used to group edge devices with similar behavior based on features such as CPU usage, memory usage, or network traffic. Once clusters are established, any device that does not belong to them can be labeled as an anomaly, indicating that the device may be malfunctioning or experiencing a cybersecurity breach
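As a rough illustration of the bandwidth point above, the following sketch clusters a window of sensor readings on the device and keeps only the centroids and per-cluster counts as the payload. The readings are synthetic, and the feature meanings, the value `k = 2`, and the `summary` layout are hypothetical choices for this example rather than a prescribed protocol.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical window of 1,000 readings with 3 features each
# (for example temperature, vibration, current); all values are synthetic.
readings = np.vstack([
    rng.normal(loc=[20.0, 0.5, 3.0], scale=0.2, size=(700, 3)),
    rng.normal(loc=[35.0, 1.5, 4.0], scale=0.2, size=(300, 3)),
])

# Cluster locally on the device, then keep only a compact summary.
k = 2  # assumed number of operating modes
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(readings)

summary = {
    "centroids": model.cluster_centers_.astype(np.float32),
    "counts": np.bincount(model.labels_, minlength=k).astype(np.int32),
}

# Compare the size of the raw window with the size of the summary.
raw_bytes = readings.astype(np.float32).nbytes
summary_bytes = summary["centroids"].nbytes + summary["counts"].nbytes
print(f"raw payload: {raw_bytes} bytes, summary payload: {summary_bytes} bytes")
```

The summary is lossy: it describes where the readings concentrate and how many fall in each group, not the individual values, so it suits monitoring more than exact reconstruction.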

In the code below, I provide an example of a well-known clustering method used for anomaly detection, OPTICS, applied in Python to synthetic data generated with make_blobs:

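from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

# Seed NumPy's global generator so make_blobs produces the same data on each run
random.seed(10)
x, _ = make_blobs(n_samples=350, centers=1, cluster_std=.4, center_box=(20, 5))
model = OPTICS().fit(x)

# Score each point by its core distance: larger values mean a sparser neighborhood
scores = model.core_distances_
# Flag the top 2% of scores as anomalies (the 0.98 quantile is an illustrative choice)
thresh = quantile(scores, .98)
index = where(scores >= thresh)
values = x[index]

# Visualize the results in a plot by highlighting anomalies in red
plt.scatter(x[:,0], x[:,1], label="normal")
plt.scatter(values[:,0], values[:,1], color='r', label="anomaly")
plt.legend(loc="best", fancybox=True, shadow=True)
plt.grid(True)
plt.show()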
Anomaly detection with the OPTICS clustering algorithm
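The idea of predicting the cluster assignment for new data points, mentioned under "Other benefits", can be sketched in a similar way. The example below is an assumption-laden illustration: the device metrics are synthetic, K-means is used instead of OPTICS because it offers a `predict` method, and the `check` helper and the 0.99 quantile cut-off are hypothetical choices, not tuned values.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic "normal" device metrics: [CPU %, memory %, network MB/s].
normal = np.vstack([
    rng.normal(loc=[30.0, 40.0, 5.0], scale=2.0, size=(200, 3)),
    rng.normal(loc=[70.0, 60.0, 20.0], scale=2.0, size=(200, 3)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)

# Distance of each training point to its nearest centroid.
train_dist = model.transform(normal).min(axis=1)
threshold = np.quantile(train_dist, 0.99)  # illustrative cut-off

def check(sample):
    """Return (cluster index, is_anomaly) for one metrics vector."""
    sample = np.asarray(sample, dtype=float).reshape(1, -1)
    cluster = int(model.predict(sample)[0])
    distance = float(model.transform(sample).min())
    return cluster, bool(distance > threshold)

print(check([31.0, 41.0, 5.5]))    # close to one of the learned groups
print(check([95.0, 10.0, 80.0]))   # far from both groups
```

A device whose metrics fall far from every learned centroid is flagged for inspection; in practice the threshold would need to be chosen against real operating data.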

References

  1. Onu Peter, Anup Pradhan, & Charles Mbohwa (2023). Industrial internet of things (IIoT): opportunities, challenges, and requirements in manufacturing businesses in emerging economies.
  2. Introduction to K-means: Algorithm and Visualization with Julia from scratch. (n.d.). Retrieved April 9, 2023, from http://marubon-ds.blogspot.com/2018/04/introduction-to-k-means-algorithm-and.html
  3. Lapegna M, Mele V, Romano D. Clustering Algorithms for Enhanced Trustworthiness on High-Performance Edge-Computing Devices. Electronics. 2023; 12(7):1689. https://doi.org/10.3390/electronics12071689
  4. Leilei, S., Guoqing, C., Hui, X., & Chonghui, G. (2017). Cluster Analysis in Data-Driven Management and Decisions. Journal of Management Science and Engineering, 2, 227.
  5. Loganathan, S., Arumugam, J. Energy Efficient Clustering Algorithm Based on Particle Swarm Optimization Technique for Wireless Sensor Networks. Wireless Pers Commun 119, 815–843 (2021). https://doi.org/10.1007/s11277-021-08239-z
  6. Lorenza Prospero, Roberto Costa, Leonardo Badia, Resource Sharing in the Internet of Things and Selfish Behaviors of the Agents, IEEE Transactions on Circuits and Systems II: Express Briefs, 10.1109/TCSII.2021.3121560, 68, 12, (3488–3492), (2021).
  7. Z. Zhao, M. Peng, Z. Ding, W. Wang and H. V. Poor, “Cluster Content Caching: An Energy-Efficient Approach to Improve Quality of Service in Cloud Radio Access Networks,” in IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1207–1221, May 2016, doi: 10.1109/JSAC.2016.2545384.
