
Hazelcast Weaves Wider Logic Threads Through The Data Fabric


Applications use data. It’s such a fundamental baseline that we don’t need to explain this core fact. But within this truism there exists a seemingly infinite variety of shapes, sizes, types, forms and times of data delivery. In the realm of so-called ‘modern applications’, which rely on always-on cloud services and the combined disciplines of continuous integration and continuous delivery, we do need to think about which apps need which data, when, where and why.

The challenge for the modern company is to get value from data the second it is created. The goal is to extract value from streaming data, while also drawing on the context of stored data, to inform instant action. This is no easy task, and legacy systems are widely argued to fall short of what’s needed. By getting value from all data, new and historical, companies can deliver better customer service, increase revenue and reduce risk.

What is distributed computation?

Always vocal on this subject is Hazelcast CEO Kelly Herrell. His company offers a distributed computation and storage platform, a term that describes the ability to bring multiple servers together over a network into a cluster that shares data and coordinates processing power.

Focused on low-latency querying, aggregation and stateful computation (‘stateful’ meaning, in simple terms, that the computation retains a memory of the data it has already seen) against fast-moving real-time event streams as well as traditional data sources, Herrell and Hazelcast principal architect Randy May advocate a data fabric approach as a key means of taming the complexity of serving data to applications at the right time, in the right place. Data that is always hot and fresh from the kitchen, but also served according to the right recipe on the right plate, if you will.

As a reminder (although it’s not a new concept), a data fabric is a sort of textured approach (fabric, get it?) to combining disparate data sources, data pipelines, databases, data streams and related cloud data services into one woven, unified entity that can also convey data lineage, in terms of information provenance and relevance for onward use in applications.

If, for example, a hospital patient has been through blood work, orthopaedic treatment and trauma counselling, and we then need to combine that person’s dental records, insurance status, home address and the contact details of family members, a data fabric would be an ideal fit. In business, it might be the coming together of ERP, CRM and other systems.

What does a data fabric do?

“The basic job of a data fabric is to provide low-latency access to a standardized, up-to-date view of an organization’s data. The pattern separates the problem of sourcing and cleaning the data from the problem of using it in an application,” explained Herrell. “For frequently used data, such as customer profiles, it makes more sense to centralize data collection and cleaning, freeing application teams to focus on building applications. However, today we want to focus on the performance of the data fabric rather than the development agility it can enable.”
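To make that concrete, here is a minimal sketch of what the application side of that bargain can look like, using Hazelcast’s Java client API. The map name (‘customer-profiles’) and key are hypothetical assumptions; the premise is that the fabric’s ingestion pipelines keep the map populated and clean, so the application only ever reads the standardized view.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class ProfileLookup {
    public static void main(String[] args) {
        // Connect to the data fabric as a client; cluster addresses come
        // from client configuration (or default to localhost).
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        // "customer-profiles" is a hypothetical map name; the fabric is
        // assumed to keep it standardized and up to date.
        IMap<String, String> profiles = client.getMap("customer-profiles");

        // The application reads the unified view; it never touches the
        // underlying source systems directly.
        String profile = profiles.get("customer-42");
        System.out.println(profile);

        client.shutdown();
    }
}
```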

What does performance mean in this context? It generally means asking how quickly an application can use the data it needs, some of which will involve comparatively complex concurrent access across the data fabric into more than one data source. So how does all this work in practice?

“One strategy is to move the usage of the data into the fabric,” explained Hazelcast’s May. “Whether this is a viable strategy depends on what the application is doing with the data. If it is simply retrieving a data item for display on screen, then there is no choice and the chain of events will be as described above. However, there are other scenarios where moving application logic into the fabric makes a lot of sense. Consider a use case like checking a user’s entitlements. This is some of the most frequently accessed data in many applications. By moving the logic that determines ‘can user X do Y’ into the data fabric, we can cut out all serialization [i.e. converting data into a format that can be stored or transmitted and reconstructed later], yielding a significant reduction in CPU utilization and we can dramatically reduce network traffic.”
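A hedged illustration of that idea in Hazelcast’s own Java API: an EntryProcessor runs the ‘can user X do Y’ check on the cluster member that owns the user’s entry, so only a boolean answer travels back over the network rather than the full permission set. The map name, user key and permission string below are hypothetical.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.EntryProcessor;
import com.hazelcast.map.IMap;

import java.io.Serializable;
import java.util.Map;
import java.util.Set;

public class EntitlementCheck {

    // The check ships to the member that owns the key; the permission set
    // never leaves the cluster, only the boolean result does.
    static class CheckPermission
            implements EntryProcessor<String, Set<String>, Boolean>, Serializable {
        private final String permission;

        CheckPermission(String permission) {
            this.permission = permission;
        }

        @Override
        public Boolean process(Map.Entry<String, Set<String>> entry) {
            Set<String> granted = entry.getValue();
            return granted != null && granted.contains(permission);
        }
    }

    public static void main(String[] args) {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        // Hypothetical map: user ID -> set of permission names.
        IMap<String, Set<String>> entitlements = client.getMap("entitlements");

        // "Can user X do Y?" evaluated inside the fabric itself.
        boolean allowed = entitlements.executeOnKey(
                "user-X", new CheckPermission("place-order"));
        System.out.println("allowed = " + allowed);

        client.shutdown();
    }
}
```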

Threading architecture, made simple(r)

The threading architecture used in the data fabric can also impact performance. Traditionally, in a data fabric, different tasks are handled by different ‘thread pools’, and it is very common for the allocation of threads to pools not to match the distribution of tasks.

“For example, a node could bottleneck, even with ample available CPU, because there are not enough threads performing Input/Output to keep the processing threads busy. Moreover, with this sort of architecture, it is also typical to configure more threads than there are available cores, forcing the OS to context switch frequently. A ‘thread per core’ approach is a more modern architecture that is seeing increasing adoption. In this model, the application assumes control of scheduling tasks to threads. Threads are not dedicated to pools with a fixed function. While difficult to implement, this architecture results in more efficient allocation of tasks to cores and less context switching,” detailed May.
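The following is a deliberately simplified, generic Java sketch of the thread-per-core idea, not Hazelcast’s actual internals: one event-loop thread per available core, each owning its own task queue, with work routed by key so the OS never has more runnable threads than cores.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreadPerCoreSketch {

    private final BlockingQueue<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    ThreadPerCoreSketch() {
        int cores = Runtime.getRuntime().availableProcessors();
        queues = new BlockingQueue[cores];
        for (int i = 0; i < cores; i++) {
            BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1024);
            queues[i] = queue;
            // Exactly one loop thread per core: the application, not the OS,
            // decides which tasks run where, so context switching stays low.
            Thread loop = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "event-loop-" + i);
            loop.setDaemon(true);
            loop.start();
        }
    }

    // Route work by key so the same partition always lands on the same
    // thread; no shared pool, no contention between pooled threads.
    void submit(Object partitionKey, Runnable task) throws InterruptedException {
        queues[Math.floorMod(partitionKey.hashCode(), queues.length)].put(task);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPerCoreSketch loops = new ThreadPerCoreSketch();
        loops.submit("customer-42", () -> System.out.println(
                "ran on " + Thread.currentThread().getName()));
        Thread.sleep(200); // give the daemon loop a moment to run the task
    }
}
```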

Network factors

Finally (for now), Herrell and May point to the network as another factor that can have a strong influence on data fabric performance. The network usage pattern in a data fabric tends to be ‘chatty’, involving a lot of relatively small messages.

“The network stack on the nodes of the data fabric (and its clients) may need to be tuned in order to ensure efficient use of the available bandwidth. Understanding the size of the objects being sent around the network can be very important and generally, socket buffers should be intentionally sized to accommodate most objects that will be sent over the network. Round-trip time and network bandwidth should also be considered,” explained the Hazelcast tech leaders.
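As a rough illustration of that tuning advice, here is a short Java sketch using the plain JDK socket API; a data fabric product would expose equivalent settings through its own network configuration. The buffer sizes, host name and port below are illustrative assumptions, not recommendations.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketTuning {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();

        // Size buffers to comfortably hold the typical objects sent over
        // the wire; too small causes stalls, far too large wastes memory.
        socket.setReceiveBufferSize(256 * 1024);
        socket.setSendBufferSize(256 * 1024);

        // Chatty traffic with many small messages often benefits from
        // disabling Nagle's algorithm so small writes are not delayed.
        socket.setTcpNoDelay(true);

        // Buffer sizes should be set before connecting so they can inform
        // the TCP window negotiation; the address here is hypothetical.
        socket.connect(new InetSocketAddress("fabric-node.example", 5701), 5_000);
        socket.close();
    }
}
```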

Does the average business user worry about whether their applications are being served by the ‘warmth’ of a data fabric service beneath them? Of course not; this is implicit, base-level substrate technology that works as part of an enterprise IT infrastructure layer. That being said, the core functionalities on offer here are what make our lives easier in a world of hyper-connected systems and cloud services that are increasingly woven (there’s that word again) around the planet, so most of us would notice if the data fabric rug were pulled from beneath our feet.
