DataStax Plumbs AI Into Smarter Data Pipelines


Artificial Intelligence (AI) is versatile. We can use AI to perform surface-level tasks that users can see, touch and experience through Natural Language Understanding (NLU) based queries, some of which may be typed and some of which may be spoken into speech recognition systems. With the rise of generative AI and ChatGPT-based interfaces, we can now ‘ask’ for advice, information and application functionalities in ways that we might have thought impossible at the start of this decade.

We can also use AI - in much the same way - to perform lower-level software and data system functions that users remain mostly oblivious to, beyond the fact that these mechanics make the upper-tier apps and services work the way they are supposed to.

These lower-level uses of AI for data scientists and software application development engineers are also many and varied. The rise of low-code platforms in particular is paving the way for typed, chat-style requests that help programmers find the tools, libraries, code snippets and connection points they need to work faster and closer to defined best-practice techniques. The same advantages apply to the world of databases.

Aiming to put generative AI request power into the data layer with a new approach is Santa Clara-headquartered DataStax. Known for its multi-cloud database-as-a-service (DBaaS) technology based on open source Apache Cassandra, DataStax has now developed a GPT-based schema translator in its Astra Streaming cloud service.

What is a database schema?

A database schema describes and defines exactly how data inside any given relational database is ordered, structured and organized - it encompasses and denotes the ‘universe’ that a database creates in terms of the values, fields, segments and types of data presented, as well as the relationships that exist between each piece of data. Because different databases have different schemas and different data structures, moving data between systems can be an awkward task - rather like trying to fit a British or European electrical plug into an American wall outlet, or vice versa. As with any adapter, you can build and use one, but it carries a management overhead to keep it up to date - the metaphorical equivalent of remembering to keep that adapter in your bag.
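To make the mismatch concrete, consider the same customer record shaped two different ways. This is a minimal illustrative sketch - the field names and structures below are hypothetical assumptions, not drawn from any real DataStax system:

```python
# Hypothetical example: one customer record under two different schemas.
# All field names here are illustrative assumptions, not real schemas.

# Schema A: a flat, relational-style record
record_a = {
    "customer_id": 1042,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "signup_ts": "2023-06-01T09:30:00Z",  # ISO 8601 timestamp string
}

# Schema B: a nested, event-style record holding the same information
record_b = {
    "id": "1042",                                    # same ID, stored as a string
    "name": {"given": "Ada", "family": "Lovelace"},  # names nested under one key
    "created": 1685611800,                           # same moment, as Unix epoch seconds
}
```

Moving data from one shape to the other means renaming fields, restructuring nesting and converting types - exactly the adapter work described above.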

DataStax’s technology uses generative AI to accurately transfer data between systems with different data structures within an enterprise. Said to be something of an industry first, the Astra Streaming GPT Schema Translator automatically generates so-called ‘schema mappings’, a traditionally difficult and time-consuming process when building and maintaining event streaming pipelines.
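To make the term concrete, here is a minimal hand-written mapping between the two hypothetical record shapes sketched above - the kind of tedious glue code that, per DataStax, the GPT Schema Translator is designed to generate automatically. This is an illustrative sketch, not DataStax’s implementation:

```python
from datetime import datetime

def map_a_to_b(record_a: dict) -> dict:
    """A hand-written schema mapping (illustrative only, not DataStax code).

    It renames fields, nests flat columns into a sub-object and converts
    an ISO 8601 timestamp string to Unix epoch seconds.
    """
    signup = datetime.fromisoformat(record_a["signup_ts"].replace("Z", "+00:00"))
    return {
        "id": str(record_a["customer_id"]),
        "name": {
            "given": record_a["first_name"],
            "family": record_a["last_name"],
        },
        "created": int(signup.timestamp()),
    }
```

Every such mapping must be written, tested and then maintained as the source and destination schemas evolve - which is why automating the step is attractive.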

According to Chris Latimer, general manager for streaming technologies at DataStax, schema mapping enables data integration and interoperability between multiple systems and data sources – a fundamental element of any streaming data pipeline. “Data mappings must be done manually – a routine issue for data engineers – via a process that is complicated, tedious and error-prone. With the new DataStax GPT Schema Translator, the hard work of mapping is shifted from the developer to the platform, providing time and space for developers to focus on the more impactful components of their projects,” note Latimer and his team in a technical statement.

The company further describes DataStax Astra Streaming as a fully managed messaging and event streaming service built on Apache Pulsar, an open source data streaming technology. It enables companies to stream real-time data at scale, which is useful for delivering applications with massive data throughput rates that need low latency and elastic scalability from a cloud service.
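For a sense of what the underlying service looks like to a developer, the sketch below publishes an event with the open source Apache Pulsar Python client (the `pulsar-client` package). The broker URL and topic name are placeholder assumptions - connecting to Astra Streaming itself requires service-specific endpoints and credentials not shown here:

```python
import json
import pulsar  # pip install pulsar-client

# Placeholder broker URL - a real Astra Streaming endpoint would need
# its own service URL plus authentication, omitted in this sketch.
client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("customer-events")  # hypothetical topic name

# Publish the mapped record from the earlier sketch as a JSON-encoded event.
event = {"id": "1042", "name": {"given": "Ada", "family": "Lovelace"}, "created": 1685611800}
producer.send(json.dumps(event).encode("utf-8"))

client.close()
```

A consumer on the other side of the pipeline would subscribe to the same topic and expect records in an agreed schema - the point at which the mapping problem above reappears.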

“Organizations are still identifying the role that new AI technologies, like GPT, will play in their day-to-day business functions and we’re proud to be the first to integrate this new technology into our product to significantly reduce development time and support costs,” said Latimer.

Known (and unknown) pain points

The key trend here is pain points.

This is a practical use case for chat-style generative AI designed to address a known pain point experienced by data engineers (such as database administrators, system administrators and other DevOps-related operations staff). Translating data structures from one location to another is time-consuming, error-prone and laborious, i.e. all the things that humans hate and that machines are good at - making it a perfect task for AI.

When we evolve our use of AI further and start looking for the unknown pain points in our software engineering and workplace workflow processes as well, then (arguably) we will really be tapping into the creative ‘generate’ aspect of generative AI that continues to transform the way we do things.
