AWS Widens Data Pipelines + Creates Amazon Q Gen-AI Assistant

AI is hungry. In the current age of Artificial Intelligence (AI), with the new era of generative AI exhibiting a seemingly limitless appetite for large information resources, the enterprise technology space never tires of talking about the importance of data and how we manage it in all its various forms.

It is because data exists in so varied a set of structures and forms that we can do so much with it. This is a good thing: we want some data to sit in transactional systems (retail databases being a basic example); we want some data to sit in rapid-access, low-latency systems because it is accessed, queried and updated frequently; we want to save money on less frequently used data by putting it in cheaper data stores; we want some information to be highly ordered, structured and deduplicated (because it relates to front-line, mission-critical applications, for example); and we can also appreciate that some unstructured data might be channelled towards a data lake, simply because we can’t categorize every voice recording, video, Internet of Things (IoT) sensor reading or document that may not be needed today, but perhaps tomorrow.

Extract, Transform & Load (ETL)

But all this variation in data topography also presents a challenge. When we need to use these information sets in concert - with new applications in AI being a case in point - we face an access challenge. This is where technology architects, database administrators and software application developers talk of their ETL requirement - the need to Extract, Transform & Load (ETL) data from one place to another.

NOTE: For data science completeness, we should also mention that ETL’s sister data integration process and discipline is Extract, Load, Transform (ELT) - where raw or unstructured data (such as from a data lake) is loaded into the target system first and then transformed into an ordered state there, ready for downstream use cases.
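
To make the ETL/ELT distinction concrete, here is a minimal sketch in Python using only the standard library. The table, columns and “source” rows are invented placeholders standing in for real transactional or data lake content; nothing here is a real AWS pipeline.

```python
# A minimal, invented sketch of ETL vs. ELT using only Python's standard
# library. The table, columns and "source" rows are placeholders that stand
# in for real transactional or data lake content.
import sqlite3

raw_rows = [("2023-11-28", "  WIDGET ", "19.99"),
            ("2023-11-28", "gadget", "5.00")]  # messy source data

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_sales (day TEXT, product TEXT, price TEXT)")
con.execute("CREATE TABLE sales (day TEXT, product TEXT, price REAL)")

# ETL: transform in the pipeline *before* loading into the analytical store.
cleaned = [(d, p.strip().lower(), float(v)) for d, p, v in raw_rows]
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)

# ELT: load the raw data first, then transform *inside* the store with SQL.
con.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)
con.execute("""INSERT INTO sales
               SELECT day, LOWER(TRIM(product)), CAST(price AS REAL)
               FROM raw_sales""")

# Both paths produce the same cleaned rows (duplicated here for the demo).
print(con.execute("SELECT * FROM sales").fetchall())
```

The placement of the transform step is the whole difference: ETL cleans data in flight, while ELT defers the work to the destination system.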

Straddling a universe of databases, data lakes, data warehouses, data marketplaces and data workloads is, of course, Amazon Web Services, Inc. (AWS). Keen to use its muscle to bring new integration capabilities to the planet’s data pipeline network, AWS has now explained how its new Amazon Aurora PostgreSQL, Amazon DynamoDB and Amazon Relational Database Service (Amazon RDS) for MySQL integrations with Amazon Redshift make it easier to connect and analyze transactional data from multiple relational and non-relational databases in Amazon Redshift. Customers can also now use Amazon OpenSearch Service to perform full-text and vector search on DynamoDB data in near real-time.
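
As a hedged illustration of what that means in practice: once a zero-ETL integration has landed, say, Aurora rows in Amazon Redshift, they can be queried like any native table. The boto3 Redshift Data API call below is real, but the workgroup, database, table and SQL are all invented placeholders, not a confirmed AWS setup.

```python
# A hedged sketch: query transactional rows that a zero-ETL integration has
# replicated into Amazon Redshift. All identifiers below are placeholders.
import boto3

rsd = boto3.client("redshift-data")

resp = rsd.execute_statement(
    WorkgroupName="analytics-workgroup",   # assumed Redshift Serverless workgroup
    Database="zeroetl_destination_db",     # database created for the integration
    Sql="SELECT product_id, SUM(amount) FROM orders GROUP BY product_id;",
)
# The Data API is asynchronous; results are fetched later with
# get_statement_result(Id=resp["Id"]).
print("Statement submitted:", resp["Id"])
```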

Zero-ETL integrations

By making it easier to connect to and act on any data regardless of its location, these technologies - which AWS calls ‘zero-ETL integrations’ - are promised to help users tap into the depth of AWS’s database and analytics services.

“AWS offers the industry’s broadest and deepest set of data services for storing and querying any type of data at scale,” said Dr. Swami Sivasubramanian, vice president of data and Artificial Intelligence at AWS. “In addition to having the right tool for the job, customers need to be able to integrate the data that is spread across their organizations to unlock more value for their business. That is why we are investing in a zero-ETL future, where data integration is no longer a tedious, manual effort and where customers can easily get their data where they need it.”

We know that organizations have different types of data coming from different origins, at varying scales and speeds, and the uses for this data are just as varied. For organizations to make the most of their data, AWS insists they need a comprehensive set of tools that accounts for all of these variables, along with the ability to integrate and combine data spread across multiple sources.

A working example

For example, says AWS: “A company may store transactional data in a relational database that it wants to analyze in a data warehouse, but use another analytics tool to perform a vector search on data from a non-relational database. Historically, moving data has required customers to architect their own ETL pipelines, which can be challenging and costly to build, complex to manage, and prone to intermittent errors that delay access to time-sensitive insights.”

That is why AWS underlines its work in this space, i.e. it has invested in zero-ETL capabilities that remove the burden of manually moving data. This includes federated query capabilities in Amazon Redshift and Amazon Athena - which enable users to directly query data stored in operational databases, data warehouses and data lakes - and the Amazon Connect analytics data lake - which enables users to access contact center data for analytics and machine learning. The work here also includes new zero-ETL integrations between Salesforce Data Cloud and AWS storage, data and analytics services to enable organizations to unify their data across Salesforce and AWS.
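
To give a flavour of that federated query capability, the sketch below assumes an Athena data source connector has already been registered under a hypothetical catalog name (“mysql_catalog”); every identifier, including the S3 results bucket, is an assumption made for illustration rather than a documented example.

```python
# A hedged sketch of an Athena federated query joining S3-resident data with
# a live operational database. Catalog, schema, table and bucket names are
# invented; a Lambda-based connector is assumed to be registered already.
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="""
        SELECT w.customer_id, w.total, u.last_login
        FROM warehouse_db.orders AS w               -- data in S3 via Glue catalog
        JOIN "mysql_catalog"."crm"."users" AS u     -- live MySQL, via connector
          ON w.customer_id = u.id
    """,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# Athena runs asynchronously; poll get_query_execution(resp["QueryExecutionId"])
# and read results from the S3 output location when the query completes.
print("Query started:", resp["QueryExecutionId"])
```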

Hey, remember ETL?

The entire thread of what’s happening here comes down to a theme we see being played out across the whole enterprise IT landscape - automation. According to G2 Krishnamoorthy, vice president of analytics at AWS, if we can remove a good part (or indeed all) of the ETL workload that software development and IT operations teams previously needed to shoulder, then we put the ETL function into a space where it becomes a utility.

Krishnamoorthy says that this will not only make the software engineering team happy, it will also please anyone who needs access to data across the huge variety of sources depicted here. Could that lead to a time when software engineers sit back and joke - hey, remember ETL? Okay, it’s not a great joke, but it’s a happy one.

Enter... Amazon Q

Also coming forward from AWS right now is a new type of generative AI assistant. Known as Amazon Q, this technology has been built specifically for work and can be tailored to a user’s own business requirements inside different organizations. So then (as we so often say), what is it and how does it work?

AWS positions Q as a means of offering all sorts of users a tool to get fast, relevant answers to important work (and, potentially, life) questions, generate content and take actions. How does it work? It draws its knowledge from a customer’s own information repositories, software application code and enterprise systems. It is designed to streamline tasks and speed up decision-making and problem-solving.
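
For the technically curious, here is a speculative sketch of what a programmatic exchange with Amazon Q might look like via the “qbusiness” client that recent boto3 releases expose. The chat_sync call exists in that SDK, but the application ID, user ID and prompt below are placeholders, and a preview-stage service may well differ from what is shown.

```python
# A speculative sketch of asking Amazon Q a question answered from a
# customer's own repositories and systems. Identifiers are placeholders.
import boto3

q = boto3.client("qbusiness")

resp = q.chat_sync(
    applicationId="app-0123456789abcdef",  # a provisioned Amazon Q application
    userId="jane.doe@example.com",         # mapped to org identities/permissions
    userMessage="Summarize last quarter's contact center escalation trends.",
)
print(resp["systemMessage"])               # the assistant's generated answer
```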

Built with what AWS promises is enough solidity to support enterprise customers’ stringent requirements, Amazon Q can personalize its interactions to each individual user based on an organization’s existing identities, roles and permissions. With Intellectual Property (IP) concerns always close by in this area, AWS says that Amazon Q never uses business customers’ content to train its underlying models. It brings gen-AI powered assistance to users building on AWS, working internally and using AWS applications for business intelligence (BI), contact centers and supply chain management.

“AWS is helping customers harness generative AI with solutions at all three layers of the stack, including purpose-built infrastructure, tools and applications,” said Dr. Swami Sivasubramanian, vice president of data and Artificial Intelligence. “Amazon Q builds on AWS’s history of taking complex, expensive technologies and making them accessible to customers of all sizes and technical abilities, with a data-first approach and enterprise-grade security and privacy built-in from the start. By bringing generative AI to where our customers work - whether they are building on AWS, working with internal data and systems, or using a range of data and business applications - Amazon Q is a powerful addition to the application layer of our generative AI stack that opens up new possibilities for every organization.”

AWS appears to be covering a lot of bases - but that’s AWS. With so many cloud tools to choose from (some smaller companies use just a handful, while larger customers, perhaps those in the automotive business, use the whole AWS toolbox), it’s almost tough to work out which parts of the AWS stack work for each type of user base. Conveniently, Amazon Q could help answer that question too, i.e. if the best way to fight AI-powered malware is with AI-powered vulnerability assessment and scanning tools, then surely the best way to fight business cloud complexity is with AI too.

Amazon Q is available to customers in preview, with Amazon Q in Connect generally available and Amazon Q in AWS Supply Chain coming soon. Users should form a line... and get in the queue for Amazon Q.
