Earlier this year, we published the first in a series of posts about how AWS is transforming our seller and customer journeys using generative AI. Field Advisor serves four primary use cases, starting with AWS-specific knowledge search: with Amazon Q Business, we've made internal data sources as well as public AWS content available in Field Advisor's index.
Enterprises, especially in the insurance industry, face increasing challenges in processing vast amounts of unstructured data from diverse formats, including PDFs, spreadsheets, images, videos, and audio files. These might include claims document packages, crash event videos, chat transcripts, or policy documents.
Lakebases share the same architecture.
At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP.
Search solutions in modern big data management must facilitate efficient and accurate search of enterprise data assets that can adapt to the arrival of new assets. The application needs to search through the catalog and show the metadata information related to all of the data assets that are relevant to the search context.
Precise Software Solutions, Inc. (Precise), an Amazon Web Services (AWS) Partner, participated in the AWS Think Big for Small Business Program (TBSB) to expand their AWS capabilities and to grow their business in the public sector. The platform helped the agency digitize and process forms, pictures, and other documents.
Their information is split between two types of data: unstructured data (such as PDFs, HTML pages, and documents) and structured data (such as databases, data lakes, and real-time reports). Different types of data typically require different tools to access them.
Text, images, audio, and videos are common examples of unstructured data. Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.
Prerequisites Before you dive into the integration process, make sure you have the following prerequisites in place: AWS account – You’ll need an AWS account to access and use Amazon Bedrock. You can interact with Amazon Bedrock using AWS SDKs available in Python, Java, Node.js, and more.
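As a minimal sketch (assuming a region where Amazon Bedrock is available and an account with model access already granted; the model ID is illustrative), invoking a model with Boto3 looks like this:

    import boto3
    import json

    # Create a Bedrock Runtime client (assumes AWS credentials are configured)
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Request body follows the Anthropic Messages format used by Claude models on Bedrock
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarize what Amazon Bedrock does."}],
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # any accessible model works
        body=body,
    )
    print(json.loads(response["body"].read())["content"][0]["text"])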
Although generative AI is fueling transformative innovations, enterprises may still experience sharply divided data silos when it comes to enterprise knowledge, in particular between unstructured content (such as PDFs, Word documents, and HTML pages) and structured data (real-time data and reports stored in databases or data lakes).
Let's assume that the input question is "What date will AWS re:Invent 2024 occur?" The corresponding answer is also input, as "AWS re:Invent 2024 takes place on December 2-6, 2024." If the question was "What's the schedule for AWS events in December?"... This setup uses the AWS SDK for Python (Boto3) to interact with AWS services.
In the age of generative artificial intelligence (AI), data isn't just king, it's the entire kingdom. The success of any RAG implementation fundamentally depends on the quality, accessibility, and organization of its underlying data foundation.
Intelligent document processing, translation and summarization, flexible and insightful responses for customer support agents, personalized marketing content, and image and code generation are a few use cases using generative AI that organizations are rolling out in production.
This archive, along with 765,933 varied-quality inspection photographs, some over 15 years old, presented a significant data processing challenge. Processing these images and scanned documents is not a cost- or time-efficient task for humans, and requires highly performant infrastructure that can reduce the time to value.
At AWS, we are transforming our seller and customer journeys by using generative artificial intelligence (AI) across the sales lifecycle. It will be able to answer questions, generate content, and facilitate bidirectional interactions, all while continuously using internal AWS and external data to deliver timely, personalized insights.
This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, and utilize a combined architecture.
Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. As always, AWS welcomes feedback. Before testing, choose the gear icon.
Retriever quality: For better retrieval performance, the way the data is stored in the vector store has a big impact. For example, your input document might include tables within the PDF. In such cases, using an FM to parse the data will provide better results.
Amazon Comprehend is a managed AI service that uses natural language processing (NLP) with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document.
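A brief, hedged sketch of calling Amazon Comprehend with Boto3 to pull entities and sentiment out of a short document (the sample text is made up):

    import boto3

    comprehend = boto3.client("comprehend", region_name="us-east-1")
    text = "Amazon Comprehend was launched by AWS in November 2017."

    # Detect named entities (organizations, dates, and so on)
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    for entity in entities["Entities"]:
        print(entity["Type"], entity["Text"], round(entity["Score"], 3))

    # Detect the overall sentiment of the document
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    print(sentiment["Sentiment"])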
The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. This post focuses on the Operational Excellence pillar of the IDP solution.
This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services to build a robust and cost-effective Power Bid Optimization solution.
The Product Stewardship department is responsible for managing a large collection of regulatory compliance documents. Example questions might be “What are the restrictions for CMR substances?”, “How long do I need to keep the documents related to a toluene sale?”, or “What is the reach characterization ratio and how do I calculate it?”
You can safely use an Apache Kafka cluster for seamless data movement from an on-premises hardware solution to the data lake using various cloud services such as Amazon S3. It enables you to quickly transform and load the results into Amazon S3 data lakes or JDBC data stores.
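As a rough sketch of that pattern (broker address, topic, and bucket names are hypothetical; assumes the kafka-python package and configured AWS credentials):

    import json
    import boto3
    from kafka import KafkaConsumer

    # Drain records from an on-premises Kafka topic and land them in S3 in batches
    consumer = KafkaConsumer(
        "events-topic",
        bootstrap_servers=["broker1:9092"],
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    s3 = boto3.client("s3")
    batch = []
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= 1000:  # flush in batches so objects stay a useful size
            key = f"raw/events/offset-{message.offset}.json"
            s3.put_object(Bucket="my-data-lake", Key=key,
                          Body=json.dumps(batch).encode("utf-8"))
            batch = []

In production this path is more often handled by Kafka Connect with an S3 sink connector, but the loop above shows the data flow.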
With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.
Solution overview: Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. This feature also allows you to automate model retraining after new datasets are ingested and available in the flywheel's data lake.
To serve their customers, Vitech maintains a repository of information that includes product documentation (user guides, standard operating procedures, runbooks), which is currently scattered across multiple internal platforms (for example, Confluence sites and SharePoint folders).
Third, despite the larger adoption of centralized analytics solutions like data lakes and warehouses, complexity rises with the different table names and other metadata required to create the SQL for the desired sources. Our solution aims to address those challenges using Amazon Bedrock and AWS analytics services.
With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure. Each embedding aims to capture the semantic or contextual meaning of the data.
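For instance, a minimal sketch of generating an embedding through Bedrock with Boto3 (the Titan model ID is illustrative; any embedding model you have access to works):

    import boto3
    import json

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Embed a short passage with Amazon Titan Text Embeddings
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": "Amazon Bedrock offers a serverless experience."}),
    )
    embedding = json.loads(response["body"].read())["embedding"]
    print(len(embedding))  # vector dimensionality, e.g. 1024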
When the automated content processing steps are complete, you can use the output for downstream tasks, such as to invoke different components in a customer service backend application, or to insert the generated tags into metadata of each document for product recommendation. The Step Functions workflow starts.
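A hedged sketch of starting such a workflow with Boto3 (the state machine ARN and input shape are placeholders, not the post's actual resources):

    import boto3
    import json

    sfn = boto3.client("stepfunctions", region_name="us-east-1")

    # Start the content-processing state machine for a single document
    execution = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:doc-processing",
        input=json.dumps({"bucket": "incoming-docs", "key": "claims/claim-001.pdf"}),
    )
    print(execution["executionArn"])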
It now also supports PDF documents. Azure Data Factory preserves metadata during file copy: when performing a file copy between Amazon S3, Azure Blob, and Azure Data Lake Gen 2, the metadata is copied as well. Azure Tips and Tricks: Make your data searchable is a quick video demonstrating Azure Search.
In Part 3, we demonstrate how business analysts and citizen data scientists can create machine learning (ML) models, without code, in Amazon SageMaker Canvas and deploy trained models for integration with Salesforce Einstein Studio to create powerful business applications. For this post, we use the Anthropic Claude 3 Sonnet model.
Text analytics: Text analytics, also known as text mining, deals with unstructured text data, such as customer reviews, social media comments, or documents. It uses natural language processing (NLP) techniques to extract valuable insights from textual data. Poor data integration can lead to inaccurate insights.
Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The question is sent through a retrieval-augmented generation (RAG) process, which finds similar documents.
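A sketch of that Athena retrieval step with Boto3 (database, table, and output bucket are hypothetical):

    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Submit a SQL query against the data lake
    qid = athena.start_query_execution(
        QueryString="SELECT counter_name, value FROM counters LIMIT 10",
        QueryExecutionContext={"Database": "insights"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state, then fetch the rows
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])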
Prerequisites: This solution requires you to have an AWS account with the appropriate permissions. The following code is an example using the AWS SDK for Python (Boto3) that prompts the LLM for sentiment analysis:

    import boto3
    import json

    # Initialize the Bedrock Runtime client
    bedrock = boto3.client('bedrock-runtime')
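A hedged sketch of how the sentiment request itself might look (the prompt wording and model ID are assumptions, not the post's actual code):

    # Ask the model to classify sentiment (illustrative prompt and model ID)
    prompt = 'Classify the sentiment of this review as positive, negative, or neutral: "The claims process was quick and painless."'
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 50,
            'messages': [{'role': 'user', 'content': prompt}],
        }),
    )
    print(json.loads(response['body'].read())['content'][0]['text'])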
In this post, we will explore the potential of using MongoDB’s time series data and SageMaker Canvas as a comprehensive solution. MongoDB Atlas MongoDB Atlas is a fully managed developer data platform that simplifies the deployment and scaling of MongoDB databases in the cloud. Note we have two folders.
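As a small, hedged sketch of MongoDB time series storage with PyMongo (the connection string, collection, and field names are placeholders):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    # Connect to a MongoDB Atlas cluster (placeholder connection string)
    client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
    db = client["metrics"]

    # Create a time series collection keyed on a timestamp field (MongoDB 5.0+)
    db.create_collection(
        "sensor_readings",
        timeseries={"timeField": "ts", "metaField": "sensor_id", "granularity": "minutes"},
    )
    db["sensor_readings"].insert_one(
        {"ts": datetime.now(timezone.utc), "sensor_id": "press-01", "temperature": 72.4}
    )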
Amazon Kendra supports a variety of document formats , such as Microsoft Word, PDF, and text from various data sources. In this post, we focus on extending the document support in Amazon Kendra to make images searchable by their displayed content. Images can often be searched using supplemented metadata such as keywords.
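Querying an index is a single call; a minimal sketch with Boto3 (the index ID and query text are placeholders):

    import boto3

    kendra = boto3.client("kendra", region_name="us-east-1")

    # Search the index; with image support, matching figures can surface too
    result = kendra.query(
        IndexId="00000000-0000-0000-0000-000000000000",
        QueryText="wind turbine gearbox diagram",
    )
    for item in result["ResultItems"]:
        print(item["Type"], item.get("DocumentTitle", {}).get("Text"))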
These teams are as follows: Advanced analytics team (data lake and data mesh) – Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL (extract, transform, and load) pipelines to curate and catalog the data, and preparing the necessary historical data for the ML use cases.
Data scientists: Perform data analysis, model development, and model evaluation, and register the models in a model registry. Governance officer: Review the models' performance, including documentation, accuracy, bias, and access, and provide final approval for models to be deployed.
Look for features such as scalability (the ability to handle growing datasets), performance (speed of processing), ease of use (user-friendly interfaces), integration capabilities (compatibility with existing systems), security measures (data protection features), and pricing models (licensing costs).
These encoder-only architecture models are fast and effective for many enterprise NLP tasks, such as classifying customer feedback and extracting information from large documents. While they require task-specific labeled data for fine-tuning, they also offer clients the best cost-performance trade-off for non-generative use cases.
To combine the collected data, you can integrate different data producers into a data lake as a repository. A central repository for unstructured data is beneficial for tasks like analytics and data virtualization. Data cleaning: The next step is to clean the data after ingesting it into the data lake.
Informatica’s AI-powered automation helps streamline data pipelines and improve operational efficiency. Common use cases include integrating data across hybrid cloud environments, managing data lakes, and enabling real-time analytics for Business Intelligence platforms.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. User support arrangements Consider the availability and quality of support from the provider or vendor, including documentation, tutorials, forums, customer service, etc.
Semi-Structured Data: Data that has some organizational properties but doesn’t fit a rigid database structure (like emails, XML files, or JSON data used by websites). Unstructured Data: Data with no predefined format (like text documents, social media posts, images, audio files, videos).