AI, Data Lakes and ML - Data Science Current

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Flipboard

NOVEMBER 22, 2024

This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product.

Tags: Data Governance, ML, Data Lakes

What Is a Lakebase?

Databricks

JUNE 11, 2025


Tags: Database, Data Lakes, ETL, Analytics

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

AWS Machine Learning Blog

NOVEMBER 14, 2024

By setting up automated policy enforcement and checks, you can achieve cost optimization across your machine learning (ML) environment. The following table provides examples of a tagging dictionary used for tagging ML resources. A reference architecture for the ML platform with various AWS services is shown in the following diagram.

Tags: ML, AWS, Machine Learning


Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Tags: Data Lakes, Machine Learning, Apache Kafka

Build a domain‐aware data preprocessing pipeline: A multi‐agent collaboration approach

Flipboard

MAY 20, 2025

Traditional data preprocessing methods, though functional, might have limitations in accuracy and consistency. This might affect metadata extraction completeness, workflow velocity, and the extent of data utilization for AI-driven insights (such as fraud detection or risk analysis).

Tags: Data Lakes, AWS, Analytics

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries.

Tags: ML, AWS, Data Lakes

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

By some estimates, unstructured data can make up 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but much of its value lies dormant. These services write the output to a data lake.

Tags: AWS, ML, Analytics

Precise Software Solutions implements ML as a service on AWS to save time and money for federal agency

Flipboard

JANUARY 6, 2025

The agency wanted to use AI [artificial intelligence] and ML to automate document digitization, and it also needed help understanding each document it digitizes, says Duan. The demand for modernization is growing, and Precise can help government agencies adopt AI/ML technologies.

Tags: AWS, ML, Machine Learning

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. Data scientists and ML engineers require capable tooling and sufficient compute for their work.

Tags: ML, AWS, AI

MinIO announces support for NVIDIA AI ecosystem with AIStor updates

Dataconomy

MARCH 17, 2025

MinIO, a provider of high-performance object storage for AI, announced several upcoming enhancements to its AIStor product at NVIDIA GTC. These updates are designed to deepen MinIO’s support for the NVIDIA AI ecosystem and improve the efficiency and utilization of AI infrastructure. It will increase CPU efficiency.

Tags: Data Lakes, AI, ML

Architect a mature generative AI foundation on AWS

Flipboard

MAY 30, 2025

Generative AI applications seem simple: invoke a foundation model (FM) with the right context to generate a response. Many organizations have siloed generative AI initiatives, with development managed independently by various departments and lines of business (LOBs). This approach facilitates centralized governance and operations.

Tags: AWS, AI, Database

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Now all you need is some guidance on generative AI and machine learning (ML) sessions to attend at this twelfth edition of re:Invent. Although generative AI has appeared at previous events, this year we’re taking it to the next level. And although our track focuses on generative AI, many other tracks have related sessions.

Tags: AWS, ML, AI

MAS AI/ML Modernization Accelerator: Air Compressor Use Case

IBM Data Science in Practice

JANUARY 9, 2024

One groundbreaking technology that has emerged as a game-changer is artificial intelligence (AI) for asset performance management (APM). However, embarking on the journey of implementing AI in your APM strategy can be both exciting and daunting.

Tags: ML, AI

Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights

Flipboard

MAY 14, 2025

According to a Gartner survey in 2024, 58% of finance functions have adopted generative AI, marking a significant rise in adoption. Their information is split between two types of data: unstructured data (such as PDFs, HTML pages, and documents) and structured data (such as databases, data lakes, and real-time reports).

Tags: AWS, AI, Database

10 Top LLM Companies You Must Know About

Data Science Dojo

SEPTEMBER 10, 2024

LLM companies are businesses that specialize in developing and deploying Large Language Models (LLMs) and advanced machine learning (ML) models. In the rapidly evolving field of artificial intelligence, OpenAI stands out as a leading force in the LLM world.

Tags: Machine Learning, Natural Language Processing, ML

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.

Tags: SQL, AWS, Data Lakes, AI

AI/ML-driven actionable insights and themes for Amazon third-party sellers using AWS

Flipboard

MARCH 7, 2023

This post presents a solution that uses a workflow and AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We use multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, and utilize a combined architecture.

Tags: ML, AWS, AI

Generative AI operating models in enterprise organizations with Amazon Bedrock

AWS Machine Learning Blog

JANUARY 29, 2025

Generative AI can revolutionize organizations by enabling the creation of innovative applications that offer enhanced customer and employee experiences. In this post, we evaluate different generative AI operating model architectures that could be adopted.

Tags: AWS, AI, Database

How to Ensure Your New Cloud Data Lake Is Secure

Dataversity

MARCH 24, 2021

Enterprises migrating on-prem data environments to the cloud in pursuit of more robust, flexible, and integrated analytics and AI/ML capabilities are fueling a surge in cloud data lake implementations.

Tags: Data Lakes, Cloud Data, ML

Search enterprise data assets using LLMs backed by knowledge graphs

Flipboard

NOVEMBER 27, 2024

His mission is to enable customers to achieve their business goals and create value with data and AI. Foundation models (FMs) on Amazon Bedrock provide powerful generative models for text and language tasks.

Tags: AWS, Database, ML

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.

Tags: ML, Data Preparation, AWS

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or multiple feature groups.

Tags: ML, AWS, Data Warehouse

Real-Time ML with Spark and SBERT, AI Coding Assistants, Data Lake Vendors, and ODSC East…

ODSC - Open Data Science

JUNE 1, 2023

Real-Time ML with Spark and SBERT, AI Coding Assistants, Data Lake Vendors, and ODSC East Highlights. Getting Up to Speed on Real-Time Machine Learning with Spark and SBERT: learn more about real-time machine learning with an approach that uses Apache Spark and SBERT. Is an AI Coding Assistant Right for You?

Tags: Data Lakes, ML, Citizen Data Scientist

Generate financial industry-specific insights using generative AI and in-context fine-tuning

AWS Machine Learning Blog

NOVEMBER 12, 2024

He is actively working on projects in the ML space and has presented at numerous conferences including Strata and GlueCon. He is focused on Big Data, Data Lakes, Streaming and batch Analytics services and generative AI technologies. Arghya Banerjee is a Sr. Varun Mehta is a Sr. Solutions Architect at AWS.

Tags: SQL, AWS, AI

How Thomson Reuters built an AI platform using Amazon SageMaker to accelerate delivery of ML projects

AWS Machine Learning Blog

JANUARY 13, 2023

In 1992, Thomson Reuters (TR) released its first AI legal research service, WIN (Westlaw Is Natural), an innovation at the time, as most search engines only supported Boolean terms and connectors. With this tremendous increase in AI services, the next milestone for TR was to streamline innovation and facilitate collaboration.

Tags: ML, AWS, Data Scientist

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

With the current housing shortage and affordability concerns, Rocket simplifies the homeownership process through an intuitive and AI-driven experience. Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks.

Tags: Data Science, AWS, Hadoop, Data Scientist

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.

Tags: SQL, Data Lakes, Data Analyst, AWS

Democratize ML on Salesforce Data Cloud with no-code Amazon SageMaker Canvas

AWS Machine Learning Blog

NOVEMBER 27, 2023

This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the third post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. SageMaker endpoints can be registered to the Salesforce Data Cloud to activate predictions in Salesforce.

Tags: ML, AWS, SQL

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

Nowadays, the majority of our customers are excited about large language models (LLMs) and are thinking about how generative AI could transform their business. In this post, we discuss how to operationalize generative AI applications using MLOps principles leading to foundation model operations (FMOps).

Tags: AI, ML

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to provide data protection.

Tags: AWS, Data Governance, Data Silos, SQL

8 Data Lake Vendors to Make Your Data Life Easier in 2023

ODSC - Open Data Science

JUNE 7, 2023

To make your data management processes easier, here’s a primer on data lakes, and our picks for a few data lake vendors worth considering. What is a data lake? A data lake is a centralized repository that allows users or an organization to store and analyze large volumes of data.

Tags: Data Lakes, Azure, Data Warehouse, Hadoop

Unlock the power of data governance and no-code machine learning with Amazon SageMaker Canvas and Amazon DataZone

AWS Machine Learning Blog

AUGUST 21, 2024

Amazon DataZone is a data management service that makes it quick and convenient to catalog, discover, share, and govern data stored in AWS, on-premises, and third-party sources. Enterprises can use no-code ML solutions to streamline their operations and optimize their decision-making without extensive administrative overhead.

Tags: Machine Learning, Data Governance, ML

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the problem of integrating air quality data from these sensors. Qiong (Jo) Zhang, PhD, is a Senior Partner Solutions Architect at AWS, specializing in AI/ML.

Tags: AWS, AI, Python

Learn AI Together — Towards AI Community Newsletter #18

Towards AI

MARCH 28, 2024

Author(s): Towards AI Editorial Team. Originally published on Towards AI. Good morning, AI enthusiasts! This week, I’m super excited to announce that we are finally releasing our book, ‘Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG,’ where we gathered all our learnings.

Tags: AI, Data Lakes, Azure

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. This completes the setup to enable data access from Salesforce Data Cloud to SageMaker Studio to build AI and machine learning (ML) models.

Tags: ML, AWS, AI

Build generative AI–powered Salesforce applications with Amazon Bedrock

AWS Machine Learning Blog

JULY 29, 2024

SageMaker endpoints can be registered with Salesforce Data Cloud to activate predictions in Salesforce. Requests and responses between Salesforce and Amazon Bedrock pass through the Einstein Trust Layer, which promotes responsible AI use across Salesforce.

Tags: AWS, AI, ML

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Flipboard

NOVEMBER 17, 2023

Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. This type of data is often used in ML and artificial intelligence applications.

Tags: K-nearest Neighbors, AWS, Clustering, Database

Data Engineering for IoT Applications: Unleashing the Power of the Internet of Things

Data Science Connect

JULY 28, 2023

These platforms provide data engineers with the flexibility to develop and deploy IoT applications efficiently. Data Lakes for Centralized Storage: data lakes serve as centralized repositories for storing raw and processed IoT data.

Tags: Internet of Things, Data Engineer, Data Engineering

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

AWS Machine Learning Blog

FEBRUARY 7, 2025

This post, part of the Governing the ML lifecycle at scale series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. An enterprise might have the following roles involved in the ML lifecycle. This ML platform provides several key benefits.

Tags: ML, Data Scientist, AWS

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.

Tags: AWS, Data Lakes, ML

Achieve your AI goals with an open data lakehouse approach

IBM Journey to AI blog

OCTOBER 4, 2023

Artificial intelligence (AI) is now at the forefront of how enterprises work with data to help reinvent operations, improve customer experiences, and maintain a competitive advantage. It’s no longer a nice-to-have, but an integral part of a successful data strategy. Why does AI need an open data lakehouse architecture?

Tags: Data Lakes, Data Warehouse, AI

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Flipboard

DECEMBER 18, 2023

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Discover how you can use Amazon Redshift to build a data mesh architecture to analyze your data.

Tags: AWS, Data Warehouse, ETL, SQL

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning Blog

JUNE 21, 2024

To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities. This helps customers quickly and seamlessly explore their security data and accelerate internal investigations.

Tags: AWS, AI, Natural Language Processing

Insights in implementing production-ready solutions with generative AI

AWS Machine Learning Blog

APRIL 30, 2025

As generative AI revolutionizes industries, organizations are eager to harness its potential. Companies in EMEA have used AWS services to transform their operations and improve customer experience using generative AI, with their stories illustrating how a strong business case can lead to tangible results across various industry verticals.

Tags: AWS, AI, Machine Learning
