Artificial Intelligence, Data Engineer and Data Lakes

What Is a Lakebase?

databricks

JUNE 11, 2025

Separation of storage and compute: Lakebases store their data in modern data lakes (object stores) in open formats, which enables scaling compute and storage separately, leading to lower TCO and eliminating lock-in. When compute is scaled to zero, the cost of the lakebase is just the cost of storing the data on cheap data lakes.

Database

Database Data Lakes ETL Analytics

Data Engineering for IoT Applications: Unleashing the Power of the Internet of Things

Data Science Connect

JULY 28, 2023

A recent article on Analytics Insight explores the critical aspect of data engineering for IoT applications. Understanding the intricacies of data engineering empowers data scientists to design robust IoT solutions, harness data effectively, and drive innovation in the ever-expanding landscape of connected devices.

Internet of Things

Internet of Things Data Engineer Data Engineering

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.

AWS

AWS Data Governance Data Silos SQL

Sneak peek at Microsoft Fabric price and its promising features

Dataconomy

JUNE 1, 2023

Unified data storage: Fabric’s centralized data lake, Microsoft OneLake, eliminates data silos and provides a unified storage system, simplifying data access and retrieval. OneLake is designed to store a single copy of data in a unified location, leveraging the open-source Apache Parquet format.

Power BI

Power BI Data Lakes Azure Data Silos

8 Data Lake Vendors to Make Your Data Life Easier in 2023

ODSC - Open Data Science

JUNE 7, 2023

To make your data management processes easier, here’s a primer on data lakes, and our picks for a few data lake vendors worth considering. What is a data lake? First, a data lake is a centralized repository that allows users or an organization to store and analyze large volumes of data.

Data Lakes

Data Lakes Azure Data Warehouse Hadoop

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications. The following diagram illustrates the solution architecture.

SQL

SQL Data Lakes Data Analyst AWS

Open Data Lakes, Safeguarding Images From AI, Free Data Viz Tools, and 50% Off ODSC East

ODSC - Open Data Science

FEBRUARY 15, 2024

The Future of the Single Source of Truth is an Open Data Lake: Organizations that strive for high-performance data systems are increasingly turning towards the ELT (Extract, Load, Transform) model using an open data lake.

Data Lakes

Data Lakes Data Visualization Machine Learning

Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights

Flipboard

MAY 14, 2025

Their information is split between two types of data: unstructured data (such as PDFs, HTML pages, and documents) and structured data (such as databases, data lakes, and real-time reports). Different types of data typically require different tools to access them. Traditionally, businesses face a challenge.

AWS

AWS AI Database

Highlights from the Data Engineering Summit Now Available On Demand

ODSC - Open Data Science

FEBRUARY 14, 2023

We’ve just wrapped up our first-ever Data Engineering Summit. If you weren’t able to make it, don’t worry, you can watch the sessions on-demand and keep up-to-date on essential data engineering tools and skills. It also addresses the strategies and best practices for implementing a data mesh.

Data Engineer

Data Engineer Data Engineering

What Does a Data Engineering Job Involve in 2024?

ODSC - Open Data Science

JANUARY 30, 2024

Data engineering is a hot topic in the AI industry right now. And as data’s complexity and volume grow, its importance across industries will only become more noticeable. But what exactly do data engineers do? So let’s do a quick overview of the job of data engineer, and maybe you might find a new interest.

Data Engineer

Data Engineer Data Engineering

Announcing the First Speakers for the 2024 Data Engineering Summit

ODSC - Open Data Science

FEBRUARY 15, 2024

We couldn’t be more excited to announce the first sessions for our second annual Data Engineering Summit, co-located with ODSC East this April. Join us for 2 days of talks and panels from leading experts and data engineering pioneers. Is Gen AI A Data Engineering or Software Engineering Problem?

Data Engineer

Data Engineer Data Engineering

Retrieval-Augmented Generation with LangChain, Amazon SageMaker JumpStart, and MongoDB Atlas semantic search

Flipboard

NOVEMBER 17, 2023

MongoDB vector data store: MongoDB Atlas Vector Search is a new feature that allows you to store and search vector data in MongoDB. Vector data is a type of data that represents a point in a high-dimensional space. This type of data is often used in ML and artificial intelligence applications.

K-nearest Neighbors

K-nearest Neighbors AWS Clustering Database

Media Mix Modeling, ML Safety Concerns with LLMs, and Data Engineering Cloud Options

ODSC - Open Data Science

APRIL 27, 2023

5 Data Engineering and Data Science Cloud Options for 2023: AI development is incredibly resource intensive. As such, here are a few data science cloud options to help you handle some work virtually. Here are a few things to keep an eye out for. What are the benefits of this technology, and how can you apply it?

Data Engineering

Data Engineering Data Engineer

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

Data engineering is a rapidly growing field, and there is a high demand for skilled data engineers. If you are a data scientist, you may be wondering if you can transition into data engineering. In this blog post, we will discuss how you can become a data engineer if you are a data scientist.

Data Engineer

Data Engineer Data Engineering

Introducing the Amazon Comprehend flywheel for MLOps

AWS Machine Learning Blog

MARCH 1, 2023

MLOps focuses on the intersection of data science and data engineering in combination with existing DevOps practices to streamline model delivery across the ML development lifecycle. MLOps requires the integration of software development, operations, data engineering, and data science.

Data Lakes

Data Lakes AWS ML

Transforming network operations with AI: How Swisscom built a network assistant using Amazon Bedrock

AWS Machine Learning Blog

JULY 3, 2025

To achieve the desired accuracy in KPI calculations, the data pipeline was refined to achieve consistent and precise performance, which leads to meaningful insights. At this point, it became possible for the calculator agent to forego the Pandas or Spark data processing implementation.

AWS

AWS AI SQL

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Flipboard

DECEMBER 18, 2023

He highlights innovations in data, infrastructure, and artificial intelligence and machine learning that are helping AWS customers achieve their goals faster, mine untapped potential, and create a better future. Learn more about the AWS zero-ETL future with newly launched AWS databases integrations with Amazon Redshift.

AWS

AWS Data Warehouse ETL SQL

Exploring Open-Source Innovations: 13 Companies Offering Cutting-Edge Solutions

ODSC - Open Data Science

MARCH 21, 2025

lakeFS: Git-Like Data Versioning. lakeFS brings version control to large-scale data lakes. By enabling Git-like operations for data, lakeFS allows data engineers to manage, experiment with, and collaborate on datasets more effectively.

Data Scientist

Data Scientist Data Visualization Data Science Data Lakes

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

He specializes in large language models, cloud infrastructure, and scalable data systems, focusing on building intelligent solutions that enhance automation and data accessibility across Amazon’s operations. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.

AWS

AWS Natural Language Processing Machine Learning

5 Fast-Growing Data Management Trends in 2023

ODSC - Open Data Science

MAY 16, 2023

Data Mesh: More data management systems in 2023 will also shift toward a data mesh architecture. This decentralized architecture breaks data lakes into smaller domains specific to a given team or department. Automation and artificial intelligence (AI) will see particular growth in the realm of observability.

Database

Database Data Science Data Lakes Data Observability

Evolvability — It’s Mostly About Data Contracts

ODSC - Open Data Science

APRIL 25, 2025

Evolvability: It’s Mostly About Data Contracts. Editor’s note: Elliott Cordo is a speaker for ODSC East this May 13-15! Be sure to check out his talk, Enabling Evolutionary Architecture in Data Engineering, there to learn about data contracts and plenty more.

Data Engineering

Data Engineering Data Engineer

6 Remote AI Jobs to Look for in 2024

ODSC - Open Data Science

DECEMBER 19, 2023

The field of artificial intelligence is growing rapidly and with it the demand for professionals who have tangible experience in AI and AI-powered tools. Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. billion in 2021 to $331.2

Data Scientist

Data Scientist Machine Learning Computer Science

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. This happens only when a new data format is detected to avoid overburdening scarce Afri-SET resources. Having a human-in-the-loop to validate each data transformation step is optional.

AWS

AWS Python AI

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases. Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data.

ML

ML Data Preparation AWS

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Despite the benefits of this architecture, Rocket faced challenges that limited its effectiveness: Accessibility limitations: The data lake was stored in HDFS and only accessible from the Hadoop environment, hindering integration with other data sources. This also led to a backlog of data that needed to be ingested.

Data Science

Data Science AWS Hadoop Data Scientist

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning Blog

JUNE 21, 2024

To accomplish this, eSentire built AI Investigator, a natural language query tool for their customers to access security platform data by using AWS generative artificial intelligence (AI) capabilities. eSentire has over 2 TB of signal data stored in their Amazon Simple Storage Service (Amazon S3) data lake.

AWS

AWS AI Natural Language Processing

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Big data isn’t an abstract concept anymore, as so much data comes from social media, healthcare data, and customer records, so knowing how to parse all of that is needed. This pushes into big data as well, as many companies now have significant amounts of data and large data lakes that need analyzing.

Data Science

Data Science Data Scientist Computer Science

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

Let’s demystify this using the following personas and a real-world analogy. Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store. Data scientists (consumers) – They extract and utilize this data to craft their models. Data engineers serve as architects sketching the initial blueprint.

AWS

AWS ML Machine Learning

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Overview: Data science vs data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models and develop artificial intelligence (AI) applications.

Data Science

Data Science Analytics Data Scientist

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved by SQL using Amazon Athena. Millions of counters are added daily, together with 20 million insights updated daily to spot threat patterns.

SQL

SQL Database AWS Machine Learning

How to use foundation models and trusted governance to manage AI workflow risk

IBM Journey to AI blog

OCTOBER 16, 2023

Artificial intelligence (AI) adoption is still in its early stages. The Stanford Institute for Human-Centered Artificial Intelligence’s Center for Research on Foundation Models (CRFM) recently outlined the many risks of foundation models, as well as opportunities. Trustworthiness is critical.

AI

AI Data Warehouse ML

Why optimize your warehouse with a data lakehouse strategy

IBM Journey to AI blog

APRIL 25, 2023

In a prior blog, we pointed out that warehouses, known for high-performance data processing for business intelligence, can quickly become expensive for new data and evolving workloads. To do so, Presto and Spark need to readily work with existing and modern data warehouse infrastructures.

Data Warehouse

Data Warehouse Data Engineering

Using Azure ML to Train a Serengeti Data Model, Fast Option Pricing with DL, and How To Connect a…

ODSC - Open Data Science

MARCH 30, 2023

Chamber of Commerce, a leading business lobbying group, made headlines recently by calling for the regulation of artificial intelligence. 7 Sessions at ODSC East 2023 to Help You Perform NLP Better: From conversational AI to hyper-productive NLP and more, here are 7 helpful sessions on NLP coming to ODSC East this May.

Azure

Azure ML Data Modeling

Scale knowledge management use cases with generative AI

IBM Journey to AI blog

JULY 27, 2023

Artificial intelligence is disrupting many different areas of business. Powering a knowledge management system with a data lakehouse: Organizations need a data lakehouse to target data challenges that come with deploying an AI-powered knowledge management system. A data lakehouse is a fit-for-purpose data store.

AI

AI Data Scientist Data Quality

Podcast: Deciphering Data Architectures with James Serra

ODSC - Open Data Science

MAY 7, 2024

Beyond his technical achievements, James is a sought-after speaker and is a prolific voice in the data community through his blog, JamesSerra.com. James Serra discusses data lakehouses, which merge data lakes and data warehouses. It lets you store a wide variety of data in a cost-effective way, like a data lake.

Data Warehouse

Data Warehouse Data Lakes Data Science Big Data

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

Prior to joining AWS, as a Data/Solution Architect he implemented many projects in the Big Data domain, including several data lakes in the Hadoop ecosystem. As a Data Engineer he was involved in applying AI/ML to fraud detection and office automation. Babu Srinivasan is a Senior Partner Solutions Architect at MongoDB.

Clustering

Clustering AWS Database ML

How OLAP and AI can enable better business

IBM Journey to AI blog

DECEMBER 7, 2023

Online analytical processing (OLAP) database systems and artificial intelligence (AI) complement each other and can help enhance data analysis and decision-making when used in tandem. IBM watsonx.data is the next generation OLAP system that can help you make the most of your data.

Data Preparation

Data Preparation Database Data Analysis

Building Robust Data Pipelines: 9 Fundamentals and Best Practices to Follow

Alation

MAY 16, 2023

This makes it easier to compare and contrast information and provides organizations with a unified view of their data. Machine Learning: Data pipelines feed all the necessary data into machine learning algorithms, thereby making this branch of Artificial Intelligence (AI) possible.

Data Pipeline

Data Pipeline Data Governance Data Lakes Data Warehouse

Watch Now: The Top West 2024 Recordings

ODSC - Open Data Science

NOVEMBER 18, 2024

Introduction to Containers for Data Science/Data Engineering | Michael A Fudge | Professor of Practice, MSIS Program Director | Syracuse University’s iSchool. In this hands-on session, you’ll learn how to leverage the benefits of containers for DS and data engineering workflows.

Deep Learning

Deep Learning Database Data Science

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 15, 2023

AI is quickly scaling through dozens of industries as companies, non-profits, and governments are discovering the power of artificial intelligence. Cloudera: Cloudera is a cloud-based platform that provides businesses with the tools they need to manage and analyze data. So, what are you waiting for?

Machine Learning

Machine Learning Data Pipeline AI

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

JuMa is tightly integrated with a range of BMW Central IT services, including identity and access management, roles and rights management, BMW Cloud Data Hub (BMW’s data lake on AWS) and on-premises databases. He works closely with enterprise customers to design data platforms and build advanced analytics and ML use cases.

ML

ML AWS AI

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities.

AI

AI Machine Learning

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Businesses face significant hurdles when preparing data for artificial intelligence (AI) applications. The existence of data silos and duplication, alongside apprehensions regarding data quality, presents a multifaceted environment for organizations to manage.

AWS

AWS Database ETL AI

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 20, 2023

HPCC Systems — The Kit and Kaboodle for Big Data and Data Science | Bob Foreman | Software Engineering Lead | LexisNexis/HPCC. Join this session to learn how ECL can help you create powerful data queries through a comprehensive and dedicated data lake platform.

AI

AI Data Science Machine Learning
