Data Lakes, Data Science and Events

What Is a Lakebase?

Databricks

JUNE 11, 2025

Database

Database Data Lakes ETL Analytics

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Python

Python Natural Language Processing Data Science Machine Learning

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Apache Kafka

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Rocket’s legacy data science environment challenges: Rocket’s previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.

Data Science

Data Science AWS Hadoop Data Scientist

Data Version Control for Data Lakes: Handling the Changes in Large Scale

ODSC - Open Data Science

SEPTEMBER 27, 2023

In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.

Data Lakes

Data Lakes Data Warehouse Database Big Data

Exploring the Power of Microsoft Fabric: A Hands-On Guide with a Sales Use Case

Data Science Dojo

SEPTEMBER 11, 2024

With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of OneLake: Fabric features a lake-centric architecture, with a central repository known as OneLake. Now, we can save the data as Delta tables to use later for sales analytics.

Power BI

Power BI Data Pipeline Data Warehouse Data Engineering

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Analytics Data Scientist

Sneak peek at Microsoft Fabric price and its promising features

Dataconomy

JUNE 1, 2023

Unified data storage: Fabric’s centralized data lake, Microsoft OneLake, eliminates data silos and provides a unified storage system, simplifying data access and retrieval. OneLake is designed to store a single copy of data in a unified location, leveraging the open-source Apache Parquet format.

Power BI

Power BI Data Lakes Azure Data Silos

Drowning in Data? A Data Lake May Be Your Lifesaver

ODSC - Open Data Science

SEPTEMBER 29, 2023

Data management problems can also lead to data silos: disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large and complex database of diverse datasets all stored in their original format.

Data Lakes

Data Lakes Clustering Big Data

Real-Time ML with Spark and SBERT, AI Coding Assistants, Data Lake Vendors, and ODSC East…

ODSC - Open Data Science

JUNE 1, 2023

Real-Time ML with Spark and SBERT, AI Coding Assistants, Data Lake Vendors, and ODSC East Highlights. Getting Up to Speed on Real-Time Machine Learning with Spark and SBERT: Learn more about real-time machine learning with this approach, which uses Apache Spark and SBERT. Well, these libraries will give you a solid start.

Data Lakes

Data Lakes ML Citizen Data Scientist

Simplifying Time Series Analysis for Data Scientists

ODSC - Open Data Science

SEPTEMBER 12, 2023

Most data scientists are familiar with the concept of time series data and work with it often. The time series database (TSDB), however, is still an underutilized tool in the data science community. Typically, time series analysis is performed either on CSV files or data lakes.

Data Scientist

Data Scientist Database Data Lakes Data Science

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Data and governance foundations – This function uses a data mesh architecture for setting up and operating the data lake, central feature store, and data governance foundations to enable fine-grained data access. This framework considers multiple personas and services to govern the ML lifecycle at scale.

ML

ML AWS Data Lakes

Business analytics

Dataconomy

MAY 26, 2025

AI integration: Employing artificial intelligence for automation in response to unexpected events. Diagnostic analytics: Diagnostic analytics focuses on understanding the causes behind past events. It analyzes data to uncover the reasons for occurrences and is closely related to descriptive analytics, together giving a comprehensive view.

Analytics

Analytics Data Analysis

Open Data Lakes, Safeguarding Images From AI, Free Data Viz Tools, and 50% Off ODSC East

ODSC - Open Data Science

FEBRUARY 15, 2024

The Future of the Single Source of Truth is an Open Data Lake: Organizations that strive for high-performance data systems are increasingly turning towards the ELT (Extract, Load, Transform) model using an open data lake. Instead, use Prefect where interactive workflows are now natively supported. See them here!

Data Lakes

Data Lakes Data Visualization Machine Learning

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable. These professionals will work with their colleagues to ensure that data is accessible, with proper access. So let’s go through each step one by one, and help you build a roadmap toward becoming a data engineer.

Data Engineering

Data Engineering Data Engineer

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Recent events including Tropical Cyclone Gabrielle have highlighted the susceptibility of the grid to extreme weather and emphasized the need for climate adaptation with resilient infrastructure. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry.

AWS

AWS Data Lakes ML

Podcast: Deciphering Data Architectures with James Serra

ODSC - Open Data Science

MAY 7, 2024

Learn about cutting-edge developments in AI and data science from the experts who know them best on ODSC’s Ai X Podcast. Beyond his technical achievements, James is a sought-after speaker and is a prolific voice in the data community through his blog, JamesSerra.com. Interested in attending an ODSC event?

Data Warehouse

Data Warehouse Data Lakes Data Science Big Data

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.

AWS

AWS Data Governance Data Silos SQL

6 Remote AI Jobs to Look for in 2024

ODSC - Open Data Science

DECEMBER 19, 2023

Data Engineer: Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines. Interested in attending an ODSC event?

Data Scientist

Data Scientist Machine Learning Computer Science

Beyond data: Cloud analytics mastery for business brilliance

Dataconomy

SEPTEMBER 4, 2023

Diagnostic analytics: Diagnostic analytics goes a step further by analyzing historical data to determine why certain events occurred. By understanding the “why” behind past events, organizations can make informed decisions to prevent or replicate them. Ensure that data is clean, consistent, and up-to-date.

Analytics

Analytics Big Data Analytics

How Light & Wonder built a predictive maintenance solution for gaming machines on AWS

AWS Machine Learning Blog

JUNE 22, 2023

Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to stream telemetry and machine health data from roughly half a million electronic gaming machines distributed across its casino customer base globally when LnW Connect reaches its full potential.

AWS

AWS ML Machine Learning

Imperva optimizes SQL generation from natural language using Amazon Bedrock

AWS Machine Learning Blog

JUNE 20, 2024

Imperva Cloud WAF protects hundreds of thousands of websites against cyber threats and blocks billions of security events every day. Counters and insights based on security events are calculated daily and used by users from multiple departments. The data is stored in a data lake and retrieved by SQL using Amazon Athena.

SQL

SQL Database AWS Machine Learning

Pictures and Highlights from ODSC Europe 2023

ODSC - Open Data Science

JULY 22, 2023

The week was filled with engaging sessions on top topics in data science, innovation in AI, and smiling faces that we haven’t seen in a while. Expo Hall: ODSC events are more than just data science training and networking events. You can read the recap here and watch the full keynote here. What’s next?

Apache Kafka

Apache Kafka Machine Learning Data Science

Insights from the Gartner Data & Analytics Summit in London: Embracing Data Leadership and Strategy

Precisely

JUNE 24, 2024

They are working through organizational design challenges while also establishing foundational data management capabilities like metadata management and data governance that will allow them to offer trusted data to the business in a timely and efficient manner for analytics and AI.”

Analytics

Analytics Data Governance Data Lakes

What Does a Data Engineering Job Involve in 2024?

ODSC - Open Data Science

JANUARY 30, 2024

This is a pretty important job as once the data has been integrated, it can be used for a variety of purposes, such as reporting and analytics, business intelligence, machine learning, and data mining. All of this provides stakeholders and even their own teams with the data they need when they need it.

Data Engineering

Data Engineering Data Engineer

Evolvability — It’s Mostly About Data Contracts

ODSC - Open Data Science

APRIL 25, 2025

The two most prominent implementations are dbt Model Contracts and the Open Data Contract Standard (ODCS, not to be confused with ODSC). Essentially, both solutions are JSON schema (widely used in APIs and event models), and thankfully, expressed in YAML for the ease of human typing.

Data Engineering

Data Engineering Data Engineer

Announcing the First Speakers for the 2024 Data Engineering Summit

ODSC - Open Data Science

FEBRUARY 15, 2024

Data Pipeline Architecture — Stop Building Monoliths Elliott Cordo | Founder, Architect, Builder | Datafutures Although common, data monoliths present several challenges, especially for larger teams and organizations that allow for federated data product development. Interested in attending an ODSC event?

Data Engineering

Data Engineering Data Engineer

Introducing the Topic Tracks for ODSC East 2024 — Highlighting Gen AI, LLMs, and Responsible AI

ODSC - Open Data Science

MARCH 11, 2024

ODSC East 2024, coming up this April 23rd to 25th, is fast approaching, and this year we will have even more tracks comprising hands-on training sessions, expert-led workshops, and talks from data science innovators and practitioners.

Data Science

Data Science Deep Learning Machine Learning

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

AWS Machine Learning Blog

FEBRUARY 7, 2025

Data science teams often face challenges when transitioning models from the development environment to production. Usually, there is one lead data scientist for a data science group in a business unit, such as marketing. ML Dev Account: This is where data scientists perform their work.

ML

ML Data Scientist AWS

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

With built-in components and integration with Google Cloud services, Vertex AI simplifies the end-to-end machine learning process, making it easier for data science teams to build and deploy models at scale. Metaflow: Metaflow helps data scientists and machine learning engineers build, manage, and deploy data science projects.

Machine Learning

Machine Learning ML

Identify cybersecurity anomalies in your Amazon Security Lake data using Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 20, 2023

A novel approach to solving this complex security analytics scenario combines ingesting and storing security data using Amazon Security Lake with analyzing that data with machine learning (ML) using Amazon SageMaker. Store new security logs in an S3 bucket and queue events in Amazon Simple Queue Service (Amazon SQS).

AWS

AWS ML Algorithm

Watch the Top ODSC Europe 2023 Virtual Sessions Here

ODSC - Open Data Science

JULY 14, 2023

Time Series Forecasting for Managers — All Forecasts Are Wrong but Some Are Useful Tanvir Ahmed Shaikh | Data Strategist (Director) | Genentech, Inc Time series forecasting remains an under-appreciated technique in data science education, often overshadowed by more popular machine learning methods.

Machine Learning

Machine Learning Apache Kafka Data Science

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 20, 2023

HPCC Systems — The Kit and Kaboodle for Big Data and Data Science Bob Foreman | Software Engineering Lead | LexisNexis/HPCC Join this session to learn how ECL can help you create powerful data queries through a comprehensive and dedicated data lake platform. Interested in attending an ODSC event?

AI

AI Data Science Machine Learning

Prompt Engineering Best Practices, the ODSC West 2024 Full Schedule, and LLM Fine-Tuning Strategies

ODSC - Open Data Science

OCTOBER 3, 2024

Building an Effective OSS Management Layer for Your Data Lake: Ahead of her ODSC West session on OSS management layers, the speaker discusses how data lakes can benefit from this system.

Data Science

Data Science Data Lakes Data Scientist AI

What is Data Mining?

Pickl AI

FEBRUARY 21, 2023

Businesses require Data Scientists to perform Data Mining and extract valuable insights from data using different software and tools. What is Data Mining and how is it related to Data Science? What is Data Mining? Why is Data Mining Important? Let’s learn from the following blog!

Data Mining

Data Mining Data Scientist

Your Complete Roadmap to Become an Azure Data Scientist

Pickl AI

SEPTEMBER 5, 2024

Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure. Storage Solutions: Secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage.

Azure

Azure Data Scientist Data Science Machine Learning

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Manager Data Science at Marubeni Power International. Data collection and ingestion: The data collection and ingestion layer connects to all upstream data sources and loads the data into the data lake. This construct provides a fully event-driven workflow. He holds a Ph.D.

AWS

AWS Machine Learning Analytics

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 15, 2023

Institute of Analytics: The Institute of Analytics is a non-profit organization that provides data science and analytics courses, workshops, certifications, research, and development. The courses and workshops cover a wide range of topics, from basic data science concepts to advanced machine learning techniques.

Machine Learning

Machine Learning Data Pipeline AI

3 Major Trends at Strata New York 2017

DataRobot Blog

OCTOBER 3, 2017

Enterprise data architects, data engineers, and business leaders from around the globe gathered in New York last week for the 3-day Strata Data Conference, which featured new technologies, innovations, and many collaborative ideas. 2) When data becomes information, many (incremental) use cases surface.

Data Lakes

Data Lakes Azure Data Pipeline Hadoop

Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis

AWS Machine Learning Blog

APRIL 18, 2023

With the recently launched Amazon Monitron Kinesis data export v2 feature, your OT team can stream incoming measurement data and inference results from Amazon Monitron via Amazon Kinesis to Amazon Simple Storage Service (Amazon S3) to build an Internet of Things (IoT) data lake. Choose Create delivery stream.

AWS

AWS ML Database

Why Silicon Valley is the Go-To Place for Artificial Intelligence

ODSC - Open Data Science

AUGUST 7, 2023

Databricks: Databricks is the developer of Delta Lake, an open-source project that brings reliability to data lakes for machine learning and other use cases.

Artificial Intelligence

Artificial Intelligence Machine Learning

Build well-architected IDP solutions with a custom lens – Part 1: Operational excellence

AWS Machine Learning Blog

NOVEMBER 22, 2023

Set up regular game days to test workload and team responses to simulated events. Learn from all operational failures – Drive improvement through lessons learned from all operational events and failures. By centralizing datasets within the flywheel’s dedicated Amazon S3 data lake, you ensure efficient data management.

AWS

AWS ML Machine Learning

Introducing watsonx: The future of AI for business

IBM Journey to AI blog

MAY 9, 2023

We are also building models trained on different types of business data, including code, time-series data, tabular data, geospatial data and IT events data. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments.

AI

AI Data Warehouse Machine Learning

How Thomson Reuters built an AI platform using Amazon SageMaker to accelerate delivery of ML projects

AWS Machine Learning Blog

JANUARY 13, 2023

Amazon Simple Storage Service (Amazon S3) object storage acts as a content data lake. TR built processes to securely access data from the content data lake to users’ experimentation workspaces while maintaining required authorization and auditability.

ML

ML AWS Data Scientist
