Continuous Integration and Continuous Delivery (CI/CD) for data pipelines is a game-changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. AnalyticsCreator offers full BI-stack automation, from source to data warehouse through to frontend.
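AnalyticsCreator's own automation isn't detailed in this excerpt; purely as a generic sketch of what CI/CD for a data pipeline enforces, a test like the following (hypothetical transform and column names) could run on every commit before a change is promoted:

```python
# Minimal sketch of an automated check a CI/CD job might run against a data
# transformation. The transform and column names are hypothetical.
import pandas as pd

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Example transformation: standardize column names and drop bad rows."""
    out = raw.rename(columns=str.lower).dropna(subset=["customer_id"])
    out["revenue"] = out["revenue"].astype(float)
    return out

def test_transform_contract():
    raw = pd.DataFrame(
        {"Customer_ID": [1, None, 3], "Revenue": ["10.5", "7.0", "3.25"]}
    )
    result = transform(raw)
    # Contract checks enforced on every commit:
    assert list(result.columns) == ["customer_id", "revenue"]
    assert result["customer_id"].notna().all()
    assert result["revenue"].dtype == float

if __name__ == "__main__":
    test_transform_contract()
    print("transform contract holds")
```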
Rocket's legacy data science environment challenges: Rocket's previous data science solution was built around Apache Spark, combining a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.
In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.
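No particular tool is named in this excerpt; the core idea behind most data lake versioning systems is content-addressed snapshots, which a toy sketch (not any specific tool's API) can illustrate:

```python
# Toy illustration of content-addressed data versioning, the core idea behind
# data lake version control tools. This is not any specific tool's API.
import hashlib, json, time

class DataVersionLog:
    def __init__(self):
        self.commits = []  # append-only history of snapshots

    def commit(self, files: dict, message: str) -> str:
        # Hash each file's content; identical content yields identical hashes,
        # so unchanged files are recognizable across versions.
        manifest = {p: hashlib.sha256(d).hexdigest() for p, d in files.items()}
        commit_id = hashlib.sha256(
            json.dumps(manifest, sort_keys=True).encode()
        ).hexdigest()[:12]
        self.commits.append({"id": commit_id, "message": message,
                             "manifest": manifest, "ts": time.time()})
        return commit_id

    def diff(self, a: int, b: int) -> list:
        """Paths whose content changed between two snapshots."""
        ma, mb = self.commits[a]["manifest"], self.commits[b]["manifest"]
        return [p for p in set(ma) | set(mb) if ma.get(p) != mb.get(p)]

log = DataVersionLog()
log.commit({"events.csv": b"a,b\n1,2\n"}, "initial load")
log.commit({"events.csv": b"a,b\n1,2\n3,4\n"}, "append batch")
print(log.diff(0, 1))  # ['events.csv']
```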
With the amount of data companies use growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from vast volumes of structured and unstructured data. What is a data lake? Consistency of data throughout the data lake.
Though you may encounter the terms “data science” and “data analytics” used interchangeably in conversations or online, they refer to two distinct concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.
Data and governance foundations – This function uses a data mesh architecture for setting up and operating the data lake, central feature store, and data governance foundations to enable fine-grained data access. This framework considers multiple personas and services to govern the ML lifecycle at scale.
It enables data engineers to define data models, manage dependencies, and perform automated testing, making it easier to ensure data quality and consistency. Fivetran: Fivetran is a cloud-based data integration platform that simplifies the process of loading data from various sources into a data warehouse or data lake.
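As a language-agnostic sketch of the define-models / manage-dependencies / automated-testing loop such a tool provides (illustrative analogy only; tools like dbt express this in SQL and YAML, not Python):

```python
# Models declared with upstream dependencies, run in dependency order, each
# followed by data tests. Model names and tests here are hypothetical.
from graphlib import TopologicalSorter

MODELS = {
    "stg_orders":  {"deps": [], "build": lambda ctx: [{"id": 1, "amount": 10}]},
    "fct_revenue": {"deps": ["stg_orders"],
                    "build": lambda ctx: [{"total": sum(r["amount"] for r in ctx["stg_orders"])}]},
}

TESTS = {
    "stg_orders":  lambda rows: all(r["id"] is not None for r in rows),  # not_null
    "fct_revenue": lambda rows: all(r["total"] >= 0 for r in rows),      # accepted range
}

ctx = {}
graph = {m: set(cfg["deps"]) for m, cfg in MODELS.items()}
for name in TopologicalSorter(graph).static_order():
    ctx[name] = MODELS[name]["build"](ctx)   # build in dependency order
    assert TESTS[name](ctx[name]), f"data test failed for {name}"
    print(f"built and tested {name}")
```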
Each month, ODSC hosts a few insightful webinars that touch on a range of issues important in the data science world, from use cases of machine learning models to new techniques and frameworks, and more. This is due to how data lakes can become too large and complex. Watch on-demand here.
The following points illustrate some of the main reasons why data versioning is crucial to the success of any data science and machine learning project. Storage space: one reason for versioning data is to keep track of multiple versions of the same data, which obviously need to be stored as well.
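That storage cost is the naive-copy worst case; content-addressed storage, sketched below as a toy example (no specific tool implied), keeps unchanged content only once across versions:

```python
# Naive versioning copies everything; content-addressed storage stores each
# distinct piece of content exactly once, shared across all versions.
import hashlib

store = {}  # hash -> bytes, shared across all snapshots

def snapshot(files: dict) -> dict:
    """Store each file's content once, keyed by hash; return the manifest."""
    manifest = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # unchanged files add no new bytes
        manifest[path] = digest
    return manifest

v1 = snapshot({"train.csv": b"x" * 1000, "labels.csv": b"y" * 100})
v2 = snapshot({"train.csv": b"x" * 1000, "labels.csv": b"z" * 100})  # labels changed
naive = 2 * (1000 + 100)                     # two full copies: 2200 bytes
dedup = sum(len(b) for b in store.values())  # 1000 + 100 + 100 = 1200 bytes
print(naive, dedup)
```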
Using Azure ML to Train a Serengeti Data Model, Fast Option Pricing with DL, and How To Connect a GPU to a Container. Using Azure ML to Train a Serengeti Data Model for Animal Identification: In this article, we will cover how you can train a model using Notebooks in Azure Machine Learning Studio.
To get the data, you will need to follow the instructions in the article Create a Data Solution on Azure Synapse Analytics with Snapshot Serengeti — Part 1 — Microsoft Community Hub, where you will load data into Azure Data Lake via Azure Synapse. Lastly, upload the data from your Azure subscription.
ODSC West 2024 showcased a wide range of talks and workshops from leading data science, AI, and machine learning experts. This blog highlights some of the most impactful AI slides from the world’s best data science instructors, focusing on cutting-edge advancements in AI, data modeling, and deployment strategies.
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and complete each stage of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) within a single visual interface.
Key features of cloud analytics solutions include: data models, processing applications, and analytics models. Data models help visualize and organize data; processing applications handle large datasets efficiently; and analytics models aid in understanding complex data sets, laying the foundation for business intelligence.
Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Text, images, audio, and videos are common examples of unstructured data. The steps of the workflow are as follows: integrated AI services extract data from the unstructured data.
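The AI services themselves aren't specified in this excerpt; as a toy stand-in for that extraction step, here is structured-field extraction from free text (real systems would call OCR or entity-extraction services rather than regex):

```python
# Toy stand-in for "extract structure from unstructured data": pull typed
# fields out of free text. The document and fields are invented examples.
import re

doc = "Invoice #4821 issued 2024-06-01 to ACME Corp, total due $1,250.00."

record = {
    "invoice_id": re.search(r"#(\d+)", doc).group(1),
    "date":       re.search(r"\d{4}-\d{2}-\d{2}", doc).group(0),
    "total":      float(re.search(r"\$([\d,]+\.\d{2})", doc)
                          .group(1).replace(",", "")),
}
print(record)  # {'invoice_id': '4821', 'date': '2024-06-01', 'total': 1250.0}
```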
Google Cloud Vertex AI: Google Cloud Vertex AI provides a unified environment for both automated model development with AutoML and custom model training using popular frameworks. Qwak: Qwak is a fully managed, accessible, and reliable ML platform to develop and deploy models and monitor the entire machine learning pipeline.
We need robust versioning for data, models, code, and preferably even the internal state of applications—think Git on steroids to answer inevitable questions: What changed? Adapted from the book Effective Data Science Infrastructure. Data is at the core of any ML project, so data infrastructure is a foundational concern.
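A minimal sketch of how "what changed?" can be answered by fingerprinting each ingredient of a run (names are illustrative, not taken from the book):

```python
# Fingerprint a run's data, code, and parameters so any two runs can be
# compared field by field. All names here are hypothetical.
import hashlib, inspect, json

def fingerprint(data: bytes, train_fn, params: dict) -> dict:
    return {
        "data":   hashlib.sha256(data).hexdigest()[:12],
        "code":   hashlib.sha256(inspect.getsource(train_fn).encode()).hexdigest()[:12],
        "params": hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:12],
    }

def what_changed(run_a: dict, run_b: dict) -> list:
    return [k for k in run_a if run_a[k] != run_b[k]]

def train(x):  # stand-in for a real training function
    return x

a = fingerprint(b"dataset-v1", train, {"lr": 0.01})
b = fingerprint(b"dataset-v2", train, {"lr": 0.01})
print(what_changed(a, b))  # ['data'] -- code and params are identical
```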
This article is an excerpt from the book Expert Data Modeling with Power BI, Third Edition by Soheil Bakhshi, a completely updated and revised edition of the bestselling guide to Power BI and data modeling. No-code/low-code experience using a diagram view in the data preparation layer, similar to Dataflows.
Summary: The fundamentals of data engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
As they attempt to put machine learning models into production, data science teams encounter many of the same hurdles that plagued data analytics teams in years past: finding trusted, valuable data is time-consuming. Obstacles such as user roles, permissions, and approval requests prevent speedy data access.
Institute of Analytics: The Institute of Analytics is a non-profit organization that provides data science and analytics courses, workshops, certifications, research, and development. The courses and workshops cover a wide range of topics, from basic data science concepts to advanced machine learning techniques.
Introduction to Containers for Data Science/Data Engineering | Michael A. Fudge | Professor of Practice, MSIS Program Director | Syracuse University’s iSchool. In this hands-on session, you’ll learn how to leverage the benefits of containers for DS and data engineering workflows.
Just as you need data about finances for effective financial management, you need data about data (metadata) for effective data management. You can’t manage data without metadata. But data catalogs do much more. Figure 1 shows a logical data model that represents typical metadata content of a data catalog.
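Figure 1 itself isn't reproduced here; as a rough code rendering of what such a logical model typically covers, the following dataclasses sketch catalog entities (entity and field names are hypothetical, not copied from the figure):

```python
# Rough illustration of a data catalog's logical model as plain dataclasses.
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""

@dataclass
class Dataset:
    name: str
    owner: str                   # stewardship metadata
    source_system: str           # lineage: where the data comes from
    columns: list = field(default_factory=list)
    tags: list = field(default_factory=list)  # discovery metadata

catalog = {}
catalog["sales.orders"] = Dataset(
    name="sales.orders",
    owner="data-eng@example.com",
    source_system="erp_prod",
    columns=[Column("order_id", "int", "primary key"), Column("amount", "decimal")],
    tags=["pii:none", "domain:sales"],
)
# Metadata enables management tasks like finding every dataset from one source:
print([d.name for d in catalog.values() if d.source_system == "erp_prod"])
```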
The right data architecture can help your organization improve data quality because it provides the framework that determines how data is collected, transported, stored, secured, used, and shared for business intelligence and data science use cases. Perform data quality monitoring based on pre-configured rules.
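A minimal sketch of what "monitoring based on pre-configured rules" can mean in practice, assuming pandas and invented rule names:

```python
# Each data quality rule is declared once and evaluated against every batch.
import pandas as pd

RULES = [
    ("order_id is never null", lambda df: df["order_id"].notna().all()),
    ("amount is non-negative", lambda df: (df["amount"] >= 0).all()),
    ("status in allowed set",  lambda df: df["status"].isin({"open", "paid", "void"}).all()),
]

def monitor(batch: pd.DataFrame) -> list:
    """Return the names of the rules the batch violates."""
    return [name for name, check in RULES if not check(batch)]

batch = pd.DataFrame({"order_id": [1, 2], "amount": [9.5, -1.0],
                      "status": ["open", "paid"]})
print(monitor(batch))  # ['amount is non-negative']
```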
The portal combines these predictive alerts with other insights we derive from our AWS-based data lake in order to give our dealers more clarity into equipment health across their entire client base. Dan Volk is a Data Scientist at the AWS Generative AI Innovation Center.
Sources: The sources involved could influence or determine the options available for the data ingestion tool(s). These could include other databases, data lakes, and SaaS applications. Data flows from the current data platform to the destination. Learn more about how a data model is chosen!
It includes processes that trace and document the origin of data, models, and associated metadata and pipelines for audits. How to scale AI and ML with built-in governance: a fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.
Data cleaning, normalization, and reformatting to match the target schema are used. Data loading: this is the final step, where transformed data is loaded into a target system, such as a data warehouse or a data lake. It ensures that the integrated data is available for analysis and reporting.
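A compact sketch of those cleaning, normalization, and loading steps, using pandas with SQLite standing in for the warehouse or data lake target (all column names and data invented):

```python
import sqlite3
from datetime import datetime

import pandas as pd

raw = pd.DataFrame({" Name ": ["  alice", "BOB", None],
                    "Joined": ["2024-01-05", "05/02/2024", "2024-03-01"]})

# Cleaning and normalization to match the target schema:
df = raw.rename(columns=lambda c: c.strip().lower())
df = df.dropna(subset=["name"])              # cleansing rule: drop nameless rows
df["name"] = df["name"].str.strip().str.title()

def parse_date(s: str) -> str:
    """Normalize mixed date formats to ISO (a typical reformatting step)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(s, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {s!r}")

df["joined"] = df["joined"].map(parse_date)

# Loading into the target system (SQLite standing in for a warehouse):
conn = sqlite3.connect(":memory:")
df.to_sql("customers", conn, if_exists="replace", index=False)
print(pd.read_sql("SELECT * FROM customers", conn))
```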
In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for the data to be brought into an AWS data lake for predictive modeling. About the authors: Aruna Abeyakoon is the Senior Director of Data Science & Analytics at Light & Wonder Land-based Gaming Division.
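The excerpt doesn't detail LnW Connect's actual mechanism; purely as a generic illustration, client-side symmetric encryption before landing data in S3 might look like the following (bucket and key names are hypothetical; requires the cryptography and boto3 packages plus AWS credentials):

```python
# Generic illustration of encrypting data client-side before it lands in an
# S3-backed data lake. This is NOT LnW Connect's actual design.
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()           # in practice managed by a KMS, not inline
cipher = Fernet(key)

payload = b'{"machine_id": "m-042", "events": [1, 2, 3]}'
ciphertext = cipher.encrypt(payload)  # authenticated symmetric encryption

# Hypothetical bucket/key; requires AWS credentials and an existing bucket.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-datalake-raw", Key="ingest/m-042.bin", Body=ciphertext)

# Downstream, an authorized consumer holding the key recovers the plaintext:
assert cipher.decrypt(ciphertext) == payload
```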
This might involve data validation rules, data cleansing procedures, and ongoing monitoring to maintain data integrity. Optimize data modelling: the way data is structured within the warehouse significantly impacts its usability and efficiency.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Therefore, you’ll be empowered to truncate and reprocess data if bugs are detected and provide an excellent raw data source for data scientists.
What frameworks and operating models have you seen work well? The firms that get data governance and management “right” bring people together and leverage a set of capabilities: (1) Agile; (2) Six Sigma; (3) data science; and (4) project management tools. Establishing a solid vision and mission is key.
[Figure: A typical machine learning pipeline with various stages highlighted | Source: Author]
Common types of machine learning pipelines: In line with the stages of the ML workflow (data, model, and production), an ML pipeline comprises three different pipelines that solve different workflow stages. They include: 1. Data (or input) pipeline.
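A minimal sketch of that three-pipeline decomposition, with each stage a separate callable so it can be developed, tested, and scheduled independently (names and logic are illustrative only):

```python
def data_pipeline(raw):
    """Data (or input) pipeline: validate and normalize raw records."""
    cleaned = [x for x in raw if x is not None]
    lo, hi = min(cleaned), max(cleaned)
    return [(x - lo) / (hi - lo) for x in cleaned]

def model_pipeline(features):
    """Model pipeline: 'train' a deliberately trivial model (the mean)."""
    return sum(features) / len(features)

def production_pipeline(model, x):
    """Production pipeline: serve predictions from the trained artifact."""
    return "high" if x > model else "low"

features = data_pipeline([3.0, None, 9.0, 6.0])  # data stage
model = model_pipeline(features)                 # model stage
print(production_pipeline(model, 0.9))           # production stage -> 'high'
```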