When we talk about data integrity, we’re referring to the overarching completeness, accuracy, consistency, accessibility, and security of an organization’s data. Together, these factors determine the reliability of the organization’s data.
For most companies, the quality of their data can make or break their success. This article will guide you through the concept of a data quality framework, its essential components, and how to implement it effectively within your organization. What is a data quality framework?
Systems and data sources are more interconnected than ever before. A broken data pipeline might bring operational systems to a halt, or it could cause executive dashboards to fail, reporting inaccurate KPIs to top management. Is your data governance structure up to the task? Read What Is Data Observability?
“Quality over Quantity” is a phrase we hear regularly in life, but when it comes to the world of data, we often fail to adhere to this rule. Data Quality Monitoring implements quality checks in operational data processes to ensure that the data meets pre-defined standards and business rules.
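A minimal sketch of what such pre-defined checks can look like in practice, assuming pandas; the column names and rules below are purely illustrative:

```python
import pandas as pd

# Hypothetical order records; in practice these would arrive from an
# operational pipeline stage (names and rules here are illustrative).
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [120.0, -5.0, 64.5, None],
    "country": ["US", "DE", "DE", "FR"],
})

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Evaluate pre-defined standards and return a pass/fail report."""
    return {
        "no_duplicate_ids": df["order_id"].is_unique,
        "no_missing_amounts": df["amount"].notna().all(),
        "amounts_positive": (df["amount"].dropna() > 0).all(),
        "known_countries": df["country"].isin({"US", "DE", "FR"}).all(),
    }

report = run_quality_checks(orders)
failed = [name for name, passed in report.items() if not passed]
if failed:
    # In an operational process this might raise an alert or halt the load.
    print(f"Quality checks failed: {failed}")
```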
Summary: Data engineering tools streamline data collection, storage, and processing, and learning them is crucial for building scalable data pipelines. Dedicated Data Science courses cover these tools, some with a job guarantee for career growth. Below are 20 essential tools every data engineer should know.
Jacomo Corbo is a Partner and Chief Scientist, and Bryan Richardson is an Associate Partner and Senior Data Scientist, for QuantumBlack AI by McKinsey. They presented “Automating Data Quality Remediation With AI” at Snorkel AI’s The Future of Data-Centric AI Summit in 2022.
Data quality control: Robust dataset labeling and annotation tools incorporate quality control mechanisms such as inter-annotator agreement analysis, review workflows, and data validation checks to ensure the accuracy and reliability of annotations, while data monitoring tools help track the quality of that data over time.
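Inter-annotator agreement, for instance, is often quantified with Cohen's kappa; a small sketch using scikit-learn, with made-up labels:

```python
from sklearn.metrics import cohen_kappa_score

# Labels assigned by two annotators to the same ten items (illustrative data).
annotator_a = ["cat", "dog", "dog", "cat", "bird", "cat", "dog", "bird", "cat", "dog"]
annotator_b = ["cat", "dog", "cat", "cat", "bird", "cat", "dog", "dog", "cat", "dog"]

# Cohen's kappa corrects raw agreement for agreement expected by chance;
# values near 1.0 indicate reliable annotations, values near 0 do not.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")
```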
Lambda enables serverless, event-driven data processing tasks, allowing for real-time transformations and calculations as data arrives. Step Functions complements this by orchestrating complex workflows, coordinating multiple Lambda functions, and managing error handling for sophisticated data processing pipelines.
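A minimal sketch of the Lambda side of this pattern, assuming an SQS-style trigger whose records carry a JSON `body`; the field names and transformation are illustrative:

```python
import json

def lambda_handler(event, context):
    """Minimal AWS Lambda handler: transform each incoming record as it arrives.

    Assumes the triggering event carries a 'Records' list, as SQS events do;
    the calculation below is a stand-in for a real transformation.
    """
    transformed = []
    for record in event.get("Records", []):
        body = json.loads(record.get("body", "{}"))
        # Example real-time calculation: derive a total from unit fields.
        body["total"] = body.get("quantity", 0) * body.get("unit_price", 0.0)
        transformed.append(body)
    return {"statusCode": 200, "body": json.dumps(transformed)}
```

In a Step Functions workflow, several such functions would be chained as states, with the state machine handling retries and error branching between them.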
You can see our photos from the event here, and be sure to follow our YouTube for virtual highlights from the conference as well. Over in San Francisco, we had a keynote for each day of the event. Other Events: Aside from networking events and all of our sessions, we had a few other special events. What’s next?
If the question was “What’s the schedule for AWS events in December?”, note that AWS usually announces the dates for its upcoming re:Invent event around 6-9 months in advance. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
The goal of digital transformation remains the same as ever – to become more data-driven. We have learned how to gain a competitive advantage by capturing business events in data. Events are data snapshots of complex activity sourced from the web, customer systems, ERP transactions, social media, […].
In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves a series of steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.
In this post, we discuss how to bring data stored in Amazon DocumentDB into SageMaker Canvas and use that data to build ML models for predictive analytics. Without creating and maintaining data pipelines, you can power ML models with your unstructured data stored in Amazon DocumentDB.
Effective data governance enhances quality and security throughout the data lifecycle. What is Data Engineering? Data Engineering is designing, constructing, and managing systems that enable data collection, storage, and analysis. ETL is vital for ensuring data quality and integrity.
Tools like Git and Jenkins are not suited for managing data; this is where a feature platform comes in handy. By capturing metadata such as transformations, storage configurations, versions, owners, lineage, statistics, data quality, and other relevant attributes of the data, a feature platform can address these issues.
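A rough sketch of the kind of metadata record such a platform might capture; the fields and values are illustrative, not any specific product's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeatureMetadata:
    """Illustrative record of the attributes a feature platform might track."""
    name: str
    version: int
    owner: str
    transformation: str  # e.g. the SQL or code that produced the feature
    lineage: list = field(default_factory=list)         # upstream tables/features
    statistics: dict = field(default_factory=dict)      # null rate, mean, etc.
    quality_checks: dict = field(default_factory=dict)  # check name -> passed?
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

meta = FeatureMetadata(
    name="customer_lifetime_value",
    version=3,
    owner="growth-team",
    transformation="SUM(order_amount) OVER (PARTITION BY customer_id)",
    lineage=["orders", "customers"],
    statistics={"null_rate": 0.01},
    quality_checks={"non_negative": True},
)
```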
It enables a big-picture understanding of the health of your organization’s data through continuous AI/ML-enabled monitoring – detecting anomalies throughout the data pipeline and preventing data downtime. Why is data observability so important? Longer-term data trends also require attention.
We couldn’t be more excited to announce two events that will be co-located with ODSC East in Boston this April: The Data Engineering Summit and the Ai X Innovation Summit. These two co-located events represent an opportunity to dive even deeper into the topics and trends shaping these disciplines. Learn more about them below.
Not only does it involve collecting, storing, and processing data so that it can be used for analysis and decision-making, but these professionals are also responsible for building and maintaining the infrastructure that makes this possible, and much more. Think of data engineers as the architects of the data ecosystem.
As a proud member of the Connect with Confluent program, we help organizations going through digital transformation and IT infrastructure modernization break down data silos and power their streaming data pipelines with trusted data. Let’s cover some additional information to know before attending.
The 4 Gen AI Architecture Pipelines: the four pipelines are: 1. The Data Pipeline. The data pipeline is the foundation of any AI system; it is responsible for collecting and ingesting data from various external sources, then processing and managing it.
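A toy sketch of that ingest/process/manage flow, with local functions standing in for the external sources (real pipelines would call APIs or read storage):

```python
def from_crm() -> list[dict]:
    """Stand-in for one external source; a real pipeline would call an API."""
    return [{"id": 1, "text": " Hello world "}]

def from_logs() -> list[dict]:
    """Stand-in for a second source, overlapping with the first."""
    return [{"id": 2, "text": "error: disk full"}, {"id": 1, "text": " Hello world "}]

def process(record: dict) -> dict:
    """Normalize each record before it is handed to downstream AI stages."""
    return {"id": record["id"], "text": record["text"].strip().lower()}

def run_data_pipeline(sources) -> list[dict]:
    """Collect from every source, clean each record, and manage duplicates."""
    seen_ids, out = set(), []
    for source in sources:
        for record in source():
            cleaned = process(record)
            if cleaned["id"] not in seen_ids:  # keep one copy across sources
                seen_ids.add(cleaned["id"])
                out.append(cleaned)
    return out

print(run_data_pipeline([from_crm, from_logs]))
```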
Systems and data sources are more interconnected than ever before. A broken data pipeline might bring operational systems to a halt, or it could cause executive dashboards to fail, reporting inaccurate KPIs to top management. The application of this concept to data is relatively new. Complexity leads to risk.
Elementl / Dagster Labs: Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. Interested in attending an ODSC event?
As the name suggests, real-time operating systems (RTOS) handle real-time applications that undertake data and event processing under a strict deadline. It is also important to establish data quality standards and strict access controls.
Business managers must plot the optimal course through these evolving events. Pipelines must have robust data integration capabilities that integrate data from multiple data silos, including the extensive list of applications used throughout the organization, databases, and even mainframes.
If you’re not familiar with DGIQ, it’s the world’s most comprehensive event dedicated to, you guessed it, data governance and information quality. This year’s DGIQ West will host tutorials, workshops, seminars, general conference sessions, and case studies for global data leaders.
The right data integration solution helps you streamline operations, enhance dataquality, reduce costs, and make better data-driven decisions. It synthesizes all the metadata around your organization’s data assets and arranges the information into a simple, easy-to-understand format.
Monday’s sessions will cover a wide range of topics, from Generative AI and LLMs to MLOps and Data Visualization. Finally, get ready for some All Hallows’ Eve fun with Halloween Data After Dark, featuring a costume contest, candy, and more. There will also be an in-person career expo where you can find your next job in data science!
Setting up an information architecture during a migration to Snowflake poses challenges due to the need to align existing data structures, types, and sources with Snowflake’s multi-cluster, multi-tier architecture. Moving historical data from a legacy system into Snowflake adds several more.
The DAGs can then be scheduled to run at specific intervals or triggered when an event occurs. It even offers a user-friendly interface to visualize the pipelines and monitor progress. The Data Source Tool can automate scanning DDL and profiling tables between source and target, comparing them, and then reporting findings.
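The excerpt does not name the orchestrator, but the pattern maps naturally onto Apache Airflow; a minimal interval-scheduled DAG is sketched below, assuming Airflow 2.4+ (swapping `schedule` for a dataset or sensor-based trigger would make it event-driven instead):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")

def load():
    print("writing to the warehouse")

# Runs once per day; Airflow's UI provides the graph view and run monitoring.
with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```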
Horizon addresses key aspects of data governance, including compliance, security, access, privacy, and interoperability. Throughout the remainder of this blog, we will dive deeper into each of these components and take a look at the ways in which Horizon can help. We will begin with compliance.
A 2019 McKinsey survey on global data transformation revealed that 30 percent of enterprise IT teams’ total time was spent on non-value-added tasks related to poor data quality and availability. It truly is an all-in-one data lake solution. Roxie then consolidates that data and presents the results.
Due to the convergence of events in the data analytics and AI landscape, many organizations are at an inflection point. From there, it can be easily accessed via dashboards by data consumers or those building into a data product.
In addition to the Application type, Fivetran provides connectors for databases, files, events, and functions. Salesforce – The Salesforce (SFDC) connector streamlines customer relationship management (CRM) data integration. Scalability Considerations: Ensure your data integration setup is scalable and ready for future growth.
Snorkel AI wrapped the second day of our The Future of Data-Centric AI virtual conference by showcasing how Snorkel’s data-centric platform has enabled customers to succeed, taking a deep look at Snorkel Flow’s capabilities, and announcing two new solutions. You need to find a place to park your data.
Access the resources your data applications need — no more, no less. Data Pipeline Automation: consolidate all data sources to automate pipelines for processing in a single repository. Efficient Data Processing: to use data, you need the ability to collect and correlate it efficiently.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: in a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up to date.
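One common way to implement such a validation check is content hashing; a minimal sketch follows, with made-up sample documents:

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Stable content hash used to flag multiple entries of the same data."""
    return hashlib.sha256(payload).hexdigest()

seen: set[str] = set()

def validate_entry(payload: bytes) -> bool:
    """Return True if the entry is new; False if it duplicates a prior one."""
    digest = fingerprint(payload)
    if digest in seen:
        return False
    seen.add(digest)
    return True

# Illustrative usage with unstructured documents:
docs = [b"quarterly report v1", b"meeting notes", b"quarterly report v1"]
for doc in docs:
    if not validate_entry(doc):
        print("duplicate detected:", doc)
```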
For small-scale/low-value deployments, there might not be many items to focus on, but as the scale and reach of deployment go up, data governance becomes crucial. This includes data quality, privacy, and compliance. The data pipelines can be scheduled to run at specific intervals or be triggered by events, as the users choose.
Data Engineer: Data engineers are the authors of the infrastructure that stores, processes, and manages the large volumes of data an organization has. The main aspect of their profession is building and maintaining data pipelines, which allow data to move between sources. Well then, you’re in luck.
This step includes identifying data sources (determining where the data will be sourced from) and ensuring time consistency (organizing the data chronologically, as time order is crucial for time series analysis). This is vital for agriculture, disaster management, and event planning.
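A small sketch of enforcing that chronological order with pandas; the timestamps and readings are illustrative:

```python
import pandas as pd

# Illustrative sensor readings that arrived out of order.
readings = pd.DataFrame({
    "timestamp": ["2024-03-02", "2024-03-01", "2024-03-03"],
    "rainfall_mm": [4.2, 0.0, 7.5],
})

# Parse timestamps, index on them, and sort so the series is strictly
# chronological before any time series analysis is applied.
readings["timestamp"] = pd.to_datetime(readings["timestamp"])
readings = readings.set_index("timestamp").sort_index()

# A quick sanity check that time order now holds.
assert readings.index.is_monotonic_increasing
print(readings)
```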
Continuous monitoring of resources, data, and metrics. Data Pipeline: manages and processes various data sources. ML Pipeline: focuses on training, validation, and deployment. Application Pipeline: manages requests and data/model validations. Collecting feedback for further tuning.
Similar Audio: Audio recordings of the same event or sound but with different microphone placements or background noise. Creating a clear audit trail of how duplicate records are identified and handled throughout the data pipeline would help improve the process in the future.