
CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

Continuous Integration and Continuous Delivery (CI/CD) for data pipelines is a game-changer with AnalyticsCreator. Efficient and reliable data pipelines are paramount in data science and data engineering: they transform raw data into a consistent format for users to consume.
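A minimal sketch of the idea behind CI for a data pipeline: the transform that normalizes raw records into a consistent format ships with a test that a CI job runs on every change. The field names and normalization rules here are illustrative assumptions, not AnalyticsCreator's actual implementation.

```python
def normalize(record):
    """Normalize one raw record: lower-case keys, strip whitespace from strings."""
    return {
        key.lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }

def test_normalize():
    # A CI job would run this test before deploying the pipeline change.
    raw = {"Name": "  Ada ", "Age": 36}
    assert normalize(raw) == {"name": "Ada", "age": 36}

test_normalize()
```

Because the transform is plain code with a test, any CI system (GitHub Actions, GitLab CI, and so on) can gate deployment on it.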


Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.




Testing and Monitoring Data Pipelines: Part Two

Dataversity

In part one of this article, we discussed how data testing can specifically test a data object (e.g., table, column, metadata) at one particular point in the data pipeline.


Best Data Engineering Tools Every Engineer Should Know

Pickl AI

Summary: Data engineering tools streamline data collection, storage, and processing. Learning these tools is crucial for building scalable data pipelines. Pickl AI offers Data Science courses covering these tools, with a job guarantee for career growth. Below are 20 essential tools every data engineer should know.


Building and Scaling Gen AI Applications with Simplicity, Performance and Risk Mitigation in Mind Using Iguazio (acquired by McKinsey) and MongoDB

Iguazio

In this blog post, we introduce the joint MongoDB - Iguazio gen AI solution, which enables the development and deployment of resilient and scalable gen AI applications. Iguazio contributes structured and unstructured data pipelines for processing, versioning, and loading documents.


Comparing Tools For Data Processing Pipelines

The MLOps Blog

Ask data professionals what the most challenging part of their day-to-day work is, and you will likely find it is managing the many aspects of data before they ever graduate to the data modeling stage. Careful data processing ensures that the data is accurate, consistent, and reliable.


Architect a mature generative AI foundation on AWS

Flipboard

Data quality is the responsibility of the consuming applications or the data producers. Governance: the two key areas of governance are model and data. Model governance means monitoring models for performance, robustness, and fairness, and model versions should be managed centrally in a model registry.
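To make "model versions managed centrally in a model registry" concrete, here is an illustrative in-memory sketch of what a registry tracks per version. The model name, URI, and stage values are assumptions for illustration; production systems would use a managed registry such as MLflow or Amazon SageMaker Model Registry rather than a dict.

```python
# Minimal registry: {model_name: {version: {"uri": ..., "stage": ...}}}
registry = {}

def register(name, version, uri):
    """Record a new model version, starting in the 'staging' stage."""
    registry.setdefault(name, {})[version] = {"uri": uri, "stage": "staging"}

def promote(name, version, stage):
    """Transition an existing version to a new lifecycle stage."""
    registry[name][version]["stage"] = stage

register("fraud-detector", "1", "s3://models/fraud/1")
promote("fraud-detector", "1", "production")
assert registry["fraud-detector"]["1"]["stage"] == "production"
```

The point of centralizing this state is that governance checks (performance, robustness, fairness) can gate the `promote` step for every model in one place.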
