Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.
By Nate Rosidi, KDnuggets Market Trends & SQL Content Specialist, on June 11, 2025, in Language Models. If you work in a data-related field, you need to keep your skills up to date. Data scientists use different tools for tasks like data visualization, data modeling, and even warehouse systems.
Continuous Integration and Continuous Delivery (CI/CD) for data pipelines is a game-changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. AnalyticsCreator offers full BI-stack automation, from source to data warehouse through to frontend.
Ideal for data scientists and engineers working with databases and complex data models. It also includes free machine learning books, courses, blogs, newsletters, and links to local meetups and communities.
New big data architectures and, above all, data sharing concepts such as Data Mesh are ideal for creating a common database for many data products and applications. The Event Log Data Model for Process Mining: Process mining as an analytical system can very well be imagined as an iceberg.
Summary: Data engineering tools streamline data collection, storage, and processing. Tools like Python, SQL, Apache Spark, and Snowflake help engineers automate workflows and improve efficiency. Learning these tools is crucial for building scalable data pipelines. That's where data engineering tools come in!
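As a toy sketch of the kind of hand-off these tools automate, the example below uses only Python, pandas, and SQLite as stand-ins for a real warehouse; the table name and values are invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical raw extract; in practice this might come from an API,
# a CSV drop, or an object store.
df = pd.DataFrame(
    {"order_id": [1, 2, 3], "region": ["EU", "US", "EU"], "amount": [120.0, 75.5, 230.0]}
)

# Load into a local SQLite database standing in for a warehouse table.
conn = sqlite3.connect("orders.db")
df.to_sql("orders", conn, if_exists="replace", index=False)

# Downstream consumers can now query the table with plain SQL.
totals = pd.read_sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region", conn)
print(totals)
conn.close()
```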
Data Mesh on Azure Cloud with Databricks and Delta Lake for Applications of Business Intelligence, Data Science and Process Mining. With the concept of Data Mesh, you can connect all of your internal and external organizational data sources once and provide the data as several data models for all of your analytical applications.
Streamlined Collaboration Among Teams: Data warehouse systems in the cloud often involve cross-functional teams — data engineers, data scientists, and system administrators. This ensures that the data models and queries developed by data professionals are consistent with the underlying infrastructure.
Accordingly, one of the most demanding roles is that of the Azure Data Engineer, a position you might be interested in. The following blog will help you learn about the Azure data engineering job description, salary, and certification courses. How to Become an Azure Data Engineer?
Data science myths are one of the main obstacles preventing newcomers from joining the field. In this blog, we bust some of the biggest myths shrouding the field. The US Bureau of Labor Statistics predicts that data science jobs will grow up to 36% by 2031. "Data scientists only work on predictive modeling": another myth!
Data engineering refers to the design of systems that are capable of collecting, analyzing, and storing data at a large scale. In manufacturing, data engineering aids in optimizing operations and enhancing productivity while ensuring curated data that is both compliant and high in integrity.
Welcome to Beyond the Data, a series that investigates the people behind the talent of phData. In this blog, we're featuring Eugenia Pais, a Sr. Data Engineer at phData. As a Senior Data Engineer, I wear many hats.
Unfolding the difference between data engineer, data scientist, and data analyst. Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization's data infrastructure. Read on to learn more.
Data engineering in healthcare is taking a giant leap forward with rapid industrial development. However, data collection and analysis have been commonplace in the healthcare sector for ages. Data engineering in day-to-day hospital administration can help with better decision-making and patient diagnosis/prognosis.
In turn, the same will happen in data engineering. Autonomous agents will re-architect the data lifecycle, from data modelling and infrastructure-as-code to platform migrations, CI/CD, governance, and ETL pipelines. With these foundations in place, unlocking value in unstructured data becomes easier.
Getting Started with AI in High-Risk Industries, How to Become a Data Engineer, and Query-Driven Data Modeling. How To Get Started With Building AI in High-Risk Industries: This guide will get you started building AI in your organization with ease, axing unnecessary jargon and fluff, so you can start today.
The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers when building ML use cases. ML builders can request access to data published by data engineers. You can also update the model's deployment status.
However, to fully harness the potential of a data lake, effective data modeling methodologies and processes are crucial. Data modeling plays a pivotal role in defining the structure, relationships, and semantics of data within a data lake, and it helps ensure consistency of data throughout the data lake.
Apache Hive was used to provide a tabular interface to data stored in HDFS and to integrate with Apache Spark SQL. Apache HBase was employed to offer real-time, key-based access to data.
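A minimal sketch of that access pattern is shown below, assuming a Spark installation configured with Hive support and a hypothetical Hive table named events; real-time key-based lookups against HBase would go through a separate HBase client or connector rather than this SQL path.

```python
from pyspark.sql import SparkSession

# Enable Hive support so Spark SQL can resolve tables registered in the
# Hive metastore (the data itself lives as files in HDFS).
spark = (
    SparkSession.builder
    .appName("hive-tabular-access")
    .enableHiveSupport()
    .getOrCreate()
)

# "events" is a hypothetical Hive table used only for illustration.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS n_events
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")
daily_counts.show()
```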
As Indian companies across industries increasingly embrace data-driven decision-making, artificial intelligence (AI), and automation, the demand for skilled data scientists continues to surge. Validation techniques ensure models perform well on unseen data.
Data Engineering: a data engineer's start to simplification. Introduction: A lot of the time, folks jump directly into KPIs (Key Performance Indicators) without understanding the need for those KPIs. I have met with clients who dumped all the data they had and never figured out what they really wanted to achieve.
Just as a chef’s masterpiece depends on the quality of the ingredients, your AI outcomes will depend on the data you prepare. Investing in your data can only lead to positive results. The post Looking Ahead: The Future of Data Preparation for Generative AI appeared first on Data Science Blog.
Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. The journey to becoming a successful data engineer […].
Hosted on Amazon ECS with tasks run on Fargate, this platform streamlines the end-to-end ML workflow, from data ingestion to model deployment. This blog post delves into the details of this MLOps platform, exploring how the integration of these tools facilitates a more efficient and scalable approach to managing ML projects.
Data scientists will typically perform data analytics when collecting, cleaning and evaluating data. By analyzing datasets, data scientists can better understand their potential use in an algorithm or machine learning model.
Collectively, these modules address governance across various dimensions, such as infrastructure, data, model, and cost. Reference architecture modules: The reference architecture comprises eight modules, each designed to solve a specific set of problems.
ODSC West 2024 showcased a wide range of talks and workshops from leading data science, AI, and machine learning experts. This blog highlights some of the most impactful AI slides from the world’s best data science instructors, focusing on cutting-edge advancements in AI, data modeling, and deployment strategies.
About the Author: Rajendra Choudhary is a Sr. Business Analyst at Amazon. With 7 years of experience in developing data solutions, he possesses profound expertise in data visualization, data modeling, and data engineering. He is passionate about supporting customers by leveraging generative AI-based solutions.
It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. It also lets you choose the right engine for the right workload at the right cost, potentially reducing your data warehouse costs by optimizing workloads.
If you’re interested in learning more, we highly recommend checking out our comprehensive blog that covers this in much more detail. How to Connect Power BI to Snowflake: Choose Import or DirectQuery Mode Carefully. Power BI offers two main connection types when connecting to data sources: Import and DirectQuery.
The first generation of data architectures, represented by enterprise data warehouse and business intelligence platforms, was characterized by thousands of ETL jobs, tables, and reports that only a small group of specialized data engineers understood, resulting in an under-realized positive impact on the business.
Data Modeling: dbt has gradually emerged as a powerful tool that greatly simplifies the process of building and managing data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in a single hub, following the best practices of software engineering.
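A minimal sketch of that transform-test-document loop, assuming the dbt CLI is installed and the working directory already contains a configured dbt project and profile; it simply drives the standard dbt commands from Python.

```python
import subprocess

def run(step: list[str]) -> None:
    """Run one dbt CLI step and fail fast if it errors."""
    print("Running:", " ".join(step))
    subprocess.run(step, check=True)

run(["dbt", "run"])               # build the models (transform)
run(["dbt", "test"])              # run schema and data tests
run(["dbt", "docs", "generate"])  # generate the documentation artifacts
```

In practice the same sequence usually lives in a scheduler or CI job rather than an ad hoc script, which is exactly where the software-engineering practices mentioned above come in.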
It uses advanced tools to look at raw data, gather a data set, process it, and develop insights to create meaning. Areas making up the data science field include mining, statistics, data analytics, data modeling, machine learning modeling, and programming.
By the end of the consulting engagement, the team had implemented the following architecture that effectively addressed the core requirements of the customer team, including: Code Sharing – SageMaker notebooks enable data scientists to experiment and share code with other team members.
In this blog post, I'll describe my analysis of Tableau's history to drive analytics innovation—in particular, I've identified six key innovation vectors through reflecting on the top innovations across Tableau releases.
By changing the cost structure of collecting data, Hadoop increased the volume of data stored in every organization. Additionally, Hadoop removed the requirement to model or structure data when writing to a physical store. You did not have to understand or prepare the data to get it into Hadoop, so people rarely did.
In this blog, our focus will be on exploring the data lifecycle along with several Design Patterns, delving into their benefits and constraints. Data architects can leverage these patterns as starting points or reference models when designing and implementing data vault architectures.
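As one concrete example of such a pattern, the sketch below creates a heavily simplified data vault hub and satellite in SQLite; the table and column names are hypothetical, and it omits many details (links, hash computation, multi-source handling) that a real data vault design would include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hub: one row per business key, identified by a surrogate hash key.
conn.execute("""
    CREATE TABLE hub_customer (
        customer_hk   TEXT PRIMARY KEY,   -- hash of the business key
        customer_id   TEXT NOT NULL,      -- business key from the source system
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL
    )
""")

# Satellite: descriptive attributes for the hub, versioned by load date.
conn.execute("""
    CREATE TABLE sat_customer_details (
        customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
        load_date     TEXT NOT NULL,
        name          TEXT,
        country       TEXT,
        record_source TEXT NOT NULL,
        PRIMARY KEY (customer_hk, load_date)
    )
""")
```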
Fivetran is a fully automated, zero-maintenance data pipeline tool that automates the ETL process from data sources to your cloud warehouse. It eliminates the need for time-consuming data engineering tasks to maintain the pipeline and allows businesses to spend more time analyzing their data instead of maintaining it.
But do they empower many user types to quickly find trusted data for a business decision or data model? Many data catalogs suffer from a lack of adoption because they are too technical. These user types include data analysts, stewards, business users, and data engineers.
This is the last post of the four-part blog series. In the previous blog, we discussed how Alation provides a platform for data scientists and analysts to complete projects and analyses at speed. In this blog, we will discuss how Alation helps minimize risk with active data governance.
We document these custom models in Alation Data Catalog and publish common queries that other teams can use for operational use cases or reporting needs. Contact title mappings, which are built into some of the data models, are documented within our data catalog. Jason: How do you use these models?
Requirements gathering: ChatGPT can significantly simplify the requirements-gathering phase by building quick prototypes of complex applications. GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream’s public Bitcoin API: the data would be interesting to analyze.
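To make the JSON-to-SQL-schema idea concrete without calling an LLM at all, here is a deliberately crude Python sketch that infers a CREATE TABLE statement from a single flat JSON record; the sample record and type mapping are invented and much simpler than what the article describes GPT-4 producing.

```python
import json

# Invented sample record standing in for one row of API output.
sample = json.loads('{"txid": "abc123", "fee": 1520, "confirmed": true, "rate": 12.5}')

# Crude Python-to-SQL type mapping, for illustration only.
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}

def infer_schema(record: dict, table: str) -> str:
    """Build a CREATE TABLE statement from one flat JSON record."""
    cols = [f"    {name} {SQL_TYPES[type(value)]}" for name, value in record.items()]
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"

print(infer_schema(sample, "transactions"))
```

Nested objects, arrays, and inconsistent records are exactly where a mapping this simple breaks down and a generative approach becomes attractive.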
Who This Book Is For: This book is for practitioners in charge of building, managing, maintaining, and operationalizing the ML process end to end: data science / AI / ML leaders (Heads of Data Science, VPs of Advanced Analytics, AI Leads, etc.). Monitor the data, models, and applications to guarantee their availability and performance.