By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on July 17, 2025 in Data Science. Image by Author | Ideogram. Data is the asset that drives our work as data professionals. Thus, securing suitable data is crucial for any data professional, and data pipelines are the systems designed for this purpose.
By Josep Ferrer, KDnuggets AI Content Specialist, on July 15, 2025 in Data Science. Image by Author. Delivering the right data at the right time is a primary need for any organization in the data-driven society. But let's be honest: creating a reliable, scalable, and maintainable data pipeline is not an easy task.
What’s New with Azure Databricks: Unified Governance, Open Formats, and AI-Native Workloads. Explore the latest Azure Databricks capabilities designed to help organizations simplify governance, modernize data pipelines, and power AI-native applications on a secure, open platform.
Shinoy Vengaramkode Bhaskaran, Senior Big Data Engineering Manager, Zoom Communications Inc. As AI agents become more intelligent, autonomous and pervasive across industries—from predictive customer support to automated infrastructure management—their performance hinges on a single foundational …
Instead of sweating the syntax, you describe the “vibe” of what you want—be it a data pipeline, a web app, or an analytics automation script—and frameworks like Replit, GitHub Copilot, Gemini Code Assist, and others do the heavy lifting. Learn more about LLMs and their applications in this Data Science Dojo guide.
Data engineers are the unsung heroes of the data-driven world, laying the essential groundwork that allows organizations to leverage their data for enhanced decision-making and strategic insights. What is a data engineer?
Over the past few months, we’ve introduced exciting updates to Lakeflow Jobs (formerly known as Databricks Workflows) to improve data orchestration and optimize workflow performance. More controlled and efficient data flows Our orchestrator is constantly being enhanced with new features.
Data analytics has become a key driver of commercial success in recent years. The ability to turn large data sets into actionable insights can mean the difference between a successful campaign and missed opportunities. According to Gartner’s Hype Cycle, GenAI is at the peak, showcasing its potential to transform analytics.¹
The BigQuery Sandbox removes that barrier, letting you query up to 1 terabyte of data per month. It’s a great, no-cost way to start learning and experimenting with large-scale analytics. As a data scientist, you can access your BigQuery Sandbox from a Colab notebook. No credit card required.
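The pattern is simple enough to sketch. Below is a minimal example of querying a public dataset from a Colab notebook with the BigQuery client library; the project ID is a placeholder you would swap for your own sandbox project.

```python
# Minimal BigQuery Sandbox query from a Colab notebook (sketch).
from google.colab import auth
auth.authenticate_user()  # grant the notebook access to your Google account

from google.cloud import bigquery

client = bigquery.Client(project="your-sandbox-project")  # placeholder project ID

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
df = client.query(query).to_dataframe()  # results land in a pandas DataFrame
print(df)
```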
Go vs. Python for Modern Data Workflows: Need Help Deciding?
🔗 Link to the code on GitHub. Why data cleaning pipelines? Think of data pipelines like assembly lines in manufacturing: each step performs a specific function, and the output from one step becomes the input for the next. Wrapping up: data pipelines aren't just about cleaning individual datasets.
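To make the assembly-line idea concrete, here is a minimal sketch of such a pipeline in Python; the cleaning steps and column names are hypothetical stand-ins for whatever your dataset needs.

```python
import pandas as pd

# Hypothetical cleaning steps; each takes a DataFrame and returns a new one.
def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns=str.lower)

def drop_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    return df.drop_duplicates()

def fill_missing_age(df: pd.DataFrame) -> pd.DataFrame:
    return df.fillna({"age": df["age"].median()})

PIPELINE = [normalize_column_names, drop_duplicates, fill_missing_age]

def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    # Assembly line: the output of each step is the input to the next.
    for step in PIPELINE:
        df = step(df)
    return df

raw = pd.DataFrame({"Age": [25, 25, None], "Name": ["Ann", "Ann", "Bo"]})
print(run_pipeline(raw))
```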
This transforms your workflow into a distribution system where quality reports are automatically sent to project managers, data engineers, or clients whenever you analyze a new dataset. This proactive approach helps you identify data pipeline issues before they impact downstream analysis or model performance.
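One way to wire up that kind of distribution, as a rough sketch: send the report over SMTP with Python's standard library. The addresses and SMTP host below are placeholders, not part of the original workflow.

```python
import smtplib
from email.message import EmailMessage

def send_quality_report(report_text: str, recipients: list[str]) -> None:
    """Email a data quality report to stakeholders (placeholder addresses/host)."""
    msg = EmailMessage()
    msg["Subject"] = "Automated data quality report"
    msg["From"] = "pipeline@example.com"  # placeholder sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(report_text)
    with smtplib.SMTP("smtp.example.com") as server:  # placeholder SMTP host
        server.send_message(msg)

# Called at the end of an analysis run, for example:
# send_quality_report(report, ["pm@example.com", "data-eng@example.com"])
```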
As your managed pipelines run, we take care of schema evolution, seamless third-party API upgrades, and comprehensive observability with built-in alerts. As part of Lakeflow Connect, Zerobus is also unified with the Databricks Platform, so you can leverage broader analytics and AI capabilities right away.
Generative AI: A Self-Study Roadmap
Output: [API, Database, Campaign, Analytics, Frontend, Testing, Outreach, CRM]. This one-liner extracts and combines elements from nested lists, creating a single flat structure that's easier to work with in subsequent operations. # Conclusion These Python one-liners show how useful Python is for JSON data manipulation.
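For reference, the flattening pattern the excerpt describes is a nested list comprehension; the sample payload below is invented to reproduce the output shown above.

```python
import json

# Invented payload that reproduces the output shown above.
raw = '{"teams": [["API", "Database"], ["Campaign", "Analytics"], ["Frontend", "Testing"], ["Outreach", "CRM"]]}'
nested = json.loads(raw)["teams"]

# The one-liner: pull every element out of every sublist, left to right.
flat = [item for sublist in nested for item in sublist]
print(flat)  # ['API', 'Database', 'Campaign', 'Analytics', 'Frontend', 'Testing', 'Outreach', 'CRM']
```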
This contribution extends Spark’s declarative model from individual queries to full pipelines, letting developers define what their pipelines should do while Spark handles how to do it. Finally, we announced Lakeflow Designer, a no-code experience for building data pipelines. Preview coming soon.
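As an illustration of the declarative style, here is a sketch in the Delta Live Tables-flavored Python API that Lakeflow's declarative pipelines build on; the table names, storage path, and filter are hypothetical, and the `spark` session is supplied by the pipeline runtime.

```python
import dlt  # available inside a Databricks declarative pipeline

@dlt.table(comment="Raw events ingested from cloud storage (hypothetical path)")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")  # `spark` is provided by the runtime
        .option("cloudFiles.format", "json")
        .load("/data/events/")
    )

@dlt.table(comment="Cleaned events; the engine infers the dependency on raw_events")
def clean_events():
    return dlt.read_stream("raw_events").where("event_type IS NOT NULL")
```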
A more advanced cost-tracking implementation will also allow users to set a spending budget and limits, while connecting the LiteLLM cost usage information to an analytics dashboard to more easily aggregate information. Users can also define custom pricing for models (per token or per second) to calculate costs accurately.
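A basic version of per-request cost tracking with LiteLLM might look like the following; the model name is a placeholder, and budget enforcement and dashboard wiring are left out.

```python
import litellm

response = litellm.completion(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this quarterly report."}],
)

# completion_cost estimates spend from the model's per-token pricing table.
cost = litellm.completion_cost(completion_response=response)
print(f"Request cost: ${cost:.6f}")
```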
10 Free Online Courses to Master Python in 2025: How can you master Python for free?
Distinction between data architect and data engineer: While there is some overlap between the roles, a data architect typically focuses on setting high-level data policies. In contrast, data engineers are responsible for implementing these policies through practical database designs and data pipelines.
By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on July 25, 2025 in Data Engineering. Image by Editor | ChatGPT. # Introduction Machine learning has become an integral part of many companies, and businesses that don't utilize it risk being left behind.
DataOps is an Agile methodology that focuses on enhancing the efficiency and effectiveness of the data lifecycle through collaborative practices. By integrating Agile methodologies into data practices, DataOps enhances collaboration among cross-functional teams, leading to improved data quality and faster delivery of insights.
What Does Python’s __slots__ Actually Do?
5 Fun Generative AI Projects for Absolute Beginners: New to generative AI?
Although we maintain pre-built Amazon QuickSight dashboards for commonly tracked metrics, business users frequently require support for long-tail analytics—the ability to conduct deep dives into specific problems, anomalies, or regional variations not covered by standard reports.
Data Visualization & Analytics Explore creative and technical approaches to visualizing complex datasets, designing dashboards, and communicating insights effectively. Ideal for anyone focused on translating data into impactful visuals and stories. Expect deep-dive sessions and practical case studies.
Conventional ML development cycles take weeks to many months and require scarce data science understanding and ML development skills. Business analysts' ideas for using ML models often sit in prolonged backlogs because of data engineering and data science teams' bandwidth and data preparation activities.
Simple business questions can become multi-day ordeals, with analytics teams drowning in routine requests instead of focusing on strategic initiatives. Nicolas Alvarez is a Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team, focusing on building and optimizing recommerce data systems.
At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to protect the data.
The following diagram illustrates how the enhanced extract, transform, and load (ETL) pipeline interacts with Amazon Bedrock. To achieve the desired accuracy in KPI calculations, the data pipeline was refined for consistent and precise performance, which leads to meaningful insights.
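While the post doesn't show code, a transform step that calls Amazon Bedrock from a Python ETL job could look roughly like this; the model ID and prompt are assumptions, not the pipeline's actual configuration.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def enrich_record(text: str) -> str:
    """Transform step: ask a Bedrock model to summarize KPIs (hypothetical prompt)."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        messages=[{"role": "user",
                   "content": [{"text": f"Extract the key KPIs from: {text}"}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```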
The blog post explains how the Internal Cloud Analytics team leveraged cloud resources like Code Engine to improve, refine, and scale the data pipelines. Background: One of the Analytics team's tasks is to load data from multiple sources and unify it into a data warehouse.
Previously, he was a Data & Machine Learning Engineer at AWS, where he worked closely with customers to develop enterprise-scale data infrastructure, including data lakes, analytics dashboards, and ETL pipelines. He specializes in designing, building, and optimizing large-scale data solutions.
Modern low-code/no-code ETL tools allow data engineers and analysts to build pipelines seamlessly using a drag-and-drop, configuration-driven approach with minimal coding. Matillion ETL for Snowflake is an ELT/ETL tool that allows for the ingestion, transformation, and building of analytics for data in the Snowflake AI Data Cloud.
This standard simplifies pipeline development across batch and streaming workloads. Years of real-world experience have shaped this flexible, Spark-native approach for both batch and streaming pipelines. Declarative pipelines hide the complexity of modern data engineering under a simple, intuitive programming model.
Summary: Data engineering tools streamline data collection, storage, and processing. Tools like Python, SQL, Apache Spark, and Snowflake help engineers automate workflows and improve efficiency. Learning these tools is crucial for building scalable data pipelines; that's where data engineering tools come in!
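As a taste of one such tool, here is a minimal PySpark job that reads, filters, and writes a dataset; the paths and column are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-demo").getOrCreate()

# Hypothetical paths and column: read raw CSVs, keep valid rows, write Parquet.
orders = spark.read.option("header", True).csv("/data/raw/orders.csv")
orders.where("amount > 0").write.mode("overwrite").parquet("/data/clean/orders/")
```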
Data Analysis and Transition to Machine Learning: Skills: Python, SQL, Excel, Tableau, and Power BI are relevant skills for entry-level data analysis roles, along with relational (e.g., MySQL, PostgreSQL) and non-relational databases. Next Steps: Transition into data engineering (PySpark, ETL) or machine learning (TensorFlow, PyTorch).
A data warehouse acts as a single source of truth for an organization’s data, providing a unified view of its operations and enabling data-driven decision-making. A data warehouse enables advanced analytics, reporting, and business intelligence. Today, the cloud has revolutionized how that data is stored and scaled.
Encora, a digital product and software engineering provider, believes AI and machine learning are significantly reshaping traditional credit risk models, especially as consumer behaviours shift following COVID-19.
Prior to that, I spent a couple of years at First Orion, a smaller data company, helping found and build out a data engineering team as one of the first engineers. We were focused on building data pipelines and models to protect our users from malicious phone calls. Email: djmcgrath.c@gmail.com
Data Engineering's Steady Growth. 2018–2021: Data engineering was often mentioned but overshadowed by modeling advancements. 2022–2024: As AI models required larger and cleaner datasets, interest in data pipelines, ETL frameworks, and real-time data processing surged.
Kafka excels in real-time data streaming and scalability. Choose Kafka for big data, analytics, and event-driven applications. IoT applications: managing large volumes of sensor data from smart devices. Big data pipelines: moving data between systems for analytics and AI applications.
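A minimal sketch of publishing IoT-style sensor readings to Kafka with the kafka-python client, assuming a broker on localhost and an invented topic and payload:

```python
import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                        # assumed broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # JSON-encode payloads
)

# Invented topic and sensor payload.
producer.send("sensor-readings", {"device_id": "therm-42", "temp_c": 21.5})
producer.flush()  # block until the message is actually delivered
```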