By Bala Priya C, KDnuggets Contributing Editor & Technical Content Specialist, on July 8, 2025 in Data Science. You know that feeling when you have data scattered across different formats and sources, and you need to make sense of it all? Every ETL pipeline follows the same pattern.
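To make that shared pattern concrete, here is a minimal Python sketch of the extract-transform-load skeleton. The CSV source, SQLite target, and column names are illustrative assumptions, not taken from the article.

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw records from a CSV source (path is illustrative).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: drop incomplete records and normalize types.
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue
        cleaned.append((row["order_id"], float(row.get("amount") or 0)))
    return cleaned

def load(rows, db_path=":memory:"):
    # Load: write the cleaned rows into a relational target.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = load(transform(extract("orders.csv")))
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Whatever the tooling, most pipelines reduce to these three stages, wired together at different scales and in different orders.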
By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on July 17, 2025 in Data Science. Data is the asset that drives our work as data professionals. Without proper data, we cannot perform our tasks, and our business will fail to gain a competitive advantage.
Remote work quickly transitioned from a perk to a necessity, and data science, already digital at heart, was poised for this change. For data scientists, this shift has opened up a global market of remote data science jobs, with top employers now prioritizing skills that allow remote professionals to thrive.
Go vs. Python for Modern Data Workflows: Need Help Deciding?
By Josep Ferrer, KDnuggets AI Content Specialist, on July 15, 2025 in Data Science. Delivering the right data at the right time is a primary need for any organization in a data-driven society. But let's be honest: creating a reliable, scalable, and maintainable data pipeline is not an easy task.
At the same time, Lakeflow Designer, the new AI-powered visual pipeline builder available in preview later this year, enables non-technical users to build, deploy, and monitor production-grade data pipelines through a no-code interface. Lakebridge accelerates the migration of legacy data warehouse workloads to Azure Databricks SQL.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies.
Learn more about LLMs and their applications in this Data Science Dojo guide. For more on how AI is transforming workflows, see How AI is Transforming Data Science Workflows. Use case: "Automate ETL processes for financial data and generate audit-ready logs." The Benefits of Vibe Coding
Over the past few months, we’ve introduced exciting updates to Lakeflow Jobs (formerly known as Databricks Workflows) to improve data orchestration and optimize workflow performance. Lakeflow Connect in Jobs is now generally available for customers.
Databricks recently announced Lakeflow Connect in Jobs, which enables you to create ingestion pipelines within Lakeflow Jobs. So, if jobs are the center of your ETL process, this seamless integration provides a more intuitive and unified experience for managing ingestion.
In this blog, we propose a new architecture for OLTP databases called a lakebase. Deeply integrated with the lakehouse, Lakebase simplifies operational data workflows. It eliminates fragile ETL pipelines and complex infrastructure, enabling teams to move faster and deliver intelligent applications on a unified data platform.
Data science is now one of the most preferred and lucrative career options in the data field: businesses' growing dependence on data for decision-making has pushed demand for data science hires to a peak.
Coordinating Refresh Timings and Triggers
Timing and synchronization are critical to avoid partial or stale data. Here's how each component should be aligned:
Azure Synapse: Scheduled View/ETL Triggers. Use Azure Synapse Pipelines with scheduled triggers to refresh your views or underlying datasets.
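The stdlib-only Python sketch below illustrates the alignment idea generically: wait out the upstream refresh window plus a buffer, then fire the downstream refresh only when an upstream completion marker is present. This is not the Synapse API; the interval, buffer, and marker check are hypothetical placeholders.

```python
import time
from datetime import datetime, timezone

UPSTREAM_INTERVAL_SECONDS = 3600   # assumed hourly refresh of the upstream views/ETL
DOWNSTREAM_BUFFER_SECONDS = 300    # safety buffer before the downstream refresh fires

def upstream_refresh_completed():
    # Placeholder: in practice, check a completion flag, control table, or pipeline run status.
    return True

def refresh_downstream():
    # Placeholder for the downstream dataset/report refresh call.
    print(f"Downstream refresh triggered at {datetime.now(timezone.utc).isoformat()}")

while True:
    # Wait for the upstream window plus the buffer, then refresh only if upstream completed.
    time.sleep(UPSTREAM_INTERVAL_SECONDS + DOWNSTREAM_BUFFER_SECONDS)
    if upstream_refresh_completed():
        refresh_downstream()
```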
We also introduced Lakeflow Declarative Pipelines' new IDE for data engineering, built from the ground up to streamline pipeline development with features like code-DAG pairing, contextual previews, and AI-assisted authoring. Finally, we announced Lakeflow Designer, a no-code experience for building data pipelines.
Rocket's legacy data science environment challenges
Rocket's previous data science solution was built around Apache Spark and combined a legacy version of the Hadoop environment with vendor-provided Data Science Experience development tools.
5 Ways to Transition Into AI from a Non-Tech Background. You have a non-tech background?
If not handled correctly, this can lead to locks, data issues, and a negative user experience. The need for handling this issue became more evident after we began implementing streaming jobs in our Apache Spark ETL platform. Consistency: the same mechanism works for any kind of ETL pipeline, whether batch ingestion or streaming.
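As one possible illustration of such a streaming job, here is a minimal PySpark Structured Streaming sketch that reads JSON files, drops malformed rows, and writes with a checkpoint so restarts do not reprocess data. The paths, schema, and Parquet sink are assumptions for the example, not the platform's actual mechanism.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# Illustrative schema for the incoming JSON records.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream of JSON files landing in an input directory (path is illustrative).
orders = spark.readStream.schema(schema).json("/data/incoming/orders")

# Basic transform step: drop malformed rows.
clean = orders.filter(orders.order_id.isNotNull())

# Write to the target with a checkpoint so a restart resumes instead of reprocessing.
query = (
    clean.writeStream
    .format("parquet")  # or "delta" on platforms that support it
    .option("path", "/data/curated/orders")
    .option("checkpointLocation", "/data/checkpoints/orders")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```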
By Santhosh Kumar Neerumalla, Niels Korschinsky & Christian Hoeboer
Introduction
This blog post describes how to manage and orchestrate high-volume Extract-Transform-Load (ETL) loads using a serverless process based on Code Engine. The source data is unstructured JSON, while the target is a structured, relational database.
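A minimal sketch of that kind of transformation, assuming a small nested document and using sqlite3 as a stand-in for the real relational target (the payload and table are invented for illustration):

```python
import json
import sqlite3

# Illustrative raw JSON payload; real source documents vary in shape.
raw = (
    '{"customer": {"id": "C-101", "name": "Acme"},'
    ' "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'
)
doc = json.loads(raw)

# Flatten the nested document into relational rows.
rows = [
    (doc["customer"]["id"], doc["customer"]["name"], o["sku"], o["qty"])
    for o in doc.get("orders", [])
]

# sqlite3 stands in for the actual target database here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, customer_name TEXT, sku TEXT, qty INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
conn.commit()

print(conn.execute("SELECT * FROM orders").fetchall())
```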
Data ingestion methods
Data lakehouses support multiple data ingestion methods, including batch ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes, along with real-time methods such as stream processing. This flexibility allows organizations to seamlessly integrate diverse data flows.
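The difference between the two batch approaches fits in a few lines of Python; sqlite3 stands in for the lakehouse SQL engine and the sample records are invented for illustration. ETL transforms in application code before loading, while ELT loads the raw records first and transforms them with SQL afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
events = [("2025-07-01", 10.0), ("2025-07-01", 5.5), ("2025-07-02", 3.0)]

# ETL style: transform in application code first, then load only the result.
totals = {}
for day, amount in events:
    totals[day] = totals.get(day, 0.0) + amount
conn.execute("CREATE TABLE daily_totals_etl (day TEXT, total REAL)")
conn.executemany("INSERT INTO daily_totals_etl VALUES (?, ?)", totals.items())

# ELT style: load raw records as-is, then transform inside the database with SQL.
conn.execute("CREATE TABLE raw_sales (day TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", events)
conn.execute(
    "CREATE TABLE daily_totals_elt AS "
    "SELECT day, SUM(amount) AS total FROM raw_sales GROUP BY day"
)

print(conn.execute("SELECT * FROM daily_totals_elt ORDER BY day").fetchall())
```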
Key tools include:
ETL/ELT tools: such as Apache NiFi or Talend, which help in data processing.
Data curation tools: which assist in managing data quality and lifecycle.
Leveraging automation: utilizing automation tools improves efficiency across business intelligence and data science applications.
Techniques such as data mapping and the creation of mediated schemas help harmonize differing data formats, making integration smoother.
Types of data integration methods
There are several methods used for data integration, each suited for different scenarios.
Data Visualization & Analytics: Explore creative and technical approaches to visualizing complex datasets, designing dashboards, and communicating insights effectively. Ideal for anyone focused on translating data into impactful visuals and stories. Perfect for building the infrastructure behind data-driven solutions.
Lakebase is a fully managed Postgres database, integrated into your Lakehouse, that automatically syncs your Delta tables without you having to write custom ETL or configure IAM or networking. Lakebase enables game developers to easily serve Lakehouse-derived insights to their applications.
As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.
These datasets are highly sought after by third-party marketers, driving telcos to monetize their data assets. Databricks facilitates this data monetization by bridging the gap between geospatial information systems (GIS) and traditional data science/analytics silos, enabling the creation of unified datasets that deliver new business value.
As these tools mature, data professionals can apply them to:
Automate data processing and analytics.
Build intelligent pipelines for ETL and reporting.
Construct adaptive systems for research, writing, and decision-making.
Adoption considerations include:
Navigating the learning curve of multi-agent orchestration.
In turn, the same will happen in data engineering. Autonomous agents will re-architect the data lifecycle, from data modelling and infrastructure-as-code to platform migrations, CI/CD, governance, and ETL pipelines.
In today's fast-moving machine learning and AI landscape, access to top-tier tools and infrastructure is a game-changer for any data science team. At ODSC East 2025, we're proud to partner with leading AI and data companies offering these credits to help data professionals test, build, and scale their work.
I worked extensively with ETL processes, PostgreSQL, and later, enterprise-scale data systems. I've always had a logical, data-driven mindset, constantly digging deeper into metrics and questioning assumptions. Q: Tell me more about Data Surge. At Data Surge, we help organizations modernize their data infrastructure.
Neil Holloway is Head of DataScience at Gardenia Technologies where he is focused on leveraging AI and machine learning to build and enhance software products. Christian Dunn is a Software Engineer based in London building ETL pipelines, web-apps, and other business solutions at Gardenia Technologies.