Analytics, Blog and ETL - Data Science Current

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. Create dbt models in dbt Cloud.

ETL

ETL Data Warehouse Analytics

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

databricks

JUNE 11, 2025


Analytics

Analytics Data Science AI

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

You’ll use Python, end of story.

Python

Python Natural Language Processing Data Science Machine Learning

Mosaic AI Announcements at Data + AI Summit 2025

databricks

JUNE 11, 2025

For more details, see the Agent Bricks deep dive blog.

AI

AI SQL Data Science

What Is a Lakebase?

databricks

JUNE 11, 2025


Database

Database Data Lakes ETL Analytics

Introducing Databricks One

databricks

JUNE 12, 2025


Data Engineering

Data Engineering Data Engineer

Run the Full DeepSeek-R1-0528 Model Locally

KDnuggets

JUNE 9, 2025

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies.

Natural Language Processing

Natural Language Processing Data Science Machine Learning

Announcing managed MCP servers with Unity Catalog and Mosaic AI Integration

databricks

JUNE 18, 2025


AI

AI Data Science Artificial Intelligence

Serverless High Volume ETL data processing on Code Engine

IBM Data Science in Practice

JANUARY 13, 2025

By Santhosh Kumar Neerumalla, Niels Korschinsky & Christian Hoeboer. Introduction: This blog post describes how to manage and orchestrate high-volume Extract-Transform-Load (ETL) loads using a serverless process based on Code Engine. Thus, we use an Extract-Transform-Load (ETL) process to ingest the data.

ETL

ETL Data Pipeline Database Data Warehouse

Enabling Operational Analytics on the Databricks Lakehouse Platform With Census Reverse ETL

databricks

JANUARY 23, 2023

This is a collaborative post from Databricks and Census. We thank Parker Rogers, Data Community Advocate, at Census for his contributions. In this.

ETL

ETL Analytics

Understanding ETL Tools as a Data-Centric Organization

Smart Data Collective

SEPTEMBER 8, 2021

The ETL process is defined as the movement of data from its source to destination storage (typically a Data Warehouse) for future use in reports and analyses. Understanding the ETL Process. Before you understand what an ETL tool is, you need to understand the ETL process first. Types of ETL Tools.

ETL

ETL Hadoop Data Warehouse Data Pipeline

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

It also supports a wide range of data warehouses, analytical databases, data lakes, frontends, and pipelines/ETL. Support for Various Data Warehouses and Databases: AnalyticsCreator supports MS SQL Server 2012-2022, Azure SQL Database, Azure Synapse Analytics dedicated, and more. Data Lakes: It supports MS Azure Blob Storage.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Snowflake Architecture & Key Concepts for Data Warehouse

Analytics Vidhya

JUNE 11, 2022

By the end of this blog, you will also be able to understand how Snowflake […]. The post Snowflake Architecture & Key Concepts for Data Warehouse appeared first on Analytics Vidhya.

Data Warehouse

Data Warehouse Data Science Analytics

5 Error Handling Patterns in Python (Beyond Try-Except)

KDnuggets

JUNE 6, 2025

Stop letting errors crash your app.

Python

Python Natural Language Processing Data Science Machine Learning

AWS at Databricks Data + AI Summit 2025

databricks

JUNE 4, 2025


AWS

AWS AI Data Science

Streamlining ETL data processing at Talent.com with Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 14, 2023

Our pipeline belongs to the general ETL (extract, transform, and load) process family that combines data from multiple sources into a large, central repository. The solution does not require porting the feature extraction code to use PySpark, as required when using AWS Glue as the ETL solution.

ETL

ETL AWS ML

The power of remote engine execution for ETL/ELT data pipelines

IBM Journey to AI blog

MAY 15, 2024

Unified, governed data can also be put to use for various analytical, operational and decision-making purposes. Two of the more popular methods, extract, transform, load (ETL) and extract, load, transform (ELT), are both highly performant and scalable. The remote engine allows ETL/ELT jobs to be designed once and run anywhere.

Data Pipeline

Data Pipeline ETL SQL Database

Learn the Differences Between ETL and ELT

Pickl AI

OCTOBER 6, 2024

Summary: This blog explores the key differences between ETL and ELT, detailing their processes, advantages, and disadvantages. This blog explores the fundamental concepts of ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform), two pivotal methods in modern data architectures. What is ETL?

ETL

ETL Data Warehouse Data Quality Data Lakes

30% Off ODSC East, Fan-Favorite Speakers, Foundation Models for Times Series, and ETL Pipeline…

ODSC - Open Data Science

MARCH 20, 2025

30% Off ODSC East, Fan-Favorite Speakers, Foundation Models for Times Series, and ETL Pipeline Orchestration. The ODSC East 2025 Schedule is LIVE! New Podcast Episode: The AI-Powered Analyst: Skills You Need to Stay Relevant. In this episode of ODSC’s Ai X Podcast, we explore how AI is revolutionizing analytics, decision-making, and career paths.

ETL

ETL Data Science Machine Learning

How to Achieve Self-Service Data Transformation for AI and Analytics

Dataversity

FEBRUARY 20, 2024

Traditionally, data transformation was relegated to specialized engineering teams employing complex extract, transform, and load (ETL) processes using highly complex tooling and code. […] The post How to Achieve Self-Service Data Transformation for AI and Analytics appeared first on DATAVERSITY.

Analytics

Analytics ETL AI

Why using Infrastructure as Code for developing Cloud-based Data Warehouse Systems?

Data Science Blog

SEPTEMBER 19, 2023

In the contemporary age of Big Data, Data Warehouse Systems and Data Science Analytics Infrastructures have become an essential component for organizations to store, analyze, and make data-driven decisions. This brings reliability to data ETL (Extract, Transform, Load) processes, query performances, and other critical data operations.

Data Warehouse

Data Warehouse Azure SQL Database

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

However, efficient use of ETL pipelines in ML can help make their life much easier. This article explores the importance of ETL pipelines in machine learning, a hands-on example of building ETL pipelines with a popular tool, and suggests the best ways for data engineers to enhance and sustain their pipelines.

ETL

ETL Data Pipeline ML

Snowflake ETL Face-Off: Alteryx Designer vs. Matillion ETL

phData

MARCH 14, 2024

In the data analytics processes, choosing the right tools is crucial for ensuring efficiency and scalability. Two popular players in this area are Alteryx Designer and Matillion ETL, both offering strong solutions for handling data workflows with Snowflake Data Cloud integration. Today we will focus on Snowflake as our cloud product.

ETL

ETL SQL Data Warehouse Data Pipeline

Eventual (YC W22) Is Hiring a Developer Relations Manager for Daft (SF)

Hacker News

JULY 18, 2024

ABOUT EVENTUAL Eventual is a data platform that helps data scientists and engineers build data applications across ETL, analytics and ML/AI. OUR PRODUCT IS OPEN-SOURCE AND USED AT ENTERPRISE SCALE Our distributed data engine Daft [link] is open-sourced and runs on 800k CPU cores daily.

ML

ML Python ETL

Unify structured data in Amazon Aurora and unstructured data in Amazon S3 for insights using Amazon Q

AWS Machine Learning Blog

NOVEMBER 20, 2024

She has experience across analytics, big data, ETL, cloud operations, and cloud infrastructure management. He has experience across analytics, big data, and ETL. Akchhaya Sharma is a Sr. Data Engineer at Amazon Ads.

Database

Database AWS SQL ETL

How Reverse ETL Powers Modern Customer Marketing: Concrete Examples

Dataversity

JANUARY 27, 2023

Up until recently, feedback forms and […] The post How Reverse ETL Powers Modern Customer Marketing: Concrete Examples appeared first on DATAVERSITY. If you’re part of a customer marketing team, you know that most people would say “not very often.” This is precisely the plight of the average customer marketer.

ETL

ETL Data Warehouse Analytics

How to Plan a Threat Hunt: Using Log Analytics to Manage Data in Depth

Dataversity

JUNE 21, 2021

The post How to Plan a Threat Hunt: Using Log Analytics to Manage Data in Depth appeared first on DATAVERSITY. Only 46% of security operations leaders are satisfied with their team’s ability to detect threats, and 82% of decision-makers report that their responses to threats […].

Analytics

Analytics ETL Database

IBM watsonx Platform: Compliance obligations to controls mapping

IBM Journey to AI blog

OCTOBER 30, 2024

IBM watsonx.data facilitates scalable analytics and AI endeavors by accommodating data from diverse sources, eliminating the need for migration or cataloging through open formats. This approach enables centralized access and sharing while minimizing extract, transform and load (ETL) processes and data duplication.

Machine Learning

Machine Learning ETL AI

End-to-End model training and deployment with Amazon SageMaker Unified Studio

Flipboard

JULY 3, 2025

To address these challenges, AWS has expanded Amazon SageMaker with a comprehensive set of data, analytics, and generative AI capabilities. Data engineers can create and manage extract, transform, and load (ETL) pipelines directly within Unified Studio using Visual ETL.

ML

ML AWS Data Engineering

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data. SageMaker Unified Studio is an integrated development environment (IDE) for data, analytics, and AI. As AI and analytics use cases converge, transform how data teams work together with SageMaker Unified Studio.

SQL

SQL AWS Data Lakes AI

5 strategies for data security and governance in data warehousing: ensuring data protection and compliance

Data Science Dojo

SEPTEMBER 6, 2023

This proactive approach helps maintain optimal system performance, ensuring users execute analytical queries efficiently and deliver insights without delay. This process replaces sensitive data with realistic but fictional data, ensuring that privacy is maintained while still allowing data to be used for development, testing, or analytics.

Data Warehouse

Data Warehouse Data Governance Data Quality ETL

Effective strategies for gathering requirements in your data project

Dataconomy

DECEMBER 17, 2024

This blog post explores effective strategies for gathering requirements in your data project. Example: For a project to optimize supply chain operations, the scope might include creating dashboards for inventory tracking but exclude advanced predictive analytics in the first phase. Key questions to ask: What data sources are required?

Data Quality

Data Quality Power BI Data Engineering

Navigating the World of Data Engineering: A Beginners Guide.

Towards AI

MARCH 21, 2023

If you ever wonder how predictions and forecasts are made based on the raw data collected, stored, and processed in different formats by website feedback, customer surveys, and media analytics, this blog is for you. To learn more about visualizations, you can refer to one of our many blogs on data visualization for a glance.

Data Engineering

Data Engineering Data Engineer

Top ETL Tools: Unveiling the Best Solutions for Data Integration

Pickl AI

JUNE 7, 2024

Summary: Choosing the right ETL tool is crucial for seamless data integration. At the heart of this process lie ETL Tools—Extract, Transform, Load—a trio that extracts data, tweaks it, and loads it into a destination. Choosing the right ETL tool is crucial for smooth data management. What is ETL?

ETL

ETL Data Quality Data Pipeline Data Warehouse

Evaluate large language models for your machine translation tasks on AWS

AWS Machine Learning Blog

JANUARY 7, 2025

This blog post with accompanying code presents a solution to experiment with real-time machine translation using foundation models (FMs) available in Amazon Bedrock. This involves extract, transform, and load (ETL) pipelines able to parse the XML structure, handle encoding issues, and add metadata.

AWS

AWS Python AI

A Look Inside the Modern Analytics Stack

Dataversity

APRIL 1, 2021

In the data-driven world we live in today, the field of analytics has become increasingly important to remain competitive in business. The post A Look Inside the Modern Analytics Stack appeared first on DATAVERSITY.

Analytics

Analytics Data Silos Data Lakes

How Cloud Data Platforms improve Shopfloor Management

Data Science Blog

FEBRUARY 4, 2023

The machine sensor data can be monitored directly in real time via respective data pipelines (real-time stream analytics) or brought into an overall picture of aggregated key figures (reporting). The post How Cloud Data Platforms improve Shopfloor Management appeared first on Data Science Blog. Then get in touch with me!

Cloud Data

Cloud Data Data Science Business Intelligence

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

Flipboard

JUNE 26, 2023

These sources are often related but use different naming conventions, which will prolong cleansing, slowing down the data processing and analytics cycle. Transform raw insurance data into CSV format acceptable to Neptune Bulk Loader, using an AWS Glue extract, transform, and load (ETL) job. This will open the ML transforms page.

AWS

AWS ML ETL

How to Unlock Real-Time Analytics with Snowflake?

phData

MAY 3, 2024

Leveraging real-time analytics to make informed decisions is the golden standard for virtually every business that collects data. If you have the Snowflake Data Cloud (or are considering migrating to Snowflake), you’re a blog away from taking a step closer to real-time analytics. Real-time analytics has real-time benefits.

Apache Kafka

Apache Kafka Analytics ETL

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

The solution: IBM databases on AWS. To solve these challenges, IBM’s portfolio of SaaS database solutions on Amazon Web Services (AWS) enables enterprises to scale applications, analytics and AI across the hybrid cloud landscape. It enables secure data sharing for analytics and AI across your ecosystem.

AWS

AWS Database ETL AI

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

That’s why we use advanced technology and data analytics to streamline every step of the homeownership experience, from application to closing. Data refinement: Raw data is refined into consumable layers (raw, processed, conformed, and analytical) using a combination of AWS Glue extract, transform, and load (ETL) jobs and EMR jobs.

Data Science

Data Science AWS Hadoop Data Scientist

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

Previously, he was a Data & Machine Learning Engineer at AWS, where he worked closely with customers to develop enterprise-scale data infrastructure, including data lakes, analytics dashboards, and ETL pipelines.

AWS

AWS Natural Language Processing Machine Learning

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

This post is co-written with Suhyoung Kim, General Manager at KakaoGames Data Analytics Lab. To solve this problem, we build an extract, transform, and load (ETL) pipeline that can be run automatically and repeatedly for training and inference dataset creation. But there is still an engineering challenge.

AWS

AWS ML ETL


Top 20 Data Warehouse Interview Questions You Must Know in 2025
