What’s New with Azure Databricks: Unified Governance, Open Formats, and AI-Native Workloads. Explore the latest Azure Databricks capabilities designed to help organizations simplify governance, modernize data pipelines, and power AI-native applications on a secure, open platform.
However, if the tool supports an option where we can write our custom programming code to implement features that cannot be achieved using the drag-and-drop components, it broadens the horizon of what we can do with our data pipelines. Top 10 Python Scripts for use in Matillion for Snowflake 1. The default value is Python3.
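As an illustration, here is a minimal sketch of the kind of logic such a scripting option can host. It assumes Matillion's Python Script component (Python3 interpreter) and its context.updateVariable helper; the variable name and date logic are placeholders, not taken from the article.

```python
# Sketch of logic for a Matillion "Python Script" component (Python3 interpreter).
# Assumes Matillion exposes a `context` object; guarded so the script also runs locally.
import datetime

# Derive the start of the previous calendar month for an incremental load window,
# something awkward to express with drag-and-drop components alone.
today = datetime.date.today()
first_of_month = today.replace(day=1)
prev_month_start = (first_of_month - datetime.timedelta(days=1)).replace(day=1)
print(f"load_start_date = {prev_month_start.isoformat()}")

# Inside Matillion, hand the value to downstream components as a job variable.
if "context" in globals():
    context.updateVariable("load_start_date", prev_month_start.isoformat())
```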
For those unfamiliar with Git or Git practices, please refer to Git for Business Users with Matillion DPC. What is a Matillion Pipeline? A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system, such as a cloud data warehouse like Snowflake.
As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For Snowflake customers, Snowpark is a powerful tool for building these effective and scalable data pipelines.
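As a hedged illustration (the connection parameters and table names below are placeholders, not from the article), a Snowpark for Python pipeline can express a transformation as DataFrame operations that execute inside Snowflake:

```python
# Sketch of a Snowpark for Python transformation (placeholder connection and table names).
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

connection_parameters = {
    "account": "my_account",
    "user": "MY_USER",
    "password": "MY_PASSWORD",
    "warehouse": "COMPUTE_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()

# Aggregate completed orders by day; the query is pushed down and runs inside Snowflake.
daily_revenue = (
    session.table("RAW_ORDERS")
    .filter(col("STATUS") == "COMPLETE")
    .group_by("ORDER_DATE")
    .agg(sum_(col("AMOUNT")).alias("REVENUE"))
)

# Materialize the result as a table for downstream consumers.
daily_revenue.write.mode("overwrite").save_as_table("DAILY_REVENUE")
session.close()
```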
We also discuss different types of ETL pipelines for ML use cases and provide real-world examples of their use to help data engineers choose the right one. What is an ETL data pipeline in ML? It is common to use the terms ETL data pipeline and data pipeline interchangeably.
Python is the top programming language used by data engineers in almost every industry. Python has proven effective for setting up pipelines, maintaining data flows, and transforming data, thanks to its simple syntax and strength in automation. Why Connect Snowflake to Python?
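A common way to make that connection is the snowflake-connector-python package; the sketch below uses placeholder credentials and simply verifies connectivity.

```python
# Minimal connectivity check with snowflake-connector-python (placeholder credentials).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # e.g. "abc12345.us-east-1"
    user="MY_USER",
    password="MY_PASSWORD",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")
    print(cur.fetchone()[0])   # prints the Snowflake version to confirm the connection works
finally:
    conn.close()
```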
Data science bootcamps are intensive short-term educational programs designed to equip individuals with the skills needed to enter or advance in the field of data science. They cover a wide range of topics, from Python, R, and statistics to machine learning and data visualization.
JuMa is a service of BMW Group’s AI platform for its data analysts, ML engineers, and data scientists that provides a user-friendly workspace with an integrated development environment (IDE). It is powered by Amazon SageMaker Studio and provides JupyterLab for Python and Posit Workbench for R.
Deployment with the AWS CDK The Step Functions state machine and associated infrastructure (including Lambda functions, CodeBuild projects, and Systems Manager parameters) are deployed with the AWS CDK using Python. The AWS CDK code synthesizes a CloudFormation template, which it uses to deploy all infrastructure for the solution.
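As a rough sketch of that pattern (construct names, handler path, and timeout are assumptions, not the solution's actual stack), a CDK app in Python might wire a Lambda function into a Step Functions state machine like this:

```python
# Sketch of an AWS CDK (v2, Python) app: a Lambda function driven by a Step Functions
# state machine. `cdk deploy` synthesizes the CloudFormation template and deploys it.
from aws_cdk import App, Stack, Duration
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks
from constructs import Construct

class PipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function that performs one step of the workflow.
        prepare_fn = _lambda.Function(
            self, "PrepareData",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="prepare.handler",
            code=_lambda.Code.from_asset("lambda"),   # hypothetical ./lambda directory
        )

        # Single-task state machine that invokes the Lambda function.
        task = tasks.LambdaInvoke(self, "PrepareDataTask", lambda_function=prepare_fn)
        sfn.StateMachine(
            self, "BuildPipeline",
            definition_body=sfn.DefinitionBody.from_chainable(task),
            timeout=Duration.minutes(30),
        )

app = App()
PipelineStack(app, "BuildPipelineStack")
app.synth()
```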
Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Amazon Redshift is the most popular cloud data warehouse, used by tens of thousands of customers to analyze exabytes of data every day. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development.
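A hedged sketch of that combination (database, table, connection, and bucket names are placeholders): an AWS Glue PySpark job can read a catalogued dataset and load it into Amazon Redshift.

```python
# Sketch of an AWS Glue PySpark job: read from the Glue Data Catalog, write to Redshift.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Write the frame to Redshift through a JDBC connection defined in Glue.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "analytics.orders", "database": "dev"},
    redshift_tmp_dir="s3://my-temp-bucket/redshift/",
)
job.commit()
```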
Best practices are a pivotal part of any software development, and data engineering is no exception. Following them ensures the data pipelines we create are robust, durable, and secure, providing the desired data to the organization effectively and consistently. Below are the best practices.
SageMaker has developed the distributed data parallel library, which splits data per node and optimizes the communication between the nodes. You can use the SageMaker Python SDK to trigger a job with data parallelism with minimal modifications to the training script. Each node has a copy of the DNN.
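A minimal sketch of triggering such a job through the SageMaker Python SDK (the entry-point script, S3 path, and instance choices are assumptions): enabling the data parallel library is a single distribution setting on the estimator.

```python
# Sketch: launch a PyTorch training job with SageMaker's distributed data parallel library.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",               # existing training script, lightly adapted for the library
    role=sagemaker.get_execution_role(),
    framework_version="1.13",
    py_version="py39",
    instance_count=2,                     # the library splits data across these nodes
    instance_type="ml.p4d.24xlarge",      # a supported GPU instance type
    # Enable SageMaker's distributed data parallel library.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

# Each node gets a copy of the model and a shard of the data from this channel.
estimator.fit({"training": "s3://my-bucket/training-data/"})
```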
The Snowflake Data Cloud is a leading cloud data platform that provides various features and services for data storage, processing, and analysis. A new feature that Snowflake offers is called Snowpark, which provides an intuitive library for querying and processing data at scale in Snowflake.
However, if there’s one thing we’ve learned from years of successful cloud data implementations here at phData, it’s the importance of defining and implementing processes, building automation, and performing configuration, even before you create the first user account. You can use whatever works best for your technology.
These tools are used to manage big data, which is defined as data that is too large or complex to be processed by traditional means. How Did the Modern Data Stack Get Started? The rise of cloud computing and cloud data warehousing has catalyzed the growth of the modern data stack.
Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector: the StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities. Savings may vary depending on configurations, workloads and vendor.