Cross-cloud data governance with Unity Catalog supports accessing S3 data from Azure Databricks. This enables organizations to enforce consistent security, auditing, and data lineage across cloud boundaries. Lakebridge accelerates the migration of legacy data warehouse workloads to Azure Databricks SQL.
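As a rough illustration, here is a minimal sketch of what cross-cloud access might look like from a Databricks notebook (where `spark` is predefined), assuming a Unity Catalog storage credential for the S3 bucket already exists; all object names are illustrative placeholders, not real resources.

```python
# Minimal sketch: registering S3 data in Unity Catalog from an Azure
# Databricks workspace. Assumes a storage credential "s3_example_cred"
# already exists; bucket and table names are placeholders.

# Register an external location backed by the existing storage credential.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS s3_sales_data
    URL 's3://example-bucket/sales/'
    WITH (STORAGE CREDENTIAL s3_example_cred)
""")

# Create an external table governed by Unity Catalog over the S3 path,
# so security, auditing, and lineage apply across the cloud boundary.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders
    USING DELTA
    LOCATION 's3://example-bucket/sales/orders/'
""")
```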
Data can be generated from databases, sensors, social media platforms, APIs, logs, and web scraping. Data can be structured (like tables in databases), semi-structured (like XML or JSON), or unstructured (like text, audio, and images). Deployment and Monitoring: Once a model is built, it is moved to production.
“Vector Databases are completely different from your cloud data warehouse.” – You might have heard that statement if you are involved in creating vector embeddings for your RAG-based Gen AI applications. This process is repeated until the entire text is divided into coherent segments. Return the chunks as an ARRAY.
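The snippet above alludes to a chunking loop; below is a minimal sketch of one naive variant, fixed-size chunking with overlap, that returns the chunks as an array. The sizes and sample text are illustrative; production pipelines often split on sentence or semantic boundaries instead.

```python
# Minimal sketch of fixed-size text chunking with overlap. Each chunk would
# then be embedded and stored in a vector database.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

sample = "Lorem ipsum dolor sit amet. " * 100
print(len(chunk_text(sample)))  # number of chunks produced
```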
One such option is the availability of Python components in Matillion ETL, which allows us to run Python code inside the Matillion instance. In this blog, we will describe 10 such Python scripts that can provide a blueprint for using the Python component efficiently in Matillion ETL for the Snowflake AI Data Cloud.
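For flavor, here is a minimal sketch of the kind of script a Python component might run. It assumes Matillion's runtime-injected `context` object (providing `cursor()` and `updateVariable()`); the table and variable names are hypothetical.

```python
# Minimal sketch of a Matillion ETL Python component script. Matillion
# injects `context` at runtime; this will not run outside a component.

import datetime

# Query the target warehouse through Matillion's built-in cursor.
cursor = context.cursor()
cursor.execute("SELECT COUNT(*) FROM STAGING.ORDERS")
row_count = cursor.fetchone()[0]

# Publish results into job variables for downstream components to use.
context.updateVariable("orders_row_count", str(row_count))
context.updateVariable("load_timestamp", datetime.datetime.utcnow().isoformat())
```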
A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system, such as a cloud data warehouse like Snowflake. Intuitive Workflow Design: Workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code.
In the contemporary age of Big Data, Data Warehouse Systems and Data Science Analytics Infrastructures have become essential components for organizations to store, analyze, and make data-driven decisions. So why use IaC for Cloud Data Infrastructures?
Even with the coronavirus causing mass closures, there are still some big announcements in the cloud data science world. Google introduces Cloud AI Platform Pipelines: Google Cloud now provides a way to deploy repeatable machine learning pipelines. Azure Functions now support Python 3.8. So, here is the news.
Microsoft just held one of its largest conferences of the year, and a few major announcements were made that pertain to the cloud data science world. Azure Synapse Analytics can be seen as a merger of Azure SQL Data Warehouse and Azure Data Lake. Python support has been available for a while.
Python is the top programming language used by data engineers in almost every industry. Python has proven effective for setting up pipelines, maintaining data flows, and transforming data, thanks to its simple syntax and strength in automation. Why Connect Snowflake to Python?
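As a minimal sketch, connecting the two typically looks like this with the official snowflake-connector-python package (pip install snowflake-connector-python); the account, credentials, and object names below are placeholders.

```python
# Minimal sketch: connect Python to Snowflake and run a query.

import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="PIPELINE_USER",
    password="***",        # prefer key-pair auth or a secrets manager in practice
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")
    print(cur.fetchone()[0])  # e.g. the Snowflake version string
finally:
    conn.close()
```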
Usually the term refers to the practices, techniques, and tools that allow access and delivery across different fields and data structures in an organisation. Data management approaches are varied and may be categorised as follows: Cloud data management. Master data management. Data transformation.
With ELT, we first extract data from source systems, then load the raw data directly into the data warehouse before finally applying transformations natively within the data warehouse. This is unlike the more traditional ETL method, where data is transformed before loading into the data warehouse.
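A minimal sketch of that ELT flow, reusing a Snowflake connection like the one shown earlier; the stage and table names are illustrative.

```python
# Minimal ELT sketch: load raw data as-is, then transform natively in the
# warehouse. `conn` is a snowflake-connector connection as shown earlier.

cur = conn.cursor()

# Extract + Load: copy raw files from a stage straight into a landing table.
cur.execute("""
    COPY INTO RAW.ORDERS
    FROM @raw_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Transform: the cleanup runs inside the warehouse, after loading.
cur.execute("""
    CREATE OR REPLACE TABLE ANALYTICS.ORDERS AS
    SELECT ORDER_ID,
           CUSTOMER_ID,
           CAST(ORDER_TS AS TIMESTAMP) AS ORDER_TS,
           TOTAL_AMOUNT
    FROM RAW.ORDERS
    WHERE ORDER_ID IS NOT NULL
""")
```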
Amazon Redshift is the most popular cloud data warehouse, used by tens of thousands of customers to analyze exabytes of data every day. If you are prompted to choose a kernel, choose Data Science as the image and Python 3 as the kernel, then choose Select.
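For a rough idea of querying Redshift from such a notebook, here is a minimal sketch using Amazon's redshift_connector package (pip install redshift-connector); the cluster endpoint, credentials, and table name are placeholders.

```python
# Minimal sketch: query Amazon Redshift from a Python 3 notebook kernel.

import redshift_connector

conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="***",  # use IAM or a secrets manager in practice
)

cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM sales")
print(cur.fetchone()[0])
conn.close()
```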
Matillion is a SaaS-based data integration platform that can be hosted in AWS, Azure, or GCP. It offers a cloud-agnostic data productivity hub called Matillion Data Productivity Cloud. Below is a sample scenario for 3 business units within an organization for the data mart layer of the data warehouse.
The modern data stack is a combination of software tools used to collect, process, and store data on a well-integrated cloud-based data platform. It is valued for its robustness, speed, and scalability in handling data. Components include data ingestion/integration services and data orchestration tools.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
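For context, a minimal Snowpark for Python sketch looks roughly like this: DataFrame operations written in Python that push down to Snowflake as SQL rather than running on the client. Connection parameters and table names are placeholders (pip install snowflake-snowpark-python).

```python
# Minimal sketch of Snowpark for Python, the service discussed in the talk.

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs({
    "account": "myorg-myaccount",
    "user": "ML_USER",
    "password": "***",
    "warehouse": "ML_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
}).create()

# The filter and count below execute inside Snowflake, not on the client.
df = session.table("ORDERS").filter(col("TOTAL_AMOUNT") > 100)
print(df.count())
```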
This experience helped me improve my Python skills and gain more practical experience working with big data. Another important change is that new technologies are greatly accelerating work with data. At Sberbank, I worked as an analyst for major B2B clients.
Data integration is essentially the Extract and Load portion of the Extract, Load, and Transform (ELT) process. Data ingestion involves connecting your data sources, including databases, flat files, streaming data, etc., to your data warehouse. Snowflake provides native ways for data ingestion.
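One of those native paths is staging a flat file with PUT and bulk-loading it with COPY INTO; a minimal sketch with placeholder names follows, reusing a connection like the one shown earlier (Snowpipe would be the native choice for continuous ingestion).

```python
# Minimal sketch of native Snowflake ingestion: stage a local flat file,
# then bulk-load it. `conn` is a snowflake-connector connection.

cur = conn.cursor()

# Upload the file to the table's internal stage (compressed automatically).
cur.execute("PUT file:///tmp/customers.csv @%CUSTOMERS")

# Bulk-load the staged file; with no FROM clause, COPY INTO reads the
# table stage by default.
cur.execute("""
    COPY INTO CUSTOMERS
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
```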
Lineage helps them identify the source of bad data to fix the problem fast. Manual lineage will give ARC a fuller picture of how data was created between the AWS S3 data lake, Snowflake cloud data warehouse, and Tableau (and how it can be fixed). “Time is money,” said Leonard Kwok, Senior Data Analyst, ARC.
Within watsonx.ai, users can take advantage of open-source frameworks like PyTorch, TensorFlow and scikit-learn alongside IBM’s entire machine learning and data science toolkit and its ecosystem tools for code-based and visual data science capabilities. Savings may vary depending on configurations, workloads and vendor.
Focus Area: ETL helps transform raw data into a structured format that is readily available for data scientists to create models and interpret for any data-driven decision. A data pipeline, by contrast, is created with the focus of transferring data from a variety of sources into a data warehouse.
Open source notebooks exist because most data science languages are a mix of object-oriented code, complex libraries, and functional programming. Plotting graphics using Python, R, Scala or other languages has always depended on conversion to JPEG format or some other graphical output that does not display when created.
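A minimal sketch of that pattern in Python: the plot is rendered to a static image file (PNG here) that the notebook then embeds as its graphical output.

```python
# Minimal sketch: render a plot to a static image without a display.

import matplotlib
matplotlib.use("Agg")          # headless backend: render without a GUI
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
fig.savefig("plot.png", dpi=150)   # the image the notebook embeds inline
```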
Snowflake AI Data Cloud has become a premier cloud data warehousing solution. Maybe you’re just getting started looking into a cloud solution for your organization, or maybe you’ve already got Snowflake and are wondering what features you’re missing out on.
Snowflake has so many features that make it the leader in the Cloud Data Warehouse market. Cloning in Snowflake means that the clone is not a copy of the original data; it simply points back to the original data. Two of these tools are Terraform and phData’s own Provision tool.
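Zero-copy cloning itself is a one-liner; a minimal sketch with illustrative names, again assuming a snowflake-connector connection like the one shown earlier.

```python
# Minimal sketch of Snowflake zero-copy cloning: the clone shares the
# original's underlying storage until either side changes.

cur = conn.cursor()

# Clone a single table, e.g. to create a dev copy without duplicating data.
cur.execute("CREATE TABLE ANALYTICS.ORDERS_DEV CLONE ANALYTICS.ORDERS")

# Entire schemas (and databases) can be cloned the same way.
cur.execute("CREATE SCHEMA ANALYTICS_DEV CLONE ANALYTICS")
```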
Celonis aims to offer machine learning within the platform from a single source and has also developed its own Python libraries for this purpose. As an alternative to Databricks, other data warehouse database platforms can be used, for example Snowflake with dbt. So far, much of this still revolves around…