Article, Data Pipeline and SQL - Data Science Current

Dynamic SQL Queries to Transform Data

Analytics Vidhya

JUNE 28, 2022

This article was published as a part of the Data Science Blogathon. “Preponderance data opens doorways to complex and Avant analytics.” Introduction to SQL Queries Data is the premium product of the 21st century.

SQL

SQL Data Science Analytics

Build a Data Cleaning & Validation Pipeline in Under 50 Lines of Python

KDnuggets

JUNE 24, 2025

Instead of writing the same cleaning code repeatedly, a well-designed pipeline saves time and ensures consistency across your data science projects. In this article, we’ll build a reusable data cleaning and validation pipeline that handles common data quality issues while providing detailed feedback about what was fixed.

Python

Python Natural Language Processing Data Science Machine Learning

Interacting with Remote Databases – PostgreSQL and DBAPIs

Analytics Vidhya

SEPTEMBER 22, 2022

This article was published as a part of the Data Science Blogathon. Introduction When creating data pipelines, Software Engineers and Data Engineers frequently work with databases using Database Management Systems like PostgreSQL.

Database

Database Data Pipeline Data Engineering

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Python

Python Natural Language Processing Data Science Machine Learning

Automating CSV to PostgreSQL Ingestion with Airflow and Docker

Analytics Vidhya

OCTOBER 3, 2024

Introduction Managing a data pipeline, such as transferring data from CSV to PostgreSQL, is like orchestrating a well-timed process where each step relies on the previous one. Apache Airflow streamlines this process by automating the workflow, making it easy to manage complex data tasks.

Data Pipeline

Data Pipeline Analytics Database

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

AWS Machine Learning Blog

DECEMBER 4, 2024

They have structured data such as sales transactions and revenue metrics stored in databases, alongside unstructured data such as customer reviews and marketing reports collected from various channels. Use Amazon Athena SQL queries to provide insights. Use order dates and news article publishing dates as you look for trends.

AWS

AWS AI SQL

Real-Time Sentiment Analysis with Kafka and PySpark

Towards AI

FEBRUARY 29, 2024

Real-time data streaming pipelines play a crucial role in achieving this objective. Within this article, we will explore the significance of these pipelines and utilise robust tools such as Apache Kafka and Spark to manage vast streams of data efficiently. Next, we run an SQL query to extract the data.

Apache Kafka

Apache Kafka SQL Clustering Data Pipeline

How to Set up a CICD Pipeline for Snowflake to Automate Data Pipelines

phData

JUNE 14, 2023

which play a crucial role in building end-to-end data pipelines, to be included in your CI/CD pipelines. Each migration SQL script is assigned a unique sequence number to facilitate the correct order of application. Additionally, we need to incorporate Flyway variables into the Flyway configuration file.

Data Pipeline

Data Pipeline Database SQL Data Engineer

How to Translate SQL Scripts Into Matillion Jobs

phData

JULY 12, 2023

Unlike traditional methods that rely on complex SQL queries for orchestration, Matillion Jobs provide a more streamlined approach. By converting SQL scripts into Matillion Jobs, users can take advantage of the platform’s advanced features for job orchestration, scheduling, and sharing. If they are not, the query can be stopped.

SQL

SQL ETL Database Data Pipeline

How to Translate SQL Scripts Into Matillion Jobs

phData

APRIL 21, 2023

Unlike traditional methods that rely on complex SQL queries for orchestration, Matillion Jobs provide a more streamlined approach. By converting SQL scripts into Matillion Jobs, users can take advantage of the platform’s advanced features for job orchestration, scheduling, and sharing. If they are not, the query can be stopped.

SQL

SQL ETL Database Data Pipeline

Top NLP Skills, Frameworks, Platforms, and Languages for 2023

ODSC - Open Data Science

FEBRUARY 17, 2023

Cloud Computing, APIs, and Data Engineering NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Data Engineering Platforms Spark is still the leader for data pipelines but other platforms are gaining ground. Knowing some SQL is also essential.

Data Science

Data Science Deep Learning Natural Language Processing

11 Open Source Data Exploration Tools You Need to Know in 2023

ODSC - Open Data Science

FEBRUARY 24, 2023

While machine learning frameworks and platforms like PyTorch, TensorFlow, and scikit-learn can perform data exploration well, it’s not their primary intent. There are also plenty of data visualization libraries available that can handle exploration like Plotly, matplotlib, D3, Apache ECharts, Bokeh, etc.

Exploratory Data Analysis

Exploratory Data Analysis Data Visualization Data Analysis

Comparing Tools For Data Processing Pipelines

The MLOps Blog

MARCH 15, 2023

In this post, you will learn about the 10 best data pipeline tools, their pros, cons, and pricing. A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process.

Data Pipeline

Data Pipeline ETL Data Quality SQL

What Are Snowflake’s Best Features for Data Transformation?

phData

AUGUST 8, 2024

Putting the T for Transformation in ELT (ETL) is essential to any data pipeline. After extracting and loading your data into the Snowflake AI Data Cloud, you may wonder how best to transform it. Luckily, Snowflake answers this question with many features designed to transform your data for all your analytic use cases.

SQL

SQL Data Pipeline Python ETL

40 Must-Know Data Science Skills and Frameworks for 2023

ODSC - Open Data Science

FEBRUARY 2, 2023

Computer Science and Computer Engineering Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Big Data As datasets become larger and more complex, knowing how to work with them will be key.

Data Science

Data Science Data Scientist Computer Science

The Data Dilemma: Exploring the Key Differences Between Data Science and Data Engineering

Pickl AI

JULY 25, 2023

Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.

Data Engineering

Data Engineering Data Engineer

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

The global Big Data and Data Engineering Services market, valued at USD 51,761.6 … million by 2028. … from 2025 to 2030. This article explores the key fundamentals of Data Engineering, highlighting its significance and providing a roadmap for professionals seeking to excel in this vital field. What is Data Engineering?

Data Engineering

Data Engineering Data Engineer

Big Data vs. Data Science: Demystifying the Buzzwords

Pickl AI

APRIL 21, 2025

Introduction In today’s hyper-connected world, you hear the terms “Big Data” and “Data Science” thrown around constantly. They pop up in news articles, job descriptions, and tech discussions. What exactly is Big Data? Database Knowledge: Like SQL for retrieving data.

Big Data

Big Data Data Science Machine Learning

How to Setup a Project in Snowpark Using a Python IDE

phData

JULY 2, 2024

Snowpark, offered by the Snowflake AI Data Cloud, consists of libraries and runtimes that enable secure deployment and processing of non-SQL code, such as Python, Java, and Scala. Developers can seamlessly build data pipelines, ML models, and data applications with User-Defined Functions and Stored Procedures.

Python

Python SQL Data Pipeline ML

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

To provide you with a comprehensive overview, this article explores the key players in the MLOps and FMOps (or LLMOps) ecosystems, encompassing both open-source and closed-source tools, with a focus on highlighting their key features and contributions. It could help you detect and prevent data pipeline failures, data drift, and anomalies.

Machine Learning

Machine Learning ML

ODSC West 2023 Recap in Pictures

ODSC - Open Data Science

DECEMBER 5, 2023

We had bigger sessions on getting started with machine learning or SQL, up to advanced topics in NLP, and of course, plenty related to large language models and generative AI. Originally posted on OpenDataScience.com. Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels!

Data Science

Data Science Artificial Intelligence Machine Learning

Optimizing Matillion Workflows: A Guide to Visual Design and Best Practices

phData

APRIL 28, 2025

Intuitive Workflow Design Workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code. WHERE d.name = 'Sales'; Matillion is designed as a no/low-code ELT tool, so let’s leave the SQL deep dive for another time and focus on making workflows as clean and intuitive as possible!

AI

AI SQL ETL

A Primer to Scaling Pandas

ODSC - Open Data Science

AUGUST 23, 2023

That’s a problem when you’re trying to work with that data in pandas because you have to pull the dataset into the memory of your machine, which can be slow, expensive, and lead to fatal out-of-memory issues. Ponder solves this problem by translating your pandas code to SQL that can be understood by your data warehouse.

Data Warehouse

Data Warehouse Data Science Database SQL

Software Engineering Patterns for Machine Learning

The MLOps Blog

SEPTEMBER 7, 2023

In this listicle of articles, I will go through all these different types of codebases from a very honest and pragmatic point of view, trying to give advice and tips to produce high-quality ML production code.

Machine Learning

Machine Learning ETL ML

Generative AI in Software Development

Mlearning.ai

JUNE 16, 2023

Increase your productivity in software development with Generative AI. As I mentioned in the Generative AI use case article, we are seeing AI-assisted developers. I included some references for this field in that article, but as time goes by, it is necessary to dedicate a separate article to surveying this field in depth.

AI

AI Data Analysis

How to Shift from Data Science to Data Engineering

ODSC - Open Data Science

JANUARY 18, 2024

This individual is responsible for building and maintaining the infrastructure that stores and processes data; the kinds of data can be diverse, but most commonly it will be structured and unstructured data. They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable.

Data Engineer

Data Engineer Data Engineering

Introduction to LangChain for Including AI from Large Language Models (LLMs) Inside Data…

Heartbeat

JANUARY 5, 2024

Introduction to LangChain for Including AI from Large Language Models (LLMs) Inside Data Applications and Data Pipelines This article will provide an overview of LangChain, the problems it addresses, its use cases, and some of its limitations. The following article explains each of these in more detail with code.

AI

AI Data Pipeline Deep Learning

Why Improving Problem-Solving Skills is Crucial for Data Engineers?

DataSeries

AUGUST 15, 2024

This also includes the ability to perform root cause analysis on data problems, optimize data pipelines for performance, and ensure data integrity and quality. In this article, let’s look at how to enhance problem-solving skills as a data engineer.

Data Engineer

Data Engineer Data Engineering

How to Pull Data From On-prem Systems Using Fivetran’s HVA Connectors

phData

OCTOBER 20, 2023

Some of the databases supported by Fivetran are: Snowflake Data Cloud (BETA), MySQL, PostgreSQL, SAP ERP, SQL Server, and Oracle. In this blog, we will review how to pull data from on-premise systems using Fivetran to a specific target or destination. HVA also allows the capture of changes directly from various DBMS articles.

Database

Database SQL ETL Data Warehouse

Bringing Declarative Pipelines to the Apache Spark™ Open Source Project

databricks

JUNE 12, 2025

This standard simplifies pipeline development across batch and streaming workloads. Years of real-world experience have shaped this flexible, Spark-native approach for both batch and streaming pipelines.

SQL

SQL Data Engineer Data Engineering

How to Build Machine Learning Systems With a Feature Store

The MLOps Blog

JANUARY 26, 2024

For this, we have to build an entire machine-learning system around our models that manages their lifecycle, feeds properly prepared data into them, and sends their output to downstream systems. An ML system needs to transform the data into features, train models, and make predictions. This can seem daunting.

Machine Learning

Machine Learning ML

Beginner’s Guide To GCP BigQuery (Part 2)

Mlearning.ai

JULY 10, 2023

Continuing the Beginner’s Guide to GCP BigQuery series; in Part 2, we will take a look at the advantages and use cases of key features in BigQuery. To create a Scheduled Query, the initial step is to ensure your SQL is accurately entered in the Query Editor.

SQL

SQL Database Database Administration Data Lakes

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

As data is the foundation of any machine learning project, it is essential to have a system in place for tracking and managing changes to data over time. However, data versioning control is frequently given little attention, leading to issues such as data inconsistencies and the inability to reproduce results.

ML

ML Data Lakes Machine Learning

ETL Process Explained: Essential Steps for Effective Data Management

Pickl AI

OCTOBER 17, 2024

It involves extracting data from various sources, transforming it into a suitable format, and loading it into a target system for analysis and reporting. As organisations increasingly rely on data-driven insights, effective ETL processes ensure data integrity and quality, enabling informed decision-making.

ETL

ETL Data Warehouse Data Quality SQL

7 Best Machine Learning Workflow and Pipeline Orchestration Tools 2024

DagsHub

APRIL 7, 2024

Data scientists and machine learning engineers need to collaborate to make sure that together with the model, they develop robust data pipelines. These pipelines cover the entire lifecycle of an ML project, from data ingestion and preprocessing, to model training, evaluation, and deployment. It is lightweight.

Machine Learning

Machine Learning ML

Maximising Efficiency with ETL Data: Future Trends and Best Practices

Pickl AI

OCTOBER 17, 2024

Summary: This article explores the significance of ETL Data in Data Management. It highlights key components of the ETL process, best practices for efficiency, and future trends like AI integration and real-time processing, ensuring organisations can leverage their data effectively for strategic decision-making.

ETL

ETL Data Warehouse Data Quality Data Governance

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Mlearning.ai

FEBRUARY 16, 2023

In this article, you’ll discover what a Snowflake data warehouse is, its pros and cons, and how to employ it efficiently. The platform enables quick, flexible, and convenient options for storing, processing, and analyzing data. Data warehousing is a vital constituent of any business intelligence operation.

Data Warehouse

Data Warehouse Business Intelligence Database

Drowning in Data? A Data Lake May Be Your Lifesaver

ODSC - Open Data Science

SEPTEMBER 29, 2023

Furthermore, we’ve developed data encryption and governance solutions for HPCC Systems to help secure data, ensure it is only accessed by appropriate personnel, and to create audit trails to ensure data security SLAs and regulations are met. It truly is an all-in-one data lake solution. Tell me more about ECL.

Data Lakes

Data Lakes Clustering Big Data

What Industries are Hiring for Different Jobs in AI

ODSC - Open Data Science

APRIL 26, 2023

Though scripted languages such as R and Python are at the top of the list of required skills for a data analyst, Excel is still one of the most important tools to be used. Because they are the most likely to communicate data insights, they’ll also need to know SQL, and visualization tools such as Power BI and Tableau as well.

Data Analyst

Data Analyst Machine Learning Power BI

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

sales conversation summaries, insurance coverage, meeting transcripts, contract information) Generate: Generate text content for a specific purpose, such as marketing campaigns, job descriptions, blogs or articles, and email drafting support. Presto engine: Incorporates the latest performance enhancements to the Presto query engine.

AI

AI Machine Learning

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Journey to AI blog

AUGUST 4, 2023

When workers get their hands on the right data, it not only gives them what they need to solve problems, but also prompts them to ask, “What else can I do with data?” through a truly data-literate organization. What is data democratization?

Data Lakes

Data Lakes AI Data Governance

What Are The Best Third-Party Data Ingestion Tools For Snowflake?

phData

FEBRUARY 14, 2023

Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination. The biggest reason is the ease of use.

Data Warehouse

Data Warehouse Azure AWS Database

Data Quality Framework: What It Is, Components, and Implementation

DagsHub

AUGUST 23, 2024

Organizations increasingly rely on data to make business decisions, develop strategies, or even make data or machine learning models their key product. As such, the quality of their data can make or break the success of the company. What is a data quality framework?

Data Quality

Data Quality Data Governance Machine Learning
