Solution overview: Typically, a three-tier software application has a UI tier, a middle tier (the backend) for business APIs, and a database tier. The approach generates, runs, and validates SQL from natural-language input using LLMs, few-shot examples, and the database schema as a knowledge base.
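As a rough illustration of this pattern, here is a minimal Python sketch of few-shot text-to-SQL prompting. The schema, example pairs, and the `llm` and `connection` helpers are all assumptions for illustration, not the article's actual code.

```python
# Minimal sketch of few-shot text-to-SQL prompting (illustrative names only).
SCHEMA = """
CREATE TABLE orders (order_id INT, customer_id INT, order_date DATE, total NUMERIC);
CREATE TABLE customers (customer_id INT, name TEXT, region TEXT);
"""

FEW_SHOT_EXAMPLES = [
    ("Total revenue by region last month",
     "SELECT c.region, SUM(o.total) FROM orders o JOIN customers c USING (customer_id) "
     "WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month' GROUP BY c.region;"),
]

def build_prompt(question: str) -> str:
    """Assemble schema, few-shot pairs, and the user question into one prompt."""
    examples = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in FEW_SHOT_EXAMPLES)
    return (
        "You translate questions into SQL. Use only the tables below.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Examples:\n{examples}\n"
        f"Q: {question}\nSQL:"
    )

def generate_and_validate(question: str, llm, connection) -> str:
    """Hypothetical helpers: `llm(prompt)` returns SQL text, `connection` runs EXPLAIN to validate."""
    sql = llm(build_prompt(question)).strip()
    connection.execute(f"EXPLAIN {sql}")  # validation step: fail fast on invalid SQL
    return sql
```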
Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction: Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.
Definition and functionality of LLM app platforms These platforms encompass various capabilities specifically tailored for LLM development. Data annotation: Adding relevant metadata to enhance the model’s learning capabilities. KLU.ai: Offers no-code solutions for smooth data source integration.
What we like most about Openflow is that it simplifies data ingestion from multiple sources and accelerates Snowflake customers’ success by eliminating the need for third-party ingestion tools, enabling quick prototyping, and supporting reusable data pipelines. Add Components to get the list of tables required for ingestion.
A generative AI foundation can provide primitives such as models, vector databases, and guardrails as a service, along with higher-level services for defining AI workflows, agents and multi-agent systems, and tools, plus a catalog to encourage reuse. Considerations here include the choice of vector database, optimization of indexing pipelines, and retrieval strategies.
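To make the retrieval piece concrete, here is a minimal sketch of a similarity-search strategy using plain NumPy; a production system would delegate this to the chosen vector database, and the `embed()` function in the usage comment is a hypothetical embedding call.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar document vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(scores)[::-1][:k]

# Usage with a hypothetical embed() function supplied by the model service:
# doc_vecs = np.stack([embed(chunk) for chunk in chunks])
# top_idx = cosine_top_k(embed("How do I rotate credentials?"), doc_vecs)
```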
In essence, DataOps is a practice that helps organizations manage and govern data more effectively. However, there is a lot more to know about DataOps, as it has its own definition, principles, benefits, and applications in real-life companies today – which we will cover in this article! One such practice is automated testing to ensure data quality.
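As a small illustration of automated data quality testing, the sketch below runs a few assertion-style checks against a pandas DataFrame; the table and column names are assumptions, not from any specific DataOps tool.

```python
import pandas as pd

def check_orders_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality failures (empty list means pass)."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    if df["total"].lt(0).any():
        failures.append("total has negative values")
    if df["order_date"].isna().mean() > 0.01:
        failures.append("more than 1% of order_date values are missing")
    return failures

# In a pipeline, fail the run when checks do not pass:
# assert not check_orders_quality(orders_df), check_orders_quality(orders_df)
```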
Source: IBM Cloud Pak for Data Feature Catalog. Users can manage feature definitions and enrich them with metadata, such as tags, transformation logic, or value descriptions. Source: IBM Cloud Pak for Data. MLOps teams often struggle when it comes to integrating into CI/CD pipelines (Spark, Flink, etc.).
Your data scientists develop models on this component, which stores all parameters, feature definitions, artifacts, and other experiment-related information they care about for every experiment they run. The job reads features, generates predictions, and writes them to a database. Building a Machine Learning platform (Lemonade).
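A batch scoring job of that shape might look roughly like the following sketch; the table names, connection URIs, and the `model` object are hypothetical, and the platform described above likely differs.

```python
import pandas as pd
import sqlalchemy

def run_batch_scoring(model, feature_store_uri: str, output_uri: str) -> None:
    """Hypothetical batch job: read features, generate predictions, write them to a database."""
    engine_in = sqlalchemy.create_engine(feature_store_uri)
    engine_out = sqlalchemy.create_engine(output_uri)

    features = pd.read_sql("SELECT * FROM customer_features", engine_in)
    preds = model.predict(features.drop(columns=["customer_id"]))

    out = features[["customer_id"]].assign(score=preds)
    out.to_sql("customer_scores", engine_out, if_exists="replace", index=False)
```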
It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, so the time spent on data preparation is greatly reduced. The following figure shows a schema definition and the model that references it.
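For context, a Feast feature view definition looks roughly like the sketch below; the entity, source path, and field names are illustrative, and the exact API varies across Feast versions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Illustrative entity and offline source; paths and names are assumptions.
driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)

# Training and serving code can then request these features by name through the feature store APIs.
```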
With its LookML modeling language, Looker provides a unique, modern approach to define governed and reusable data models to build a trusted foundation for analytics. Connecting directly to this semantic layer will help give customers access to critical business data in a safe, governed manner. Your data in the cloud.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. If you want to do the process in a low-code/no-code way, you can follow option C.
Project Structure · Creating Our Configuration File · Creating Our Data Pipeline · Preprocessing Faces: Detection and Cropping · Summary · Citation Information. Building a Dataset for Triplet Loss with Keras and TensorFlow: In today’s tutorial, we will take the first step toward building our real-time face recognition application. The dataset.py
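As a hedged preview of the preprocessing step, the sketch below detects and crops faces with OpenCV's Haar cascade; the tutorial's own detector and crop size may differ.

```python
import cv2

# Illustrative face detection and cropping; the tutorial's own detector may differ.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def crop_faces(image_path: str, size=(160, 160)):
    """Detect faces in an image and return resized crops ready for embedding."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(image[y:y + h, x:x + w], size) for (x, y, w, h) in boxes]
```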
Snowflake AI Data Cloud is one of the most powerful platforms, including storage services that support complex data. Integrating Snowflake with dbt adds another layer of automation and control to the data pipeline. Snowflake stored procedures and dbt hooks are essential to modern data engineering and analytics workflows.
This blog will cover creating customized nodes in Coalesce, which new advanced features can already be used as nodes, and how to create them as part of your data pipeline. To create a UDN, we’ll need a node definition that defines how the node should function, and templates for how the object will be created and run.
Alation’s deep integration with tools like MicroStrategy and Tableau provides visibility into the complete data pipeline: from storage through visualization. Many of our customers have been telling us that these two tools in particular form the core of their visual analytics environments.
The primary goal of Data Engineering is to transform raw data into a structured and usable format that can be easily accessed, analyzed, and interpreted by Data Scientists, analysts, and other stakeholders. Future of Data Engineering The Data Engineering market will expand from $18.2
In the previous tutorial of this series, we built the dataset and data pipeline for our Siamese network-based face recognition application. Specifically, we looked at an overview of triplet loss and discussed what kind of data samples are required to train our model with the triplet loss.
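For reference, the triplet loss discussed there can be expressed in a few lines of TensorFlow; this is a generic formulation with an assumed margin of 0.2, not necessarily the tutorial's exact implementation.

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0), averaged over the batch."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))
```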
Implementing Face Recognition and Verification Given that we want to identify people with id-1021 to id-1024 , we are given 1 image (or a few samples) of each person, which allows us to add the person to our face recognition database. Then, whichever feature has the minimum distance with our test feature is the identity of the test image.
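A minimal sketch of that minimum-distance identification step, assuming the embeddings have already been computed as NumPy vectors:

```python
import numpy as np

def identify(test_feature: np.ndarray, gallery: dict[str, np.ndarray]) -> str:
    """Return the id (e.g., 'id-1021') whose stored feature is closest to the test feature."""
    ids = list(gallery)
    dists = [np.linalg.norm(test_feature - gallery[i]) for i in ids]
    return ids[int(np.argmin(dists))]
```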
Definitions: Foundation Models, Gen AI, and LLMs Before diving into the practice of productizing LLMs, let’s review the basic definitions of GenAI elements: Foundation Models (FMs) - Large deep learning models that are pre-trained with attention mechanisms on massive datasets. This helps cleanse the data.
The second is to provide a directed acyclic graph (DAG) for data pipelining and model building. Teams that primarily access hosted data or assets (e.g., These options include DVC, Pachyderm and Quilt.
It’s common to have terabytes of data in most data warehouses; data quality monitoring is often challenging and cost-intensive due to dependencies on multiple tools, and it is eventually ignored. This results in poor credibility and data consistency over time, leading businesses to mistrust their data pipelines and processes.
An optional CloudFormation stack to deploy a data pipeline to enable a conversation analytics dashboard. Choose an option for allowing unredacted logs for the Lambda function in the data pipeline. This allows you to control which IAM principals are allowed to decrypt the data and view it. Choose Create data source.
To configure Salesforce and Snowflake using the Sync Out connector, follow these steps: Step 1: Create Snowflake Objects. To use Sync Out with Snowflake, you need to configure the following Snowflake objects in your Snowflake account: the database and schema that will be used for the Salesforce data.
Well, according to Brij Kishore Pandey, it stands for Extract, Transform, Load and is a fundamental process in data engineering, ensuring data moves efficiently from raw sources to structured storage for analysis. The steps include: Extraction: Data is collected from multiple sources (databases, APIs, flat files).
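A toy end-to-end version of those steps might look like the sketch below; the API URL, column names, and SQLite destination are placeholders standing in for real sources and a real warehouse.

```python
import sqlite3
import pandas as pd
import requests

def extract(api_url: str) -> pd.DataFrame:
    """Extract: pull raw records from an API (one of several possible sources)."""
    return pd.DataFrame(requests.get(api_url, timeout=30).json())

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: clean types and drop duplicates before loading."""
    df = df.drop_duplicates()
    df["created_at"] = pd.to_datetime(df["created_at"])  # assumed column name
    return df

def load(df: pd.DataFrame, db_path: str) -> None:
    """Load: write the structured result into the analytics store."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("events", conn, if_exists="append", index=False)

# load(transform(extract("https://example.com/api/events")), "warehouse.db")
```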
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date. (.mp4, .webm, etc.), and audio files (.wav, .mp3, .aac,
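One simple way to implement such a duplicate-entry check for unstructured files is content hashing, sketched below; real pipelines would likely use perceptual hashes or metadata as well.

```python
import hashlib
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files (images, video, audio, ...) by content hash; groups larger than one are duplicates."""
    by_hash: dict[str, list[Path]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash.setdefault(digest, []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```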
For enterprises, the value-add of applications built on top of large language models is realized when domain knowledge from internal databases and documents is incorporated to enhance a model’s ability to answer questions, generate content, and support other intended use cases.
It is a process for moving and managing data from various sources to a central data warehouse. This process ensures that data is accurate, consistent, and usable for analysis and reporting. Definition and Explanation of the ETL Process: ETL is a data integration method that combines data from multiple sources.
A cloud data warehouse takes a concept every organization knows, the data warehouse, and optimizes its components for the cloud. This is why we believe that the traditional definitions of data management will change, as the platform will be able to handle each type of data requirement natively.
The Snowflake account is set up with a demo database and schema to load data. From the homepage: Data > Databases > Select your database/schema and select stages.
Whenever anyone talks about data lineage and how to achieve it, the spotlight tends to shine on automation. This is expected, as automating the process of calculating and establishing lineage is crucial to understanding and maintaining a trustworthy system of data pipelines.
The data source tool can also directly generate the Data Definition Language (DDL) for these tables if you decide not to use dbt! This allows you to better understand the existing structures that are in place and more accurately perform your migration (or generate documentation, everybody’s favorite!).
When customers are looking to perform a migration, one of the first things that needs to occur is an assessment of the level of effort to migrate existing data definition language (DDL) and data manipulation language (DML). Fixed an issue that showed invalid timestamp/precision errors when scanning an Impala database.
We’ve had many customers performing migrations between these platforms, and as a result, they have a lot of Data Definition Language (DDL) and Data Manipulation Language (DML) that needs to be translated between SQL dialects. Let’s take a look at some of the more interesting translations.
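As an illustration of what dialect translation involves (not the actual translation engine referenced above), the open-source sqlglot library can transpile a statement between dialects:

```python
import sqlglot

# Translate a Snowflake-flavored statement into BigQuery SQL (illustrative example).
snowflake_sql = "SELECT IFF(amount > 100, 'big', 'small') AS bucket FROM orders"
bigquery_sql = sqlglot.transpile(snowflake_sql, read="snowflake", write="bigquery")[0]
print(bigquery_sql)
# sqlglot rewrites dialect-specific functions such as IFF into the target dialect's equivalent.
```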
As customers perform platform migrations, they frequently need to translate existing stored procedures, data definition language (DDL), and data manipulation language (DML) to the target system. SQL Translation Updates: SQL translation is another major component of the Toolkit CLI.
Data pipeline orchestration. Moving/integrating data in the cloud/data exploration and quality assessment. There are four critical components needed for a successful migration: AI/ML models to automate the discovery and semantics of the data. On-premises business intelligence and databases. Cloud governance.
Having gone public in 2020 with the largest tech IPO in history, Snowflake continues to grow rapidly as organizations move to the cloud for their data warehousing needs. Importing data allows you to ingest a copy of the source data into an in-memory database.
Without partitioning, daily data activities will cost your company a fortune, and a moment will come when the cost advantage of GCP BigQuery becomes questionable. Prior to creating your first Scheduled Query, I recommend confirming with your database administrator that you have adequate IAM permissions to create one.
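For reference, creating a date-partitioned table with the BigQuery Python client looks roughly like this; the project, dataset, and schema are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes application default credentials

# Illustrative table id and schema; adjust to your project/dataset.
table = bigquery.Table(
    "my-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",  # queries filtered on event_date scan only the relevant partitions
)
client.create_table(table)
```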
A modern data stack relies on cloud computing, whereas a legacy data stack stores data on servers instead of in the cloud. Modern data stacks provide access for more data professionals than a legacy data stack. You should look for a data warehouse that is scalable, flexible, and efficient.
GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly. Blockstream’s public Bitcoin API. The data would be interesting to analyze. From Data Engineering to Prompt Engineering: prompts for data analysis and BI report generation. In the BI/data analysis world, people usually need to query data (small or large).
Selected Training Sessions for Week 2: RAG (Wed Jan 22–Thu Jan 23). Database Patterns for RAG: Single Collections, by JP Hwang, Technical Curriculum Developer at Weaviate. Scaling RAG systems requires strategic architectural decisions to balance performance, cost, and maintainability.
Two Data Scientists: responsible for setting up the ML model training and experimentation pipelines. One Data Engineer: cloud database integration with our cloud expert. Sourcing the data: In our case, the data was provided by our client, which was a product-based organization. Redshift, S3, and so on.
In traditional machine learning, data pipelines feeding into the model have queries written with idempotency in mind, and data validation checks are performed before and after inference to confirm an expected output. Retrieval mechanisms are inherent features of search engines and vector database data store offerings.
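A minimal sketch of those before-and-after validation checks around an inference call; the feature columns and the scikit-learn-style `predict_proba` interface are assumptions.

```python
import pandas as pd

def validate_inputs(df: pd.DataFrame) -> None:
    """Pre-inference check: schema and ranges the model was trained on (illustrative columns)."""
    assert {"age", "income"} <= set(df.columns), "missing expected feature columns"
    assert df["age"].between(0, 120).all(), "age out of expected range"

def validate_outputs(scores: pd.Series) -> None:
    """Post-inference check: confirm predictions look like probabilities."""
    assert scores.between(0.0, 1.0).all(), "scores outside [0, 1]"

def score(model, df: pd.DataFrame) -> pd.Series:
    validate_inputs(df)
    scores = pd.Series(model.predict_proba(df)[:, 1], index=df.index)  # assumed sklearn-style model
    validate_outputs(scores)
    return scores
```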
I have checked the AWS S3 bucket and Snowflake tables for a couple of days, and the data pipeline is working as expected. The scope of this article is quite big; we will exercise the core steps of data science. Let's get started… Project Layout: Here are the high-level steps for this project.