Data Lakes, Data Preparation and SQL

Data Preparation with SQL Cheatsheet

KDnuggets

JUNE 27, 2022

If your raw data is in a SQL-based data lake, why spend the time and money to export the data into a new platform for data prep?
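As a rough illustration of preparing data where it already lives, the sketch below runs a few typical cleanup steps (trimming whitespace, normalizing case, filling missing values, dropping duplicates) as one SQL statement. It uses Python's built-in sqlite3 module only as a stand-in for a SQL-based data lake engine. The table and column names (raw_customers, clean_customers, and so on) are hypothetical and are not taken from the cheatsheet.

```python
import sqlite3

# Hypothetical sample rows: (id, name, email, country)
SAMPLE_ROWS = [
    (1, " Ada ", "ADA@example.com ", "UK"),
    (2, "Ada", "ada@example.com", "UK"),    # duplicate of row 1 after cleanup
    (3, "Bob", None, "US"),                 # no email: dropped
    (4, "Cy", "cy@example.com", None),      # missing country: filled in
]

PREP_SQL = """
DROP TABLE IF EXISTS clean_customers;
CREATE TABLE clean_customers AS
SELECT
    id,
    TRIM(name)                   AS name,
    LOWER(TRIM(email))           AS email,
    COALESCE(country, 'unknown') AS country
FROM raw_customers
WHERE email IS NOT NULL
  AND id IN (
      -- keep the lowest id for each normalized email address
      SELECT MIN(id)
      FROM raw_customers
      WHERE email IS NOT NULL
      GROUP BY LOWER(TRIM(email))
  );
"""


def load_sample(conn):
    """Create the hypothetical raw table and fill it with sample rows."""
    conn.execute(
        "CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?, ?)", SAMPLE_ROWS)


def prepare_customers(conn):
    """Run the cleanup inside the database and return the cleaned rows."""
    conn.executescript(PREP_SQL)
    cursor = conn.execute(
        "SELECT id, name, email, country FROM clean_customers ORDER BY id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_sample(connection)
    for row in prepare_customers(connection):
        print(row)
```

The point of the sketch is that the raw rows never leave the database: only the SQL travels to the data, and the cleaned table is written next to the raw one.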

SQL

SQL Data Preparation Data Lakes

Data lakes vs. data warehouses: Decoding the data storage debate

Data Science Dojo

JANUARY 12, 2023

When it comes to data storage, there are two main options: data lakes and data warehouses. What is a data lake? A data lake stores an enormous amount of raw data in its original format until it is required for analytics applications. Which one is right for your business?

Data Lakes

Data Lakes Data Warehouse Hadoop Machine Learning

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

The Project Clinic: Assessing Project Health, Planning, and Execution

MORE WEBINARS

Trending Sources

Introduction to Power BI Datamarts

ODSC - Open Data Science

JUNE 12, 2023

Power BI Datamarts provide no-code/low-code datamart capabilities using Azure SQL Database technology in the background. Power BI Datamarts support sensitivity labels, endorsement, discovery, and Row-Level Security (RLS), which help protect and manage the data according to business requirements and compliance needs.

Power BI

Power BI Data Warehouse ETL Data Preparation

How and When to Use Dataflows in Power BI

phData

SEPTEMBER 28, 2023

Dataflows are a cloud-based technology for data preparation and transformation. They offer connectors for retrieving data from databases, Excel files, APIs, and similar sources, and data manipulations are performed in the online Power Query Editor.

Power BI

Power BI Data Preparation Machine Learning

10 Best Data Engineering Books [Beginners to Advanced]

Pickl AI

AUGUST 1, 2023

Key Components of Data Engineering

Data Ingestion: Gathering data from various sources, such as databases, APIs, files, and streaming platforms, and bringing it into the data infrastructure.

Data Processing: Performing computations, aggregations, and other data operations to generate valuable insights from the data.
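The two components named above can be sketched in a few lines. This is a minimal illustration only, assuming two made-up sources (a CSV file export and a JSON API response) and hypothetical field names (user, amount). It is not drawn from any of the listed books.

```python
import csv
import io
import json
from collections import defaultdict


def ingest_csv(text):
    """Ingestion: read records from a CSV export into a common shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"source": "csv", "user": row["user"], "amount": float(row["amount"])}
        for row in reader
    ]


def ingest_json(text):
    """Ingestion: read records from a JSON API response into the same shape."""
    return [
        {"source": "api", "user": item["user"], "amount": float(item["amount"])}
        for item in json.loads(text)
    ]


def total_by_user(records):
    """Processing: aggregate the ingested records into a total per user."""
    totals = defaultdict(float)
    for record in records:
        totals[record["user"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical inputs standing in for a file and an API response.
    csv_text = "user,amount\nann,10.5\nbob,2\n"
    json_text = '[{"user": "ann", "amount": 4.5}]'
    records = ingest_csv(csv_text) + ingest_json(json_text)
    print(total_by_user(records))
```

Normalizing every source into one record shape at ingestion time keeps the processing step independent of where the data came from.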

Data Engineering

Data Engineering Data Engineer

Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML

AWS Machine Learning Blog

AUGUST 4, 2023

Train a recommendation model in SageMaker Studio using training data that was prepared using SageMaker Data Wrangler. The real-time inference call data is first passed to the SageMaker Data Wrangler container in the inference pipeline, where it is preprocessed and passed to the trained model for product recommendation.

ML AWS AI

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.

SQL

SQL ML Python

Architect defense-in-depth security for generative AI applications using the OWASP Top 10 for LLMs

AWS Machine Learning Blog

JANUARY 26, 2024

Define strict data ingress and egress rules to help protect against manipulation and exfiltration using VPCs with AWS Network Firewall policies. With Security Lake, you can get a more complete understanding of your security data across your entire organization.

AWS

AWS AI ML

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Data collection and ingestion

The data collection and ingestion layer connects to all upstream data sources and loads the data into the data lake. Therefore, the ingestion components need to be able to manage authentication, data sourcing in pull mode, data preprocessing, and data storage.

AWS

AWS Machine Learning Analytics

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.

ML AWS Data Warehouse

What Do You Actually Need from a Data Catalog Tool?

Alation

SEPTEMBER 23, 2021

Data Catalogs for Data Science & Engineering – Data catalogs that are primarily used for data science and engineering are typically used by very experienced data practitioners. It also catalogs datasets and operations that include data preparation features and functions.

Data Preparation

Data Preparation SQL Data Governance Data Analysis

How to Use Exploratory Notebooks [Best Practices]

The MLOps Blog

OCTOBER 20, 2023

Example template for an exploratory notebook | Source: Author

How to organize code in Jupyter notebook

For exploratory tasks, the code that produces SQL queries, wrangles data in pandas, or creates plots is not important for readers. ... in a pandas DataFrame) but in the company’s data warehouse (e.g., Redshift).

SQL

SQL Database Data Scientist Python

Tackling AI’s data challenges with IBM databases on AWS

IBM Journey to AI blog

MARCH 14, 2024

Whether it’s for ad hoc analytics, data transformation, data sharing, data lake modernization or ML and gen AI, you have the flexibility to choose. Unlock competitive advantages with accelerated data insights through an AI-powered conversational interface, with no SQL expertise required.

AWS

AWS Database AI

How Alteryx & Snowflake Accelerates Analytics

phData

FEBRUARY 24, 2023

Alteryx provides organizations with an opportunity to automate access to data, analytics, data science, and process automation, all in one end-to-end platform. Its capabilities can be split into the following topics: automating inputs & outputs, data preparation, data enrichment, and data science.

Analytics

Analytics Database Python

Getting Started With Snowflake: Best Practices For Launching

phData

DECEMBER 4, 2023

If you answer “yes” to any of these questions, you will need cloud storage, such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage.

Copy Into

When loading data into Snowflake, the very first and most important rule to follow is: do not load data with SQL inserts!

Clustering

Clustering Database SQL Data Pipeline

Exploring the AI and data capabilities of watsonx

IBM Journey to AI blog

JULY 17, 2023

Visual modeling: Delivers easy-to-use workflows for data scientists to build data preparation and predictive machine learning pipelines that include text analytics, visualizations and a variety of modeling methods.

AI Machine Learning

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

See also Thoughtworks’s guide to Evaluating MLOps Platforms

End-to-end MLOps platforms

End-to-end MLOps platforms provide a unified ecosystem that streamlines the entire ML workflow, from data preparation and model development to deployment and monitoring. Notebook for interactive Python, SQL, and R editors for coding data pipelines.

Machine Learning

Machine Learning ML

Data Science Current
