This article was published as a part of the Data Science Blogathon. Apache Spark is a framework used in cluster computing environments. The post Building a Data Pipeline with PySpark and AWS appeared first on Analytics Vidhya.
Aspiring and experienced data engineers alike can benefit from a curated list of books covering essential concepts and practical techniques. These 10 Best Data Engineering Books for beginners encompass a range of topics, from foundational principles to advanced data processing methods.
Welcome to Beyond the Data, a series that investigates the people behind the talent of phData. As a Senior Data Engineer at phData, I wear many hats. On the technical side, I clean and organize data, design storage solutions, and build transformation pipelines.
Before a bank can begin certifying a risk model, it first needs to understand what data is being used and how that data changes as it moves from a database to a model.
Jeff Newburn is a Senior Software Engineering Manager leading the Data Engineering team at Logikcull – A Reveal Technology. He oversees the company’s data initiatives, including data warehouses, visualizations, analytics, and machine learning. Outside of work, he enjoys playing lawn tennis and reading books.
Find out how to weave data reliability and quality checks into the execution of your data pipelines and more. More Speakers and Sessions Announced for the 2024 Data Engineering Summit: ranging from experimentation platforms to enhanced ETL models and more, here are some more sessions coming to the 2024 Data Engineering Summit.
You have a specific book in mind, but you have no idea where to find it. You enter the title of the book into the computer and the library’s digital inventory system tells you the exact section and aisle where the book is located.
A data lakehouse is a fit-for-purpose data store. It combines data lake flexibility with data warehouse performance to help scale AI, and a lakehouse with multiple query engines and storage tiers can allow engineers to share data in open formats.
“We didn’t have access to hundreds of data engineers out in the marketplace,” Lavorini points out. This team’s scope is massive because the data pipelines are huge and there are many different capabilities embedded in them. Register (and book a meeting with our team).
If you’d like a more personalized look into the potential of Snowflake for your business, definitely book one of our free Snowflake migration assessment sessions. These casual, informative sessions offer straightforward answers and honest advice for moving your data to Snowflake.
American Family Insurance: Governance by Design – Not as an Afterthought Who: Anil Kumar Kunden, Information Standards, Governance and Quality Specialist at AmFam Group When: Wednesday, June 7, at 2:45 PM Why attend: Learn how to automate and accelerate data pipeline creation and maintenance with data governance, AKA metadata normalization.
Source data formats can only be Parquet, JSON, or Delimited Text (CSV, TSV, etc.). StreamSets Data Collector Engine is an easy-to-use data pipeline engine for streaming, CDC, and batch ingestion from any source to any destination. The biggest reason is the ease of use.
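To make the format list above concrete, here is a minimal sketch in standard-library Python with inline sample data (this is illustrative only, not the StreamSets API; Parquet is a binary columnar format that would need a library such as pyarrow, so only the JSON and delimited-text cases are shown):

```python
import csv
import io
import json

# Sample records in two of the supported source formats.
json_lines = '{"id": 1, "name": "alice"}\n{"id": 2, "name": "bob"}'
tsv_text = "id\tname\n1\talice\n2\tbob"

# JSON Lines: parse one JSON object per line.
json_rows = [json.loads(line) for line in json_lines.splitlines()]

# Delimited text (TSV here; CSV just swaps the delimiter for a comma).
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

print(json_rows[0]["name"])  # alice
print(tsv_rows[1]["id"])     # 2 (DictReader yields string values)
```

Note that the JSON parser preserves numeric types, while the delimited-text reader yields everything as strings, which is one reason pipelines typically apply an explicit schema after ingestion.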
Data engineers, data scientists, and other data professionals have been racing to implement gen AI into their engineering efforts. To read more about LLMOps and MLOps, check out the O’Reilly book “Implementing MLOps in the Enterprise”, authored by Iguazio’s CTO and co-founder Yaron Haviv and by Noah Gift.
In this blog, I’ll address some of the questions we did not have time to answer live, pulling from both Dr. Reichental’s book and my own experience as a data governance leader for 30+ years. Can you have proper data management without establishing a formal data governance program? Communication is essential.
Activity Schema Modeling: Capturing the Customer Journey in Action Now that we’ve got our Lego blocks of customer data, let’s talk about another game-changing approach that’s shaking up the world of customer data modeling: Activity Schema Modeling. Your customer data game will never be the same.