Blog, Data Lakes and Hadoop - Data Science Current

Data lakes vs. data warehouses: Decoding the data storage debate

Data Science Dojo

JANUARY 12, 2023

When it comes to storing data, there are two main options: data lakes and data warehouses. Which one is right for your business? What is a data lake? A data lake stores an enormous amount of raw data in its original format until it is required for analytics applications.

Data Lakes

Data Lakes Data Warehouse Hadoop Machine Learning

Differentiating Between Data Lakes and Data Warehouses

Smart Data Collective

SEPTEMBER 23, 2020

While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. We talked about enterprise data warehouses in the past, so let’s contrast them with data lakes. Both data warehouses and data lakes are used when storing big data.

Data Lakes

Data Lakes Data Warehouse Big Data

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Apache Kafka

How to modernize data lakes with a data lakehouse architecture

IBM Journey to AI blog

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lakes

Data Lakes Data Warehouse Data Governance Analytics

Data Cataloging in the Data Lake: Alation + Kylo

Alation

FEBRUARY 20, 2020

Architecturally, the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data. Organizationally, the innovation of self-service analytics, pioneered by Tableau and Qlik, fundamentally transformed the user model for data analysis. Disruptive Trend #1: Hadoop.

Data Lakes

Data Lakes Hadoop Tableau Big Data

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Rocket’s legacy data science environment challenges: Rocket’s previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment and vendor-provided Data Science Experience development tools. This also led to a backlog of data that needed to be ingested.

Data Science

Data Science AWS Hadoop Data Scientist

Unfolding the Details of Hive in Hadoop

Pickl AI

JULY 6, 2023

Here comes the role of Hive in Hadoop. Hive is a powerful data warehousing infrastructure that provides an interface for querying and analyzing large datasets stored in Hadoop. In this blog, we will explore the key aspects of Hive in Hadoop. What is Hadoop? Thus ensuring optimal performance.

Hadoop

Hadoop SQL Big Data

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

Pickl AI

NOVEMBER 15, 2023

Discover the nuanced dissimilarities between Data Lakes and Data Warehouses. Data management in the digital age has become a crucial aspect of businesses, and two prominent concepts in this realm are Data Lakes and Data Warehouses. It acts as a repository for storing all the data.

Data Lakes

Data Lakes Data Warehouse Database ETL

Why Open Table Format Architecture is Essential for Modern Data Systems

phData

NOVEMBER 8, 2024

Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. In this blog, we will discuss: What is the Open Table Format (OTF)? It can also be integrated into major data platforms like Snowflake.

Data Lakes

Data Lakes Data Warehouse Database Azure

Best 8 Data Version Control Tools for Machine Learning 2024

DagsHub

DECEMBER 11, 2023

With business needs changing constantly and datasets growing in size and structural complexity, it becomes challenging to keep track of the changes made to the data efficiently, which leads to unfortunate scenarios such as inconsistencies and errors in data.

Machine Learning

Machine Learning Data Lakes Data Science

Unleashing the power of Presto: The Uber case study

IBM Journey to AI blog

SEPTEMBER 25, 2023

But what most people don’t realize is that behind the scenes, Uber is not just a transportation service; it’s a data and analytics powerhouse. Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This enables them to batch queries based on speed or accuracy.

Data Lakes

Data Lakes Analytics Clustering

What is Snowpark — and Why Does it Matter? A phData Perspective

phData

SEPTEMBER 20, 2023

This blog was originally written by Keith Smith and updated for 2023 by Nick Goble & Dominick Rocco. You’ve probably heard of the Snowflake Data Cloud, but did you know that Snowflake also offers a revolutionary set of libraries and runtimes called Snowpark? What is Snowflake’s Snowpark?

SQL

SQL Python Data Lakes Machine Learning

Did Big Data Deliver Business Transformation & Improved CX?

Alation

AUGUST 4, 2022

And where data was available, the ability to access and interpret it proved problematic. Big data can grow too big fast. Left unchecked, data lakes became data swamps. Some data lake implementations required expensive ‘cleansing pumps’ to make them navigable again.

Big Data

Big Data Apache Kafka Data Lakes

Data platform trinity: Competitive or complementary?

IBM Journey to AI blog

JANUARY 18, 2023

In another decade, the internet and mobile started to generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence the data lake emerged, which handles unstructured and structured data at huge volume. All phases of the data-information lifecycle.

Data Lakes

Data Lakes Data Warehouse Azure Apache Hadoop

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.

Data Science

Data Science Analytics Data Scientist

Characteristics of Big Data: Types & 5 V’s of Big Data

Pickl AI

SEPTEMBER 17, 2024

Summary: This blog delves into the multifaceted world of Big Data, covering its defining characteristics beyond the 5 V’s, essential technologies and tools for management, real-world applications across industries, challenges organisations face, and future trends shaping the landscape.

Big Data

Big Data Big Data Analytics

Accelerating time-to-insight with MongoDB time series collections and Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 18, 2023

Prior to joining AWS, as a Data/Solution Architect, he implemented many projects in the Big Data domain, including several data lakes in the Hadoop ecosystem. As a Data Engineer, he was involved in applying AI/ML to fraud detection and office automation.

Clustering

Clustering AWS Database ML

Big Data Syllabus: A Comprehensive Overview

Pickl AI

AUGUST 9, 2024

A well-structured syllabus for Big Data encompasses various aspects, including foundational concepts, technologies, data processing techniques, and real-world applications. This blog aims to provide a comprehensive overview of a typical Big Data syllabus, covering essential topics that aspiring data professionals should master.

Big Data

Big Data Big Data Analytics

3 Major Trends at Strata New York 2017

DataRobot Blog

OCTOBER 3, 2017

This highlights the two companies’ shared vision on self-service data discovery with an emphasis on collaboration and data governance. 2) When data becomes information, many (incremental) use cases surface. Paxata booth visitors encompassed a broad range of roles, all with data responsibility in some shape or form.

Data Lakes

Data Lakes Azure Data Pipeline Hadoop

Shopping for Data

Alation

FEBRUARY 20, 2020

As a cornerstone of your data architecture, the EDM is a serious undertaking, whether it is enabled by building on existing technologies or by deploying a single tool that includes all of the functions needed to successfully implement one. Here’s how The Eckerson Group breaks it down:

Data Warehouse

Data Warehouse Data Lakes Hadoop Data Preparation

How to Version Control Data in ML for Various Data Sources

The MLOps Blog

JANUARY 23, 2023

These tools may have their own versioning system, which can be difficult to integrate with a broader data version control system. For instance, our data lake could contain a variety of relational and non-relational databases, files in different formats, and data stored using different cloud providers. DVC Git LFS neptune.ai

ML

ML Data Lakes Machine Learning

Introduction to Apache NiFi and Its Architecture

Pickl AI

JULY 30, 2024

With its user-friendly interface and robust architecture, NiFi simplifies the complexities of data integration, making it an essential component for modern data-driven enterprises. This blog delves into the fundamentals of Apache NiFi, its architecture, and how it can be leveraged for effective data flow management.

ETL

ETL Data Lakes Big Data

Data Catalogs for Search & Discovery

Alation

MARCH 29, 2021

Finding that data is often half the battle. This is why the ability to quickly search and discover data across the enterprise is the first step towards data-driven decision making. In this blog, we will discuss how data catalogs accelerate search & discovery.

Machine Learning

Machine Learning Data Lakes Hadoop

Understanding Business Intelligence Architecture: Key Components

Pickl AI

JANUARY 28, 2025

Introduction Business Intelligence (BI) architecture is a crucial framework that organizations use to collect, integrate, analyze, and present business data. This architecture serves as a blueprint for BI initiatives, ensuring that data-driven decision-making is efficient and effective.

Business Intelligence

Business Intelligence ETL Data Lakes

How Fivetran and dbt Help With ELT

phData

AUGUST 9, 2023

If you’ve been watching how Snowflake Data Cloud has been growing and changing over the years, you’ll see that two tools have made very large impacts on the Modern Data Stack: Fivetran and dbt. Data volumes exploded as web, mobile, and IoT took off. ETL systems just couldn’t handle the massive flows of raw data.

ETL

ETL Data Warehouse Cloud Data Big Data

Azure Data Engineer Jobs

Pickl AI

APRIL 6, 2023

Accordingly, one of the most in-demand roles you might be interested in is that of Azure Data Engineer. The following blog will help you learn about the Azure Data Engineer job description, salary, and certification course. Your knowledge of data warehousing concepts should be strong.

Azure

Azure Data Engineer Data Engineering

Customer Data Culture: The Innovators Have Already Reinvented Themselves

Alation

FEBRUARY 13, 2020

Part of GoDaddy’s transformation was to get the right customer data consolidated in one place and make it accessible to every employee for data-driven decision making. This meant a large Hadoop deployment, self-service analytics tools available to every employee with Tableau, and a data catalog from Alation.

Decision Science

Decision Science Analytics Data Science

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

What are the Biggest Challenges with Migrating to Snowflake?

phData

FEBRUARY 5, 2024

In this blog, we’re going to answer these questions and more, walking you through the biggest challenges we have found when migrating our customers’ data from a legacy system to Snowflake. You’re in luck because this blog is for anyone ready to move or thinking about moving to Snowflake who wants to know what’s in store for them.

SQL

SQL Database Data Quality Data Warehouse

Was ist ein Data Lakehouse?

Data Science Blog

MAY 15, 2023

tl;dr A data lakehouse is a modern data architecture that combines the advantages of a data lake and a data warehouse. The definition of a data lakehouse: A data lakehouse is a modern data storage and processing architecture that unites the advantages of data lakes and data warehouses.

Data Warehouse

Data Warehouse Data Lakes Azure AWS

Big Data – Das Versprechen wurde eingelöst

Data Science Blog

MARCH 14, 2023

According to my research, Big Data first appeared prominently in the media as a buzzword around 2011. Big Data became the business-speak of the following years. In the parallel world of IT professionals, the tool and ecosystem Apache Hadoop was treated as almost synonymous with Big Data.

Big Data

Big Data Apache Hadoop Data Science
