Analytics, Blog and Data Lakes - Data Science Current

What Is a Lakebase?

Databricks

JUNE 11, 2025

Database

Database Data Lakes ETL Analytics

Data lakes vs. data warehouses: Decoding the data storage debate

Data Science Dojo

JANUARY 12, 2023

When it comes to data storage, there are two main types: data lakes and data warehouses. What is a data lake? A data lake stores an enormous amount of raw data in its original format until it is required for analytics applications. Which one is right for your business?

Data Lakes

Data Lakes Data Warehouse Hadoop Machine Learning

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Python

Python Natural Language Processing Data Science Machine Learning

Differentiating Between Data Lakes and Data Warehouses

Smart Data Collective

SEPTEMBER 23, 2020

While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. We talked about enterprise data warehouses in the past, so let’s contrast them with data lakes. Both data warehouses and data lakes are used when storing big data.

Data Lakes

Data Lakes Data Warehouse Big Data

Streaming Machine Learning Without a Data Lake

ODSC - Open Data Science

MAY 31, 2023

Be sure to check out his talk, “Apache Kafka for Real-Time Machine Learning Without a Data Lake,” there! The combination of data streaming and machine learning (ML) enables you to build one scalable, reliable, but also simple infrastructure for all machine learning tasks using the Apache Kafka ecosystem.

Data Lakes

Data Lakes Machine Learning Apache Kafka

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog

MAY 20, 2024

It offers full BI-Stack Automation, from source to data warehouse through to frontend. It supports a holistic data model, allowing for rapid prototyping of various models. It also supports a wide range of data warehouses, analytical databases, data lakes, frontends, and pipelines/ETL.

Data Pipeline

Data Pipeline Data Warehouse Azure Data Lakes

Choosing a Data Lake Format: What to Actually Look For

ODSC - Open Data Science

AUGUST 15, 2023

Recently we’ve seen lots of posts about a variety of different file formats for data lakes. There’s Delta Lake, Hudi, Iceberg, and QBeast, to name a few. It can be tough to keep track of all these data lake formats — let alone figure out why (or if!) And I’m curious to see if you’ll agree.

Data Lakes

Data Lakes ETL Data Science Algorithm

Data Lakes for Non-Techies

Dataversity

OCTOBER 11, 2021

The post Data Lakes for Non-Techies appeared first on DATAVERSITY. Moreover, complex usability helped in developing a network of certified (aka expensive and lucrative) consultancy workforce. IT has recently experienced […].

Data Lakes

Data Lakes Data Warehouse Cloud Data Analytics

How to modernize data lakes with a data lakehouse architecture

IBM Journey to AI blog

JULY 5, 2023

Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world’s largest corporations. Such data volumes are not easy to move, migrate or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.

Data Lakes

Data Lakes Data Warehouse Data Governance Analytics

How to Ensure Your New Cloud Data Lake Is Secure

Dataversity

MARCH 24, 2021

Enterprises migrating on-prem data environments to the cloud in pursuit of more robust, flexible, and integrated analytics and AI/ML capabilities are fueling a surge in cloud data lake implementations. The post How to Ensure Your New Cloud Data Lake Is Secure appeared first on DATAVERSITY.

Data Lakes

Data Lakes Cloud Data ML

Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights

Flipboard

MAY 14, 2025

Among these, four primary use cases have emerged as especially prominent: intelligent process automation, anomaly detection, analytics, and operational assistance. Different types of data typically require different tools to access them. Cross-account calls aren’t supported at the time of writing this blog.

AWS

AWS AI Database

A Bridge Between Data Lakes and Data Warehouses

Dataversity

JANUARY 28, 2021

It has been ten years since Pentaho Chief Technology Officer James Dixon coined the term “data lake.” While data warehouse (DWH) systems have had longer existence and recognition, the data industry has embraced the more […]. The post A Bridge Between Data Lakes and Data Warehouses appeared first on DATAVERSITY.

Data Lakes

Data Lakes Data Warehouse Data Quality Data Governance

An integrated experience for all your data and AI with Amazon SageMaker Unified Studio (preview)

Flipboard

DECEMBER 11, 2024

Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data.

SQL

SQL AWS Data Lakes ML

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Analytics Data Scientist

Data Lakes Vs. Data Warehouse: Its significance and relevance in the data world

Pickl AI

NOVEMBER 15, 2023

Discover the nuanced dissimilarities between Data Lakes and Data Warehouses. Data management in the digital age has become a crucial aspect of businesses, and two prominent concepts in this realm are Data Lakes and Data Warehouses. It acts as a repository for storing all the data.

Data Lakes

Data Lakes Data Warehouse Database ETL

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lakes

Data Lakes Data Models Data Modeling Data Warehouse

Data Cataloging in the Data Lake: Alation + Kylo

Alation

FEBRUARY 20, 2020

Architecturally the introduction of Hadoop, a file system designed to store massive amounts of data, radically affected the cost model of data. Organizationally the innovation of self-service analytics, pioneered by Tableau and Qlik, fundamentally transformed the user model for data analysis. The Rise of the Data Catalog.

Data Lakes

Data Lakes Hadoop Tableau Big Data

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to provide data protection.

AWS

AWS Data Governance Data Silos SQL

Mind the Gap: Start Modernizing Analytics by Reorienting Your Enterprise Analytics Team

Dataversity

SEPTEMBER 5, 2024

… and your data warehouse / data lake / data lakehouse. A few months ago, I talked about how nearly all of our analytics architectures are stuck in the 1990s. Maybe an executive at your company read that article, and now you have a mandate to “modernize analytics.”

Analytics

Analytics Data Lakes Data Warehouse

Principal Financial Group uses AWS Post Call Analytics solution to extract omnichannel customer insights

AWS Machine Learning Blog

NOVEMBER 15, 2023

Principal is conducting enterprise-scale near-real-time analytics to deliver a seamless and hyper-personalized omnichannel customer experience on their mission to make financial security accessible for all. They are processing data across channels, including recorded contact center interactions, emails, chat and other digital channels.

AWS

AWS Analytics ML

Driving Business Value and ROI from a Hybrid Cloud Data Lake

Alation

FEBRUARY 20, 2020

For many enterprises, a hybrid cloud data lake is no longer a trend, but becoming reality. Due to these needs, hybrid cloud data lakes emerged as a logical middle ground between the two consumption models. Without business context, business users are less likely to use the data lake and insights will be hard to come by.

Data Lakes

Data Lakes Cloud Data AWS Tableau

Maximize the ROI of Your Enterprise Data Lake

Dataversity

OCTOBER 14, 2022

The data being talked about is useful for businesses to draw insights, formulate strategies, and understand trends and customer behavior, among others. […]. The post Maximize the ROI of Your Enterprise Data Lake appeared first on DATAVERSITY.

Data Lakes

Data Lakes Analytics Artificial Intelligence

Why Open Table Format Architecture is Essential for Modern Data Systems

phData

NOVEMBER 8, 2024

Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. In this blog, we will discuss: What is the Open Table format (OTF)? The most used open table formats currently are Apache Iceberg, Delta Lake, and Apache Hudi.

Data Lakes

Data Lakes Data Warehouse Azure Database

Three Ways Data Analytics Will Progress in 2022 and Beyond

Dataversity

JANUARY 17, 2022

Most industries have seen tremendous change due to the pandemic over the past two years, which has brought new and ever-evolving data that businesses need to make sense of. The post Three Ways Data Analytics Will Progress in 2022 and Beyond appeared first on DATAVERSITY.

Analytics

Analytics Data Lakes Data Warehouse

Generate financial industry-specific insights using generative AI and in-context fine-tuning

AWS Machine Learning Blog

NOVEMBER 12, 2024

In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) in-context sample data with features and labels in the prompt.

SQL

SQL AWS AI

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 20, 2023

Data and governance foundations – This function uses a data mesh architecture for setting up and operating the data lake, central feature store, and data governance foundations to enable fine-grained data access. This framework considers multiple personas and services to govern the ML lifecycle at scale.

ML

ML AWS Data Lakes

Unleashing the power of Presto: The Uber case study

IBM Journey to AI blog

SEPTEMBER 25, 2023

But what most people don’t realize is that behind the scenes, Uber is not just a transportation service; it’s a data and analytics powerhouse. Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. Consider the magnitude of Uber’s footprint.

Data Lakes

Data Lakes Analytics Clustering

What is the Snowflake Data Cloud and How Much Does it Cost?

phData

NOVEMBER 9, 2023

This blog was originally written by Keith Smith and updated for 2024 by Justin Delisi. Snowflake’s Data Cloud has emerged as a leader in cloud data warehousing. What is a Data Lake? A Data Lake is a location to store raw data that is in any format that an organization may produce or collect.

Data Warehouse

Data Warehouse Data Lakes Clustering Cloud Data

Why companies need to accelerate data warehousing solution modernization

IBM Journey to AI blog

APRIL 24, 2023

Managing, storing, and processing data is critical to business efficiency and success. Modern data warehousing technology can handle all data forms. Significant developments in big data, cloud computing, and advanced analytics created the demand for the modern data warehouse.

Data Warehouse

Data Warehouse Data Lakes Big Data

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock.

SQL

SQL Data Lakes Data Analyst AWS

Securing Data in Transit for Analytics Operations

Dataversity

MAY 28, 2024

Most enterprises today store and process vast amounts of data from various sources within a centralized repository known as a data warehouse or data lake, where they can analyze it with advanced analytics tools to generate critical business insights.

Analytics

Analytics Data Warehouse Data Lakes

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Journey to AI blog

JUNE 15, 2023

The proliferation of data silos also inhibits the unification and enrichment of data which is essential to unlocking the new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI).

Data Warehouse

Data Warehouse Data Lakes Cloud Data Analytics

Munich Re Launches Enterprise-Wide Data-Driven Platform for Analytics

Alation

FEBRUARY 13, 2020

A lot of people in our audience are looking at implementing data lakes or are in the middle of big data lake initiatives. I know in February of 2017 Munich Re launched their own innovative platform as a cornerstone for analytics that involved a big data lake and a data catalog.

Data Lakes

Data Lakes Analytics Data Engineering

10 Things AWS Can Do for Your SaaS Company

Smart Data Collective

FEBRUARY 20, 2022

Whether it’s data management, analytics, or scalability, AWS can be the top-notch solution for any SaaS company. Data storage databases. Your SaaS company can store and protect any amount of data using Amazon Simple Storage Service (S3), which is ideal for data lakes, cloud-native applications, and mobile apps.

AWS

AWS Cloud Computing Data Lakes Database

Achieve your AI goals with an open data lakehouse approach

IBM Journey to AI blog

OCTOBER 4, 2023

Another IDC study showed that while 2/3 of respondents reported using AI-driven data analytics, most reported that less than half of the data under management is available for this type of analytics. from 2022 to 2026.

Data Lakes

Data Lakes Data Warehouse AI

How INRIX accelerates transportation planning with Amazon Bedrock

AWS Machine Learning Blog

JULY 7, 2025

The problem statement sought to harness the combination of Caltrans’ asset, crash, and points-of-interest (POI) data and INRIX’s 50 petabyte (PB) data lake to anticipate high-risk locations and quickly generate empirically validated safety measures to mitigate the potential for crashes.

AWS re:Invent 2024 Highlights: Top takeaways from Swami Sivasubramanian to help customers manage generative AI at scale

AWS Machine Learning Blog

DECEMBER 16, 2024

We’re seeing a remarkable convergence of data, analytics, and generative AI. With the next generation of Amazon SageMaker announced at re:Invent, we’re introducing an integrated experience to access, govern, and act on all your data by bringing together widely adopted AWS data, analytics, and AI capabilities.

AWS

AWS AI Data Warehouse

How Northpower used computer vision with AWS to automate safety inspection risk assessments

AWS Machine Learning Blog

SEPTEMBER 27, 2024

Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry.

AWS

AWS Data Lakes ML

A Look Inside the Modern Analytics Stack

Dataversity

APRIL 1, 2021

In the data-driven world we live in today, the field of analytics has become increasingly important to remain competitive in business. In fact, a study by McKinsey Global Institute shows that data-driven organizations are 23 times more likely to outperform competitors in customer acquisition and nine times […].

Analytics

Analytics Data Silos Data Lakes

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability. In this post, you will learn how Marubeni is optimizing market decisions by using the broad set of AWS analytics and ML services, to build a robust and cost-effective Power Bid Optimization solution.

AWS

AWS Machine Learning Analytics

Transforming network operations with AI: How Swisscom built a network assistant using Amazon Bedrock

AWS Machine Learning Blog

JULY 3, 2025

To achieve the desired accuracy in KPI calculations, the data pipeline was refined to achieve consistent and precise performance, which leads to meaningful insights. At this point, it became possible for the calculator agent to forego the Pandas or Spark data processing implementation.

AWS

AWS AI SQL

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

AWS Machine Learning Blog

JUNE 5, 2023

In this post, we describe how to query Parquet files with Athena using AWS Lake Formation and use the output in Canvas to train a model. Solution overview: Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open table and file formats. Create a data lake with Lake Formation.

Machine Learning

Machine Learning AWS Data Lakes

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

He specializes in large language models, cloud infrastructure, and scalable data systems, focusing on building intelligent solutions that enhance automation and data accessibility across Amazon’s operations. Chaithanya Maisagoni is a Senior Software Development Engineer (AI/ML) in Amazon’s Worldwide Returns and ReCommerce organization.

AWS

AWS Natural Language Processing Machine Learning

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

AWS Machine Learning Blog

NOVEMBER 14, 2024

Usage of data is tracked through the data consumers, such as Amazon Athena, Amazon Redshift, or Amazon SageMaker. AWS Lake Formation – AWS Lake Formation helps manage data lakes and integrate them with other AWS analytics services.

ML

ML AWS Machine Learning
