In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.
Imperva Cloud WAF protects hundreds of thousands of websites against cyber threats and blocks billions of security events every day. Counters and insights based on security events are calculated daily and used by users from multiple departments. The data is stored in a data lake and queried with SQL using Amazon Athena.
With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of OneLake: Fabric features a lake-centric architecture, with a central repository known as OneLake. Now we can save the data as Delta tables to use later for sales analytics.
Unified data storage: Fabric’s centralized data lake, Microsoft OneLake, eliminates data silos and provides a unified storage system, simplifying data access and retrieval. OneLake is designed to store a single copy of data in a unified location, leveraging the open-source Apache Parquet format.
Data management problems can also lead to data silos: disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large and complex database of diverse datasets all stored in their original format.
For many enterprises, a hybrid cloud data lake is no longer a trend but a reality. Due to these needs, hybrid cloud data lakes emerged as a logical middle ground between the two consumption models, for example in disaster scenarios (earthquake, flood, or fire) where the data collected does not need to be as tightly controlled.
Amazon AppFlow was used to facilitate the smooth and secure transfer of data from various sources into ODAP. Additionally, Amazon Simple Storage Service (Amazon S3) served as the central data lake, providing a scalable and cost-effective storage solution for the diverse data types collected from different systems.
In this post, we discuss a Q&A bot use case that Q4 has implemented, the challenges that numerical and structured datasets presented, and how Q4 concluded that using SQL may be a viable solution. RAG with semantic search – Conventional RAG with semantic search was the last step before moving to SQL generation.
Although setting up a database to run your analyses may seem like an arduous task, modern open-source time series databases can provide significant benefits to any scientist running time series analysis on a large data set — and with much less effort than you might imagine. Interested in attending an ODSC event?
Recent events including Tropical Cyclone Gabrielle have highlighted the susceptibility of the grid to extreme weather and emphasized the need for climate adaptation with resilient infrastructure. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry.
Depending on the requirement, it is important to choose between transient and permanent tables, weighing data recovery needs and downtime considerations. We can set the STATEMENT_TIMEOUT_IN_SECONDS parameter to define the maximum time a SQL statement can run before it is canceled.
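As a rough illustration of both choices, here is a minimal Snowflake SQL sketch; the table and column names are hypothetical and the timeout value is arbitrary.

```sql
-- Transient table: cheaper storage because it skips Fail-safe and keeps little
-- or no Time Travel history, so it suits re-loadable staging data.
CREATE TRANSIENT TABLE staging_events (
    event_id   NUMBER,
    payload    VARIANT,
    loaded_at  TIMESTAMP_NTZ
)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- Cancel any statement in this session that runs longer than 10 minutes.
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 600;
```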
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
One of the main drivers for this problem is that most analytic systems are built within the context of a unified SQL environment. Getting all the data together, in one place, and integrated is generally the main goal of our work. Unfortunately, in analytic systems we have been doing little other than tight coupling for decades.
Configure the following scopes on your connected app: Manage user data via APIs (api). Perform ANSI SQL queries on Salesforce Data Cloud data (Data Cloud_query_api). Manage Data Cloud profile data (Data Cloud_profile_api). Drag and drop the file, then choose Edit in SQL.
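For context, a query issued through the Data Cloud query scope is plain ANSI SQL; the sketch below is purely illustrative, and the object and field names are hypothetical rather than taken from the original post.

```sql
-- Hypothetical ANSI SQL query sent through the Data Cloud query API scope.
SELECT Id, Email
FROM UnifiedIndividual
WHERE Email IS NOT NULL
LIMIT 100;
```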
With the recently launched Amazon Monitron Kinesis data export v2 feature, your OT team can stream incoming measurement data and inference results from Amazon Monitron via Amazon Kinesis to Amazon Simple Storage Service (Amazon S3) to build an Internet of Things (IoT) data lake. The latest version of Firefox or Chrome.
Recognizing these specific needs, Fivetran has developed a range of connectors, including dedicated applications, databases, files, and events, which can accommodate the diverse formats used by healthcare systems. Addressing these needs may pose challenges that lead to the implementation of custom solutions rather than a uniform approach.
Apache Spark: Apache Spark is a unified analytics engine for Big Data processing, with built-in modules for streaming, SQL, Machine Learning, and graph processing. Key Features: Speed: Spark processes data in memory, making it up to 100 times faster than Hadoop MapReduce in certain applications.
We had bigger sessions on getting started with machine learning or SQL, up to advanced topics in NLP and how to make deepfakes. Expo Hall: ODSC events are more than just data science training and networking events. Top Sessions: With sessions both online and in-person in Boston, there was something for everyone.
And you should have experience working with big data platforms such as Hadoop or Apache Spark. Additionally, data science requires experience in SQL database coding and an ability to work with unstructured data of various types, such as video, audio, pictures and text.
The DataRobot AI Platform seamlessly integrates with Azure cloud services, including Azure Machine Learning, Azure Data Lake Storage Gen2 (ADLS), Azure Synapse Analytics, and Azure SQL Database. DataRobot Launch Event: From Vision to Value. For more information, visit [link].
Data collection and ingestion: The data collection and ingestion layer connects to all upstream data sources and loads the data into the data lake. Amazon Athena provides developers and business analysts SQL access to the generated data for analysis and troubleshooting.
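A hedged sketch of the kind of Athena query this layer enables; the database, table, and column names are invented for illustration.

```sql
-- Summarize a day's loads per source system directly over the data lake.
SELECT source_system,
       COUNT(*)         AS record_count,
       MAX(ingested_at) AS latest_load
FROM data_lake.ingested_events
WHERE ingest_date = DATE '2024-01-01'
GROUP BY source_system
ORDER BY record_count DESC;
```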
Role of Data Engineers in the Data Ecosystem: Data Engineers play a crucial role in the data ecosystem by bridging the gap between raw data and actionable insights. They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes.
Example:

```yaml
models:
  my_project:
    events:
      # materialize all models in models/events as tables
      +materialized: table
    csvs:
      # this is redundant, and does not need to be set
      +materialized: view
```

We can also configure the materialization type inside the dbt SQL file or the YAML file; under the hood, a view materialization is executed as a create view statement.
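A minimal sketch of the SQL-file variant mentioned above; the model path and source names are hypothetical.

```sql
-- models/events/stg_events.sql (hypothetical model): config() overrides the
-- project-level materialization for this model only, and dbt wraps the query
-- in the corresponding create view / create table statement.
{{ config(materialized='view') }}

select
    event_id,
    event_type,
    occurred_at
from {{ source('raw', 'events') }}
```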
The tool converts the templated configuration into a set of SQL commands that are executed against the target Snowflake environment. Replicate can interact with a wide variety of databases, data warehouses, and data lakes (on-premises or cloud-based). It is also a helpful tool for learning a new SQL dialect.
Select the uploaded file, and from the Actions dropdown choose the Query with S3 Select option to query the .csv data using SQL and confirm the data was loaded correctly. In this demonstration, let’s assume that you need to remove the data related to a particular customer.
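For reference, an S3 Select query of this kind looks roughly like the following; the column name and value are hypothetical, and the sketch assumes the CSV-with-header option is selected.

```sql
-- S3 Select exposes the object as s3object; with a header row, columns can be
-- referenced by name to spot-check the customer's records.
SELECT *
FROM s3object s
WHERE s."customer_id" = '12345'
LIMIT 10;
```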
LakeFS: LakeFS is an open-source platform that provides data lake versioning and management capabilities. It sits between the data lake and cloud object storage, allowing you to version and control changes to data lakes at scale. Notebook for interactive Python, SQL, and R editors for coding data pipelines.
They’ll also work with software engineers to ensure that the data infrastructure is scalable and reliable. These professionals will work with their colleagues to ensure that data is accessible and governed by the proper access controls. The reason this is an important skill is that ETL is a critical process for data warehousing and business intelligence.
A typical data pipeline involves the following steps or processes through which the data passes before being consumed by a downstream process, such as an ML model training process. Data Ingestion: Involves collecting raw data from its origin and storing it using architectures such as batch, streaming, or event-driven.
Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Data warehousing is a vital constituent of any business intelligence operation.
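If the platform described here is Snowflake, as the mix of warehousing, data lake, and data sharing suggests, independent scaling is typically achieved by giving each workload its own virtual warehouse; a minimal sketch with hypothetical names:

```sql
-- Separate virtual warehouses let ingestion and BI workloads scale and suspend
-- independently instead of contending for the same compute.
CREATE WAREHOUSE ingest_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE bi_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND   = 300
  AUTO_RESUME    = TRUE;
```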
Here’s the structured equivalent of this same data in tabular form: With structured data, you can use query languages like SQL to extract and interpret information. In contrast, such traditional query languages struggle to interpret unstructured data. This text has a lot of information, but it is not structured.
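A generic illustration of the kind of question tabular, structured data makes trivial to answer; the table and column names are invented.

```sql
-- Once the information is in rows and columns, extracting it is a one-liner.
SELECT customer_name, order_total
FROM orders
WHERE order_date >= DATE '2024-01-01'
ORDER BY order_total DESC;
```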
NoSQL Databases These databases, such as MongoDB, Cassandra, and HBase, are designed to handle unstructured and semi-structured data, providing flexibility and scalability for modern applications. Understanding the differences between SQL and NoSQL databases is crucial for students.
For instance, differential privacy adds noise to query results as a means of preventing access to Personally Identifiable Information (PII), while related techniques allow running multi-party computations directly on encrypted data. Object Tagging: Tags are schema-level objects that allow data stewards to monitor sensitive data for compliance, protection, or discovery.
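A minimal Snowflake tagging sketch; the tag, schema, table, and column names are hypothetical.

```sql
-- Create a schema-level tag with a constrained set of values, then attach it
-- to a sensitive column so stewards can track PII across the account.
CREATE TAG governance.tags.pii_type
  ALLOWED_VALUES 'email', 'phone', 'ssn';

ALTER TABLE crm.public.customers
  MODIFY COLUMN email_address
  SET TAG governance.tags.pii_type = 'email';
```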
Why External Tables Are Important. Data Ingestion: External tables allow you to easily load data into Snowflake from various external data sources without the need to first stage the data within Snowflake. Data Integration: Snowflake supports seamless integration with other data processing systems and data lakes.
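A hedged sketch of an external table over Parquet files in a data lake; the stage, bucket, integration, and table names are hypothetical.

```sql
-- Point a stage at the lake location, define an external table over it, and
-- query the files in place without loading them first.
CREATE STAGE lake_stage
  URL = 's3://example-bucket/events/'
  STORAGE_INTEGRATION = lake_int;

CREATE EXTERNAL TABLE ext_events
  WITH LOCATION = @lake_stage
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;

SELECT COUNT(*) FROM ext_events;
```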
Storage Solutions: Secure and scalable storage options like Azure Blob Storage and Azure Data Lake Storage. Key features and benefits of Azure for Data Science include: Scalability: Easily scale resources up or down based on demand, ideal for handling large datasets and complex computations.
What Are the Best Third-Party Data Ingestion Tools for Snowflake? To help you make your choice, here are the ones we consider to be the best. Fivetran: Fivetran is a tool dedicated to replicating applications, databases, events, and files into a high-performance data warehouse, such as Snowflake.
Data pipeline orchestration. Support for languages and SQL. Moving/integrating data in the cloud, data exploration, and quality assessment. Supports the ability to interact with the actual data and perform analysis on it. Pushing data to a data lake and assuming it is ready for use is shortsighted.
The service will consume the features in real time, generate predictions in near real time, such as in an event processing pipeline, and write the outputs to a prediction queue. I have worked with customers where R and SQL were the first-class languages of their data science community. Data engineers are mostly in charge of it.
One of the hardest things about MLOps today is that a lot of data scientists aren’t native software engineers, but it may be possible to lower the bar to software engineering. And so those are more sideshows of the conversations or other complementary pieces, maybe. Thank you for sharing that, David.
Plan early for rollback and recovery from production security events and service disruptions such as prompt injection, training data poisoning, model denial of service, and model theft, and define the mitigations you will use as you define application requirements.
Methods that allow our customer data models to be as dynamic and flexible as the customers they represent. In this guide, we will explore concepts like transitional modeling for customer profiles, the power of event logs for customer behavior, persistent staging for raw customer data, real-time customer data capture, and much more.
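As a rough sketch of the event-log idea, the SQL below models customer behavior as immutable events and derives a current view at query time; all names are hypothetical.

```sql
-- Append-only event log: nothing is overwritten, every behavior is a row.
CREATE TABLE customer_events (
    event_id    BIGINT,
    customer_id VARCHAR(64),
    event_type  VARCHAR(64),   -- e.g. 'email_changed', 'order_placed'
    payload     VARCHAR,       -- event attributes, e.g. as JSON text
    occurred_at TIMESTAMP
);

-- "Current profile" view: the latest event of each type per customer.
SELECT customer_id, event_type, payload, occurred_at
FROM (
    SELECT ce.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id, event_type
               ORDER BY occurred_at DESC
           ) AS rn
    FROM customer_events ce
) latest
WHERE rn = 1;
```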