Artificial Intelligence, Data Lakes and Data Models

Data Version Control for Data Lakes: Handling the Changes in Large Scale

ODSC - Open Data Science

SEPTEMBER 27, 2023

In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.

Data Lakes

Data Lakes Data Warehouse Database Big Data

Integrate foundation models into your code with Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 6, 2024

The rise of large language models (LLMs) and foundation models (FMs) has revolutionized the field of natural language processing (NLP) and artificial intelligence (AI). These powerful models, trained on vast amounts of data, can generate human-like text, answer questions, and even engage in creative writing tasks.

AWS

AWS Python Machine Learning Machine Learning

Using Azure ML to Train a Serengeti Data Model, Fast Option Pricing with DL, and How To Connect a…

ODSC - Open Data Science

MARCH 30, 2023

Using Azure ML to Train a Serengeti Data Model for Animal Identification: In this article, we will cover how you can train a model using Notebooks in Azure Machine Learning Studio.

Azure

Azure ML ML Data Modeling

Using Azure ML to Train a Serengeti Data Model for Animal Identification

ODSC - Open Data Science

MAY 8, 2023

To get the data, you will need to follow the instructions in the article: Create a Data Solution on Azure Synapse Analytics with Snapshot Serengeti — Part 1 — Microsoft Community Hub, where you will load data into Azure Data Lake via Azure Synapse. Lastly, upload the data from Azure Subscription.

Azure

Azure ML ML Data Modeling

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 21, 2023

You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.

AWS

AWS Data Lakes Clustering Data Preparation

Data fabric’s value to the enterprise

Tableau

MAY 11, 2022

Data fabrics are gaining momentum as the data management design for today’s challenging data ecosystems. At their most basic level, data fabrics leverage artificial intelligence and machine learning to unify and securely manage disparate data sources without migrating them to a centralized location.

Tableau

Tableau Data Warehouse Database Data Analyst

What is a data fabric?

Tableau

APRIL 18, 2022

A data fabric is an emerging data management design that allows companies to seamlessly access, integrate, model, analyze, and provision data. Monitor data sources according to policies you customize to help users know if fresh, quality data is ready for use. Data modeling. Data preparation.

Tableau

Tableau Data Quality Analytics Analytics

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Apache HBase was employed to offer real-time key-based access to data. Model training and scoring were performed either from Jupyter notebooks or through jobs scheduled by Apache's Oozie orchestration tool, which was part of the Hadoop implementation.

Data Science

Data Science AWS Hadoop Data Scientist

Unstructured data management and governance using AWS AI/ML and analytics services

Flipboard

OCTOBER 25, 2023

Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Text, images, audio, and videos are common examples of unstructured data. The steps of the workflow are as follows: Integrated AI services extract data from the unstructured data.

AWS

AWS ML ML Analytics

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Overview: Data science vs data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models and develop artificial intelligence (AI) applications.

Data Science

Data Science Analytics Analytics Data Scientist

5 Recent Data Science and AI Webinars You Need to See

ODSC - Open Data Science

MARCH 23, 2023

Real-time Analytics & Built-in Machine Learning Models with a Single Database: Akmal Chaudhri, Senior Technical Evangelist at SingleStore, explores the importance of delivering real-time experiences in today’s big data industry and how data models and algorithms rely on powerful and versatile data infrastructure.

Data Science

Data Science Data Lakes Machine Learning Machine Learning

How to use foundation models and trusted governance to manage AI workflow risk

IBM Journey to AI blog

OCTOBER 16, 2023

Artificial intelligence (AI) adoption is still in its early stages. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Businesses must feel confident in the predictions and content that large foundation model providers generate.

AI

AI AI Data Warehouse ML

Implementing Knowledge Bases for Amazon Bedrock in support of GDPR (right to be forgotten) requests

AWS Machine Learning Blog

MAY 31, 2024

This means that individuals can ask companies to erase their personal data from their systems and from the systems of any third parties with whom the data was shared. Krishna Prasad is a Senior Solutions Architect in Strategic Accounts Solutions Architecture team at AWS.

AWS

AWS Machine Learning Machine Learning Database

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 15, 2023

AI is quickly scaling through dozens of industries as companies, non-profits, and governments are discovering the power of artificial intelligence. Cloudera: Cloudera is a cloud-based platform that provides businesses with the tools they need to manage and analyze data. So, what are you waiting for?

Machine Learning

Machine Learning Machine Learning Data Pipeline AI

The Top AI Slides from ODSC West 2024

ODSC - Open Data Science

NOVEMBER 19, 2024

ODSC West 2024 showcased a wide range of talks and workshops from leading data science, AI, and machine learning experts. This blog highlights some of the most impactful AI slides from the world’s best data science instructors, focusing on cutting-edge advancements in AI, data modeling, and deployment strategies.

Deep Learning

Deep Learning Deep Learning Data Science AI

Introduction to Power BI Datamarts

ODSC - Open Data Science

JUNE 12, 2023

This article is an excerpt from the book Expert Data Modeling with Power BI, Third Edition by Soheil Bakhshi, a completely updated and revised edition of the bestselling guide to Power BI and data modeling. No-code/low-code experience using a diagram view in the data preparation layer similar to Dataflows.

Power BI

Power BI Data Warehouse ETL Data Preparation

How to Effectively Handle Unstructured Data Using AI

DagsHub

NOVEMBER 11, 2024

Social media conversations, comments, customer reviews, and image data are unstructured in nature and hold valuable insights, many of which are still being uncovered through advanced techniques like Natural Language Processing (NLP) and machine learning. Many find themselves swamped by the volume and complexity of unstructured data.

AI

AI AI Data Lakes Database

Watch Now: The Top West 2024 Recordings

ODSC - Open Data Science

NOVEMBER 18, 2024

Dimensional Data Modeling in the Modern Era (Dustin Dorsey, Principal Data Architect, Onix): With the emergence of big data, cloud computing, and AI-driven analytics, many wonder if the traditional principles of dimensional modeling still hold value. This session provides a gentle introduction to vector databases.

Deep Learning

Deep Learning Deep Learning Database Data Science

Exploring the Power of Data Warehouse Functionality

Pickl AI

JUNE 11, 2024

Self-Service Analytics: User-friendly interfaces and self-service analytics tools empower business users to explore data independently without relying on IT departments. This might involve data validation rules, data cleansing procedures, and ongoing monitoring to maintain data integrity.

Data Warehouse

Data Warehouse ETL Data Mining Data Mining

Why Software Engineers Should Be Embracing AI: A Guide to Staying Ahead

ODSC - Open Data Science

OCTOBER 9, 2024

What should you be looking for?

Apache Kafka

Apache Kafka AI AI Machine Learning

Ask HN: Who is hiring? (July 2025)

Hacker News

JULY 1, 2025

You’ll own and work with everything from distributed queues and data lakes to prompt evaluation and agentic orchestration. Whether it’s about leveraging LLMs to improve customer support, building data lakes on cloud platforms to improve storage or implementing models using sensor data for quality control.

Python

Python AWS ML ML

New SIEM Alternative Offers Excellent Data Security Features

Smart Data Collective

OCTOBER 16, 2022

The advantages can be summed up as follows: Forced normalization and enrichment – In Open XDR, the system ensures that all data are similar or compatible with each other (normalized) before they are stored in a data lake. If the data is incomplete, additional information is sourced and appended (enrichment).

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Data Lakes Big Data
