Clustering, Database and Demo - Data Science Current

MongoRAG: Leveraging MongoDB Atlas as a Vector Database with Databricks-Deployed Embedding Model and LLMs for Retrieval-Augmented Generation

Towards AI

JANUARY 29, 2025

Retrieval-Augmented Generation generally consists of three major steps, which I will explain briefly below. Information Retrieval: The very first step involves retrieving relevant information from a knowledge base, database, or vector database, where we store the embeddings of the data from which we will retrieve information.
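
The retrieval step described in this excerpt can be sketched in a few lines. This is only a minimal illustration, not the article's implementation: the in-memory STORE list stands in for a vector database, the three-dimensional vectors are made-up embeddings, and the function names cosine_similarity and retrieve are hypothetical. A real system would obtain embeddings from an embedding model and query a vector index instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, store, top_k=2):
    """Return the top_k texts whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

# Hypothetical stand-in for a vector database: (text, embedding) pairs.
# The embeddings are invented for illustration only.
STORE = [
    ("Atlas supports vector search.", [0.9, 0.1, 0.0]),
    ("Bananas are rich in potassium.", [0.0, 0.2, 0.9]),
    ("Embeddings map text to vectors.", [0.7, 0.6, 0.1]),
]

# Made-up query embedding that points in roughly the same direction
# as the first and third stored vectors.
results = retrieve([1.0, 0.2, 0.0], STORE, top_k=2)
print(results)
```

The retrieved texts would then be passed to the language model as context in the later steps.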

Database

Database Clustering Python SQL

Building Multimodal RAG Systems with Vector Databases

ODSC - Open Data Science

MAY 13, 2025

At a recent webinar, Stefan Webb, Developer Advocate and champion of Milvus (an open-source vector database), walked a global audience through the what, why, and how of building multimodal RAG systems. When content is mapped to a high-dimensional space, related pieces cluster together. Here’s what you need to know.

Database

Database Clustering Data Science Artificial Intelligence

Multi-tenancy in RAG applications in a single Amazon Bedrock knowledge base with metadata filtering

AWS Machine Learning Blog

APRIL 7, 2025

Additionally, we dive into integrating common vector database solutions available for Amazon Bedrock Knowledge Bases and how these integrations enable advanced metadata filtering and querying capabilities.

Database

Database AWS Natural Language Processing AI

Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

AWS Machine Learning Blog

OCTOBER 24, 2024

For this post, we’ll use a provisioned Amazon Redshift cluster. Set up the Amazon Redshift cluster: We’ve created a CloudFormation template to set up the Amazon Redshift cluster. Implementation steps: Load data to the Amazon Redshift cluster. Connect to your Amazon Redshift cluster using Query Editor v2.

Data Warehouse

Data Warehouse Machine Learning Machine Learning Cloud Data

Top Gen AI Demos of AI Applications With MLRun

Iguazio

JANUARY 30, 2025

Each of these demos can be adapted to a number of industries and customized to specific needs. You can also watch the complete library of demos here. Output structured data is stored in a database, accessible for reporting or downstream applications. Watch the smart call center analysis app demo.

AI

AI AI Clustering Machine Learning

How to Manage Thousands of Real-Time Models in Production

Iguazio

APRIL 28, 2025

from local or virtual machine to K8s cluster) and the need for bespoke deployments. Iguazio allows the team to go from testing code locally to running at scale on a remote cluster within minutes. This setup happens once per toolset and is stored in a database. It takes about a week and can be fine-tuned over time.

ML

ML ML Clustering Database

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 23, 2024

Agent Creator is a versatile extension to the SnapLogic platform that is compatible with modern databases, APIs, and even legacy mainframe systems, fostering seamless integration across various data environments. The following demo shows Agent Creator in action. Chunker Snap – Segments large texts into manageable pieces.

AI

AI AI Database AWS

Visualizing graph data without a graph database

Cambridge Intelligence

OCTOBER 25, 2023

Visualizing graph data doesn’t necessarily depend on a graph database… Working on a graph visualization project? You might assume that graph databases are the way to go – they have the word “graph” in them, after all. Do I need a graph database? It depends on your project. Unstructured? Under construction?

Database

Database Data Models Data Modeling Algorithm

Build a Search Engine: Setting Up AWS OpenSearch

Flipboard

MAY 5, 2025

In this series, we will set up AWS OpenSearch, which will serve as a vector database for a semantic search application that we’ll develop step by step. Amazon OpenSearch Service is a fully managed solution that simplifies the deployment, operation, and scaling of OpenSearch clusters in the AWS Cloud.

AWS

AWS Clustering Deep Learning Deep Learning

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

Flipboard

NOVEMBER 24, 2023

The SnapLogic Intelligent Integration Platform (IIP) enables organizations to realize enterprise-wide automation by connecting their entire ecosystem of applications, databases, big data, machines and devices, APIs, and more with pre-built, intelligent connectors called Snaps.

Database

Database AWS ETL SQL

Citus 12: Schema-based sharding for PostgreSQL

Hacker News

JULY 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps? And if you want to see demos of some of this functionality, be sure to join us for the livestream of the Citus 12.0 Updates page. Let’s dive in!

Database

Database SQL Data Modeling Data Models

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 15, 2023

Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Enter a connection name such as demo and choose your desired Amazon DocumentDB cluster. Choose Add connection.

Machine Learning

Machine Learning Machine Learning AWS ML

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. Enter a stack name, such as Demo-Redshift. This is the maximum allowed number of domains in each supported Region.

ML

ML ML AWS Data Warehouse

Getting started with Amazon Titan Text Embeddings

AWS Machine Learning Blog

JANUARY 31, 2024

Amazon Titan Text Embeddings is a text embeddings model that converts natural language text—consisting of single words, phrases, or even large documents—into numerical representations that can be used to power use cases such as search, personalization, and clustering based on semantic similarity.

Natural Language Processing

Natural Language Processing AWS Machine Learning Machine Learning

How to Split Text For Vector Embeddings in Snowflake

phData

NOVEMBER 28, 2024

“Vector Databases are completely different from your cloud data warehouse.” – You might have heard that statement if you are involved in creating vector embeddings for your RAG-based Gen AI applications. Enhanced Search and Retrieval Augmented Generation: Vector search systems work by matching queries with embeddings in a database.

Python

Python Database SQL Machine Learning

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

AWS Machine Learning Blog

APRIL 19, 2024

These controllers allow Kubernetes users to provision AWS resources like buckets, databases, or message queues simply by using the Kubernetes API. Prerequisites To follow along, you should have a Kubernetes cluster with the SageMaker ACK controller v1.2.9 Release v1.2.9 Now you also can use them with SageMaker Operators for Kubernetes.

AWS

AWS ML ML Machine Learning

Get Creative with AI Forecasting in Changing Economic Conditions

DataRobot Blog

OCTOBER 4, 2022

In this blog, we’ll review the DataRobot new Time Series clustering feature, which gives you a creative edge to build time series forecasting models by automatically grouping series that are identical to each other and then building models tailored to these groups. You can also connect to Snowflake, Azure, Redshift and many other databases.

Clustering

Clustering AI AI Azure

Analyzing the history of Tableau innovation

Tableau

DECEMBER 1, 2021

Chris had earned an undergraduate computer science degree from Simon Fraser University and had worked as a database-oriented software engineer. In 2004, Tableau got both an initial series A of venture funding and Tableau’s first OEM contract with the database company Hyperion—that’s when I was hired. Let’s take a look at each.

Tableau

Tableau ML ML Database

Unleashing the power of Presto: The Uber case study

IBM Journey to AI blog

SEPTEMBER 25, 2023

When a query is constructed, it passes through a cost-based optimizer, then data is accessed through connectors, cached for performance and analyzed across a series of servers in a cluster. They stood up a file-based data lake alongside their analytical database. Uber has made the Presto query engine connect to real-time databases.

Data Lakes

Data Lakes Analytics Analytics Clustering

CBRE and AWS perform natural language queries of structured data using Amazon Bedrock

AWS Machine Learning Blog

MAY 30, 2024

In this post, we describe how CBRE partnered with AWS Prototyping to develop a custom query environment allowing natural language query (NLQ) prompts by using Amazon Bedrock, AWS Lambda, Amazon Relational Database Service (Amazon RDS), and Amazon OpenSearch Service. Embeddings were generated using Amazon Titan.

AWS

AWS SQL Database AI

Faster distributed graph neural network training with GraphStorm v0.4

AWS Machine Learning Blog

FEBRUARY 11, 2025

Although GraphStorm can run efficiently on single instances for small graphs, it truly shines when scaling to enterprise-level graphs in distributed mode using a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon SageMaker. Today, AWS AI released GraphStorm v0.4.

AWS

AWS Python ML ML

Enhance performance of generative language models with self-consistency prompting on Amazon Bedrock

AWS Machine Learning Blog

MARCH 19, 2024

We use Cohere Command and AI21 Labs Jurassic-2 Mid for this demo. DynamoDB table An application running on AWS uses an Amazon Aurora Multi-AZ DB cluster deployment for its database. Enable read-through caching on the Aurora database. Create a second Aurora database and link it to the primary database as a read replica.

Database

Database AWS Python Natural Language Processing

Generate compliant content with Amazon Bedrock and ConstitutionalChain

AWS Machine Learning Blog

APRIL 1, 2025

By default, Amazon Bedrock uses Amazon OpenSearch Serverless as a vector database. Stephen Garth is a Data Scientist at Insagic, where he develops advanced machine learning solutions, including LLM-powered automation tools and deep clustering models for actionable, consumer insights.

AWS

AWS AI AI Data Scientist

Teaching AI to Smell by Using DataRobot

DataRobot

JUNE 10, 2021

The database used for this competition is based on the Perfumery Materials & Performance dataset by Leffingwell & Associates and the Good Scents Company Information system. Upon further reflection of the embeddings, it’s possible to see clusters of particular molecules. Request a demo. See DataRobot in Action.

Clustering

Clustering Machine Learning Machine Learning AI

Forecast Time Series at Scale with Google BigQuery and DataRobot

DataRobot Blog

NOVEMBER 3, 2022

To understand how DataRobot AI Cloud and Big Query can align, let’s explore how DataRobot AI Cloud Time Series capabilities help enterprises with three specific areas: segmented modeling, clustering, and explainability. Enable Granular Forecasts with Clustering. This is where clustering comes in.

Clustering

Clustering Data Scientist Exploratory Data Analysis AI

Introducing the MLOps Management Agent

DataRobot

JUNE 16, 2021

Additionally, we have recently announced a partnership and integration with Snowflake to expand deployment options by bringing models directly into the database. To see a demo or to learn how it can be applied to your current use cases, reach out to your DataRobot account team or request a demo today. Request a Demo.

Azure

Azure Data Science Clustering AWS

Observability in LLMOps: Different Levels of Scale

The MLOps Blog

AUGUST 15, 2024

We’re working with super-large GPU clusters and are looking at training runs that take weeks or months. Retrieval Augmented Generation (RAG) systems add a vector database and embeddings to the mix, which require dedicated observability tooling. Pretraining is undoubtedly the most expensive activity.

Database

Database Clustering ML ML

GenAI for Aerospace: Empowering the workforce with expert knowledge on Amazon Q and Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 26, 2024

This architecture combines a general-purpose large language model (LLM) with a customer-specific document database, which is accessed through a semantic search engine. Because RAG uses a semantic search, it can find more relevant material in the database than just a keyword match alone. Choose Next. Choose Next.

AWS

AWS AI AI Machine Learning

The Shift from Models to Compound AI Systems

BAIR

FEBRUARY 17, 2024

We frequently see this with LLM users, where a good LLM creates a compelling but frustratingly unreliable first demo, and engineering teams then go on to systematically raise quality. Systems can be dynamic. Machine learning models are inherently limited because they are trained on static datasets, so their “knowledge” is fixed.

AI

AI AI DataOps Data Pipeline

Open source data visualization options: we compare 5 tools

Cambridge Intelligence

FEBRUARY 20, 2025

Our graph visualization SDKs include performance demos, so you can run layouts of thousands of chart items and monitor the frames per second (FPS) rate for comparison. Format: Open source automatic graph drawing/design tool that uses a simple graph description language (DOT) for nodes, edges, clusters etc. Cytoscape.js

Data Visualization

Data Visualization Algorithm Data Analyst Clustering

Meet the winners of the Research Rovers: AI Research Assistants for NASA Challenge

DrivenData Labs

DECEMBER 10, 2023

or GPT-4 arXiv, OpenAlex, CrossRef, NTRS lgarma Topic clustering and visualization, paper recommendation, saved research collections, keyword extraction GPT-3.5 Currently, published research may be spread across a variety of different publishers, including free and open-source ones like those used in many of this challenge's demos (e.g.

AI

AI AI Natural Language Processing Artificial Intelligence

How to optimize Google Cloud Platform cloud costs with IBM Turbonomic

IBM Journey to AI blog

MAY 1, 2023

Here, you can find information on the actions and the corresponding workload, such as the container cluster, the namespace and the risk posed to the workload (which, in this case, is transaction congestion): Figure 5 In Figure 6 below, you can see how Turbonomic provides the rationale behind taking the action.

Clustering

Clustering Database Analytics Analytics

Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart

AWS Machine Learning Blog

MAY 2, 2023

For example, a health insurance company may want their question answering bot to answer questions using the latest information stored in their enterprise document repository or database, so the answers are accurate and reflect their unique business rules. In this demo, we use a Jumpstart Flan T5 XXL model endpoint.

Algorithm

Algorithm Machine Learning Machine Learning Natural Language Processing

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Kubeflow integrates with popular ML frameworks, supports versioning and collaboration, and simplifies the deployment and management of ML pipelines on Kubernetes clusters. Dolt Dolt is an open-source relational database system built on Git. Check out the Kubeflow documentation.

Machine Learning

Machine Learning Machine Learning ML ML

Build a cybersecurity dashboard to fight alert fatigue

Cambridge Intelligence

JULY 26, 2023

Let’s jump ahead to a few days later, when a red alert shows our database server exchanging a huge number of packets with an external entity. Request full access to our KronoGraph SDK, demos and live-coding playground. What other activity on this file server happened immediately before or after the policy violation?

Clustering

Clustering Data Visualization Database

Understanding earthquakes: what map visualizations teach us

Cambridge Intelligence

NOVEMBER 8, 2023

The earthquakes data source: The data I used is from the USGS’s National Earthquake Information Center (NEIC), whose extensive databases of seismic information are freely available. Tōhoku earthquake.

Data Visualization

Data Visualization Clustering Database Data Modeling

Which is better, retrieval augmentation (RAG) or fine-tuning? Both.

Snorkel AI

SEPTEMBER 20, 2023

For example, if a data team wants to use an LLM to examine financial documents—something the model may perform poorly on out of the box—the team can fine-tune it on something like the Financial Documents Clustering data set. This information could come from: A vector database such as FAISS or Pinecone. Book a demo today.

Data Science

Data Science Data Scientist Database AI

Snowflake Snowpark: cloud SQL and Python ML pipelines

Snorkel AI

MAY 26, 2023

It won’t be a long demo, it’ll be a very quick demo of what you can do and how you can operationalize stuff in Snowflake. And then once they’re done with that, it’s very easy to package up, and you’ll see that in the demo today. The demo is actually very simple.

SQL

SQL ML ML Python

Integrating LLMs with Traditional ML: How, Why & Use Cases

Iguazio

APRIL 24, 2024

This adaptability makes them versatile tools for a variety of industries, from legal document analysis to customer care. (For a demo of how to fine-tune an OSS LLM, check out the GitHub repo here.) They can provide information, summaries and insights across many fields without the need for external databases in real-time applications.

ML

ML ML Data Science Data Scientist

Dialogue-guided intelligent document processing with foundation models on Amazon SageMaker JumpStart

AWS Machine Learning Blog

MAY 24, 2023

Finally, we store these vectors in a vector database for similarity search. As an alternative, you can use FAISS, an open-source vector clustering solution for storing vectors. One of the key features is its ability to interface with external sources of information, such as the web, databases, and APIs.

AI

AI AI AWS ML

Generative AI in the Enterprise

O'Reilly Media

NOVEMBER 28, 2023

If we asked whether their companies were using databases or web servers, no doubt 100% of the respondents would have said “yes.” And there are tools for archiving and indexing prompts for reuse, vector databases for retrieving documents that an AI can use to answer a question, and much more. We expect others to follow.

AI

AI AI Data Analysis Data Analysis

12 Standout Deep Learning Talks Coming to ODSC East this May

ODSC - Open Data Science

APRIL 19, 2023

With Dr. Jon Krohn you’ll also get hands-on code demos in Jupyter notebooks and strategic advice for overcoming common pitfalls. Here, Weaviate will be introduced as an open-source vector search database with unique features for serving millions of users worldwide.

Deep Learning

Deep Learning Deep Learning Machine Learning Machine Learning
