Evaluating Long-Context Question & Answer Systems [ llm eval survey ] · 28 min read. While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger.
Author(s): Towards AI Editorial Team. Originally published on Towards AI. We're excited to share that Building LLMs for Production is now available on our own platform, Towards AI Academy. Louis-François Bouchard, Towards AI Co-founder & Head of Community.
The banking industry has long struggled with the inefficiencies associated with repetitive processes such as information extraction, document review, and auditing. This is where Apoidea Group, a leading AI-focused FinTech independent software vendor (ISV) based in Hong Kong, has made a significant impact.
Generative AI has revolutionized customer interactions across industries by offering personalized, intuitive experiences powered by unprecedented access to information. For businesses, RAG offers a powerful way to use internal knowledge by connecting company documentation to a generative AI model.
Summary: Hierarchical clustering in machine learning organizes data into nested clusters without predefining cluster numbers. Unlike partition-based methods such as K-means, hierarchical clustering builds a nested tree-like structure called a dendrogram that reveals the multi-level relationships between data points.
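The dendrogram-then-cut workflow described above can be sketched with SciPy. The toy 2-D points and the choice of Ward linkage are illustrative assumptions, not from the article; the point is that the tree is built without a preset cluster count, which is only needed when cutting.

```python
# Hierarchical (agglomerative) clustering sketch using SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points: two visually obvious groups.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])

# Build the dendrogram bottom-up; "ward" merges the pair of clusters
# that minimizes the increase in within-cluster variance.
Z = linkage(points, method="ward")

# Cut the tree into 2 flat clusters. Note no cluster count was needed
# to build the tree itself, only to cut it.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Cutting the same tree at a different `t` yields a different granularity without re-clustering, which is the practical payoff of the nested structure.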
Syngenta and AWS collaborated to develop Cropwise AI, an innovative solution powered by Amazon Bedrock Agents, to accelerate their sales reps' ability to place Syngenta seed products with growers across North America. Generative AI is reshaping businesses and unlocking new opportunities across various industries.
Traditional attention mechanisms, the core of how AI processes and remembers information, struggle to scale efficiently, making models costly to train and run. Now, researchers from DeepSeek-AI and Peking University have introduced a game-changing approach called Natively Sparse Attention (NSA).
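NSA's actual design is more elaborate than anything shown here; the sketch below only illustrates the underlying motivation for sparse attention: if each query attends to a local window of keys instead of all of them, cost drops from O(n²) toward O(n·w). All shapes, names, and values are invented for illustration.

```python
# Toy sparse attention: each query attends only to keys within
# `window` positions, instead of all n keys.
import numpy as np

def local_window_attention(Q, K, V, window=2):
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores over the local window only.
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)
        # Numerically stable softmax over the window.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 4))
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 4))
out = local_window_attention(Q, K, V)
print(out.shape)
```

Production sparse-attention kernels avoid the Python loop and exploit the sparsity pattern on the GPU; this loop is purely didactic.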
Increasingly, organizations across industries are turning to generative AI foundation models (FMs) to enhance their applications. The launcher interfaces with underlying cluster management systems such as SageMaker HyperPod (Slurm or Kubernetes) or training jobs, which handle resource allocation and scheduling.
This year, generative AI and machine learning (ML) will again be in focus, with exciting keynote announcements and a variety of sessions showcasing insights from AWS experts, customer stories, and hands-on experiences with AWS services. We'll also showcase various generative AI use cases across industries.
Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. This post is cowritten with Isaac Cameron and Alex Gnibus from Tecton.
On June 12, 2025 at NVIDIA GTC Paris, learn more about cuML and clustering algorithms during the hands-on workshop, Accelerate Clustering Algorithms to Achieve the Highest Performance. cuML dramatically improves algorithm performance for data-intensive tasks involving tens to hundreds of millions of records.
Implementation includes the following steps: the first step is to break down the large document, such as a book, into smaller sections, or chunks. The model then uses a clustering algorithm to group the sentences into clusters, and the sentences closest to the center of each cluster are selected to form the summary.
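The chunk, cluster, and select-near-centroid pipeline can be sketched as follows. The hash-based `embed` function is a stand-in for a real sentence-embedding model, and the toy sentences are invented; only the clustering and selection logic mirrors the description above.

```python
# Extractive summarization by clustering sentence vectors.
import numpy as np
from sklearn.cluster import KMeans

sentences = [
    "Cats are small domesticated mammals.",
    "Dogs are loyal domesticated animals.",
    "The stock market rallied on Tuesday.",
    "Bond yields fell as investors bought treasuries.",
]

def embed(sentence, dim=16):
    # Deterministic pseudo-embedding; replace with a real model.
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.normal(size=dim)

X = np.stack([embed(s) for s in sentences])
k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# Pick the sentence closest to each cluster centroid as the summary.
summary = []
for c in range(k):
    idx = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
    summary.append(sentences[idx[dists.argmin()]])
print(summary)
```

With real embeddings, semantically similar sentences land in the same cluster, so one representative per cluster covers the document's main topics.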
The report The economic potential of generative AI: The next productivity frontier, published by McKinsey & Company, estimates that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually to the global economy. The potential for such large business value is galvanizing tens of thousands of enterprises to build their generative AI applications on AWS.
For modern companies that deal with enormous volumes of documents such as contracts, invoices, resumes, and reports, efficiently processing and retrieving pertinent data is critical to maintaining a competitive edge. What if there was a way to process documents intelligently and make them searchable with high accuracy?
Recognizing the transformative benefits of generative AI for enterprises, we at Hexagon's Asset Lifecycle Intelligence division sought to enhance how users interact with our Enterprise Asset Management (EAM) products. This phase focused on establishing a secure and compliant foundation to enable the responsible adoption of generative AI.
Question and answering (Q&A) using documents is a commonly used application in various use cases like customer support chatbots, legal research assistants, and healthcare advisors. In this collaboration, the AWS GenAIIC team created a RAG-based solution for Deltek to enable Q&A on single and multiple government solicitation documents.
The landscape of enterprise application development is undergoing a seismic shift with the advent of generative AI. This intuitive platform enables the rapid development of AI-powered solutions such as conversational interfaces, document summarization tools, and content generation apps through a drag-and-drop interface.
At its core, Ray offers a unified programming model that allows developers to seamlessly scale their applications from a single machine to a distributed cluster. Combining the resiliency of SageMaker HyperPod and the efficiency of Ray provides a powerful framework to scale up your generative AI workloads.
From deriving insights to powering generative artificial intelligence (AI)-driven applications, the ability to efficiently process and analyze large datasets is a vital capability. This same interface is also used for provisioning EMR clusters. The following diagram illustrates this solution.
Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB.
Solution overview The solution is based on the node problem detector and recovery DaemonSet, a powerful tool designed to automatically detect and report various node-level problems in a Kubernetes cluster. Choose Clusters in the navigation pane, open the trainium-inferentia cluster, choose Node groups, and locate your node group.
AI agents are rapidly becoming the next frontier in enterprise transformation, with 82% of organizations planning adoption within the next 3 years. According to a Capgemini survey of 1,100 executives at large enterprises, 10% of organizations already use AI agents, and more than half plan to use them in the next year.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies and AWS. For example, imagine a consulting firm that manages documentation for multiple healthcare providers; each customer's sensitive patient records and operational documents must remain strictly separated.
These diagrams serve as essential communication tools for stakeholders, documentation of compliance requirements, and blueprints for implementation teams. By using generative AI through natural language prompts, architects can now generate professional diagrams in minutes rather than hours, while adhering to AWS best practices.
The traditional approach of manually sifting through countless research documents, industry reports, and financial statements is not only time-consuming but can also lead to missed opportunities and incomplete analysis. It became apparent that a cost-effective solution for our generative AI needs was required.
Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. Solution overview Amazon Q Business is a fully managed, generative AI-powered assistant that helps enterprises unlock the value of their data and knowledge.
Companies across various scales and industries are using large language models (LLMs) to develop generative AI applications that provide innovative experiences for customers and employees. By offloading the management and maintenance of the training cluster to SageMaker, we reduce both training time and our total cost of ownership (TCO).
In a recent, deeply personal and inspiring address, Ruth Porat, President and Chief Investment Officer of Alphabet and Google, shared a powerful vision of how AI is set to revolutionize cancer research, detection, and treatment. Here, too, AI is making a dramatic impact. So, how is this partnership working in practice?
Customers want to search through all of the data and applications across their organization, and they want to see the provenance information for all of the documents retrieved. For more details about RDF data format, refer to the W3C documentation. The following is an example of RDF triples in N-triples file format: "sales_qty_sold".
Just two days ago, Chinese AI startup DeepSeek quietly dropped a bombshell on Hugging Face: a 685-billion-parameter large language model called DeepSeek-V3-0324. No splashy press briefings. Just a massive set of model weights, an MIT license, and a few technical whispers that were enough to set the AI community ablaze.
5 Must-Know Pillars of a Data Science and AI Foundation A data science and AI foundation needs to be built up properly before diving in head-first. By knowing these core skills, like math and AI literacy, you’ll start off your career on a high note. Tesla’s Automated Driving Documents Have Been Requested by The U.S.
Organizations can search for PII using methods such as keyword searches, pattern matching, data loss prevention tools, machine learning (ML), metadata analysis, data classification software, optical character recognition (OCR), document fingerprinting, and encryption. This speeds up the PII detection process and also reduces the overall cost.
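The pattern-matching approach listed above can be sketched with a few regular expressions. Real DLP tools combine many such detectors with validation logic and ML; these three patterns and the sample document are simplified stand-ins.

```python
# Minimal pattern-matching sketch for PII detection.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(text):
    """Return only the PII kinds that actually matched."""
    return {kind: pat.findall(text)
            for kind, pat in PATTERNS.items() if pat.findall(text)}

doc = "Contact jane.doe@example.com or 555-867-5309; SSN 123-45-6789."
hits = find_pii(doc)
print(hits)
```

Running detectors like these over extracted text (including OCR output from scanned documents) is what makes automated scanning faster and cheaper than manual review.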
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Retrieval Augmented Generation (RAG) enhances AI responses by combining a generative AI model's capabilities with information from external data sources, rather than relying solely on the model's built-in knowledge. Based on the quality and quantity of the data, the time to complete this process varied.
You need the right tools to fully unleash the power of generative AI. A vector embedding model is one such tool that is a critical component of AI applications for creating realistic text, images, and more. The use case and outcomes of your generative AI application guide your choice of model. What are vector embedding models?
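The role of an embedding model can be sketched with cosine similarity over vectors. The tiny three-dimensional vectors below are hand-made stand-ins for a real model's output (which would have hundreds of dimensions); only the ranking logic is the point.

```python
# Ranking documents against a query by cosine similarity of embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings: the two cat-related texts point one way,
# the finance text another.
docs = {
    "a cat sat on the mat": np.array([0.9, 0.1, 0.0]),
    "a kitten rested on the rug": np.array([0.8, 0.2, 0.1]),
    "quarterly revenue grew 12%": np.array([0.0, 0.1, 0.9]),
}

# Hypothetical embedding of a cat-related query.
query = np.array([0.85, 0.15, 0.05])
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)
```

With a real embedding model, semantically related texts land near each other in the vector space, so this same ranking step powers semantic search and RAG retrieval.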
Climate tech startups are at the forefront of building impactful solutions to the climate crisis, and they're using generative AI to build as quickly as possible. Climate tech startups' adoption of generative AI is evolving rapidly.
This post presents a solution for developing a chatbot capable of answering queries from both documentation and databases, with straightforward deployment. For documentation retrieval, Retrieval Augmented Generation (RAG) stands out as a key tool. The solution is deployed in the US East (N. Virginia) AWS Region. The following diagram illustrates the solution architecture.
Software as a service (SaaS) companies managing multiple tenants face a critical challenge: efficiently extracting meaningful insights from vast document collections while controlling costs.
In the second blog of the series, we’re discussing best practices for upgrading your clusters to newer versions. You are responsible for applying these updates to the cluster master and worker nodes. Patch updates are automatically applied to cluster masters, but you are responsible for updating your cluster’s worker nodes.
Last Updated on February 5, 2025 by Editorial Team Author(s): Michael Ryaboy Originally published on Towards AI. This article breaks down what Late Chunking is, why it's essential for embedding larger or more intricate documents, and how to build it into your search pipeline using Chonkie and KDB.AI as the vector store.
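As a rough sketch of the late-chunking idea: embed the whole document once at the token level, then pool token vectors per chunk, so each chunk vector is conditioned on full-document context. The random matrix below stands in for a long-context embedding model's token outputs; Chonkie and KDB.AI from the article are not used here.

```python
# Late-chunking sketch: pool contextualized token embeddings per chunk.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, dim = 12, 8
# Stand-in for the token-level output of a long-context model run
# over the ENTIRE document in one pass.
token_embs = rng.normal(size=(n_tokens, dim))

# Chunk boundaries over token positions (e.g. sentence boundaries).
chunks = [(0, 4), (4, 9), (9, 12)]

# Mean-pool token vectors within each chunk.
chunk_vecs = np.stack([token_embs[s:e].mean(axis=0) for s, e in chunks])
print(chunk_vecs.shape)  # one vector per chunk
```

The contrast with naive chunking is that each chunk is not embedded in isolation, so pronouns and references that resolve outside the chunk still influence its vector.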
Author(s): Dwaipayan Bandyopadhyay Originally published on Towards AI. Source: Image by Author. In today's AI world, where large amounts of structured and unstructured data are generated daily, the ability to use knowledge accurately has become the cornerstone of modern-day technology. What is MongoDB Atlas?
Snowpark ML is transforming the way that organizations implement AI solutions. Vector embeddings are a popular technique for working with unstructured data for generative AI use cases. Document vectors: with the success of word embeddings, it's understood that entire documents can be represented in a similar way.
For instance, when developing a medical search engine, obtaining a large dataset of real user queries and relevant documents is often infeasible due to privacy concerns surrounding personal health information. These PDFs will serve as the source for generating document chunks.