Blog - Data Science Current

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

databricks

JUNE 11, 2025

Analytics, Data Science, AI

Forget Streamlit: Create an Interactive Data Science Dashboard in Excel in Minutes

KDnuggets

JUNE 19, 2025

By Shamima Sultana on June 19, 2025 in Data Science. Image by Editor | Midjourney. While Python-based tools like Streamlit are popular for creating data dashboards, Excel remains one of the most accessible and powerful platforms for building interactive data visualizations. Data labels on top of columns.

Data Science, Natural Language Processing, Machine Learning

How to Combine Streamlit, Pandas, and Plotly for Interactive Data Apps

KDnuggets

JUNE 27, 2025

By Vinod Chugani on June 27, 2025 in Data Science. Image by Author | ChatGPT. Introduction: Creating interactive web-based data dashboards in Python is easier than ever when you combine the strengths of Streamlit, Pandas, and Plotly. unique()) # Filter data filtered_df = df[(df[Region].isin(regions)) unique(), default=df[Region].unique())

Natural Language Processing, Data Science, Machine Learning

The 7 Most Useful Jupyter Notebook Extensions for Data Scientists

KDnuggets

JUNE 18, 2025

By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on June 18, 2025 in Data Science. Image by Author. As a data scientist, Jupyter Notebook has become one of the first platforms we learn to use, as it allows for easier data manipulation compared to standard programming IDEs.

Data Scientist, Natural Language Processing, Data Science, Machine Learning

What Is Agentic AI? A Gateway to Building Smarter and Autonomous Agents

Data Science Dojo

APRIL 25, 2025

In this blog, we will break down what agentic AI is, how it works, where it's being used, and what it means for the future. It takes in data, makes sense of it, and uses that information to plan its next move. For example, a single AI agent can monitor thousands of network endpoints or manage customer service chats around the world.

AI, Supervised Learning, Algorithm

Amazon Q Apps supports customization and governance of generative AI-powered apps

AWS Machine Learning Blog

DECEMBER 12, 2024

We are excited to announce new features that allow creation of more powerful apps, while giving more governance control using Amazon Q Apps, a capability within Amazon Q Business that allows you to create generative AI-powered apps based on your organization's data. The next feature we discuss is custom labels.

AI, AWS

Scaling de-duplication in WorldCat: Balancing AI innovation with cataloging care | OCLC

Flipboard

JUNE 23, 2025

But with bibliographic data pouring in faster than ever, we need to address the challenge of keeping records accurate, connected, and accessible at speed. At OCLC, we’ve invested resources into a hybrid approach, leveraging AI to process vast amounts of data while ensuring catalogers and OCLC experts remain at the center of decision-making.

AI, Machine Learning

Generate financial industry-specific insights using generative AI and in-context fine-tuning

AWS Machine Learning Blog

NOVEMBER 12, 2024

In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) in-context sample data with features and labels in the prompt.

SQL, AWS, AI

Muvera: Making multi-vector retrieval as fast as single-vector search

Hacker News

JUNE 26, 2025

How tall is Mt Everest?”), the goal of IR is to find information relevant to the query from a very large collection of data (e.g., MUVERA: A solution with fixed dimensional encodings MUVERA offers an elegant solution by reducing multi-vector similarity search to single-vector MIPS to make retrieval over complex multi-vector data much faster.

Algorithm, Natural Language Processing, Data Mining

The IKEA of Data: How to Bring Modular Thinking to Your Data Architecture (and Why It Works)

IBM Data Science in Practice

MAY 19, 2025

Those dreaded (rather liked) 3-letter acronyms: IOT. A few years ago, I found myself thinking about how messy IoT data could get, fast. I ended up comparing it to a supermarket: different aisles, different types of data, all needing their own shelf space and labeling system. Today's data ecosystems are even more complex.

Data Lakes, SQL, Data Science, Data Engineer

Power Your LLM Training and Evaluation with the New SageMaker AI Generative AI Tools

AWS Machine Learning Blog

JUNE 24, 2025

These pairs act as demonstration data for Supervised Fine-Tuning (SFT), teaching models how to respond to similar inputs accurately. In this blog post, we’ll walk you through how to set up these templates in SageMaker to create high-quality datasets for training your large language models. Choose Create labeling job.

AI, AWS, Machine Learning

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

AWS Machine Learning Blog

OCTOBER 31, 2024

By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation.

AWS, Natural Language Processing, ML

Enhance speech synthesis and video generation models with RLHF using audio and video segmentation in Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 21, 2024

This setup enables the model to learn from human-labeled data, refining its ability to produce content that aligns with natural human expectations. We guide you through deploying the necessary infrastructure using AWS CloudFormation, creating an internal labeling workforce, and setting up your first labeling job.

AWS, AI, Natural Language Processing

Build a reverse image search engine with Amazon Titan Multimodal Embeddings in Amazon Bedrock and AWS managed services

AWS Machine Learning Blog

NOVEMBER 13, 2024

In the context of generative AI, significant progress has been made in developing multimodal embedding models that can embed various data modalities, such as text, image, video, and audio data, into a shared vector space. Alternatively, you could directly upload the dataset to an S3 bucket by using the AWS Management Console.

AWS, Database, K-nearest Neighbors, AI

Create a data labeling project with Amazon SageMaker Ground Truth Plus

AWS Machine Learning Blog

OCTOBER 15, 2024

Amazon SageMaker Ground Truth is a powerful data labeling service offered by AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. Each batch is made up of data objects to be labeled.

AWS, ML, Machine Learning

Generative vs Discriminative AI: Understanding the 5 Key Differences

Data Science Dojo

MAY 27, 2024

In this blog, we will explore the details of both approaches and navigate through their differences. These algorithms use existing data like text, images, and audio to generate content that looks like it comes from the real world. This approach involves techniques where the machine learns from massive amounts of data.

K-nearest Neighbors, Supervised Learning, AI

EchoLeak – 0-Click AI Vulnerability Enabling Data Exfiltration from 365 Copilot

Hacker News

JUNE 11, 2025

As a zero-click AI vulnerability, EchoLeak opens up extensive opportunities for data exfiltration and extortion attacks for motivated threat actors.

AI

Discover insights from Gmail using the Gmail connector for Amazon Q Business

AWS Machine Learning Blog

OCTOBER 31, 2024

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

AWS, AI, ML

Build Observable Data Flywheels for Production with Iguazio’s MLRun and NVIDIA NeMo Microservices

Iguazio

JUNE 11, 2025

We are proud to announce a new integration between MLRun, the open-source AI orchestration framework, and NVIDIA NeMo microservices, by extending NVIDIA Data Flywheel Blueprint. Read the blog for more details, or go straight to the blueprint to try it out for yourself. What is an AI Data Flywheel? What is MLRun?

ML, AI

Multi-tenancy in RAG applications in a single Amazon Bedrock knowledge base with metadata filtering

AWS Machine Learning Blog

APRIL 7, 2025

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies and AWS. Organizations need to control access to their data across different business units, including companies, departments, or even individuals, while maintaining scalability.

Database, AWS, Natural Language Processing, AI

Accelerating ML experimentation with enhanced security: AWS PrivateLink support for Amazon SageMaker with MLflow

AWS Machine Learning Blog

DECEMBER 9, 2024

In the initial stages of an ML project, data scientists collaborate closely, sharing experimental results to address business challenges. MLflow, a popular open-source tool, helps data scientists organize, track, and analyze ML and generative AI experiments, making it easier to reproduce and compare results.

AWS, ML, Data Scientist

Mistral launches customizable content moderation API

Dataconomy

NOVEMBER 8, 2024

This API, which already powers Mistral’s Le Chat chatbot, is designed to classify and manage undesirable text across a variety of safety standards and specific applications. Mistral AI has announced the release of its new content moderation API.

AI, Artificial Intelligence

Object Detection and Visual Grounding with Qwen 2.5

PyImageSearch

JUNE 9, 2025

VL Models Prompt Structure Task-Specific Instruction Object or Feature Specification Contextual Clues or Relationships Output Requirements Model Response Format Bounding Box Coordinates (bbox_2d or point_2d) Primary Label (label), Sub-Labels, and Descriptions Hands-on with Qwen 2.5 model series excels (i.e.,

Deep Learning, Artificial Intelligence

What is Categorical Data Encoding? 7 Effective Methods

Data Science Dojo

JULY 23, 2024

Data is a crucial element of modern-day businesses. With the growing use of machine learning (ML) models to handle, store, and manage data, the efficiency and impact of enterprises have also increased. It has led to advanced techniques for data management, where each tactic is based on the type of data and the way to handle it.

Machine Learning, Algorithm, ML

Multilingual content processing using Amazon Bedrock and Amazon A2I

AWS Machine Learning Blog

NOVEMBER 13, 2024

These large language models (LLMs) are trained on a vast amount of data from various domains and languages. Amazon Augmented AI (Amazon A2I) simplifies the creation of workflows for human review, managing the heavy lifting associated with developing these systems or overseeing a large reviewer workforce.

AWS, Machine Learning, ML

Object Detection in Gaming: Fine-Tuning Google’s PaliGemma 2 for Valorant

PyImageSearch

APRIL 28, 2025

Jump Right To The Downloads Section How would you like immediate access to 3,457 images curated and labeled with hand gestures to train, explore, and experiment with … for free? Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments? Looking for the source code to this post?

Deep Learning, Computer Science

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

AWS Machine Learning Blog

NOVEMBER 14, 2024

Implementing a cost allocation strategy early is critical for managing your expenses and future optimization activities that will reduce your spend. Implement a tagging strategy: A tag is a label you assign to an AWS resource. Tags consist of a customer-defined key and an optional value to help manage, search for, and filter resources.

ML, AWS, Machine Learning

Life beyond the leaderboard

DrivenData Labs

MAY 12, 2025

They want to benchmark the level of performance that can be achieved with their data. An ensemble of the top solutions was able to push the state-of-the-art on unseen data, reducing error by 30% compared with the National Centers for Environmental Information (NCEI) benchmark model. But what happens next?

Algorithm, Machine Learning, Deep Learning

How Indeed builds and deploys fine-tuned LLMs on Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 11, 2024

Since our founding nearly two decades ago, machine learning (ML) and artificial intelligence (AI) have been at the heart of building data-driven products that better match job seekers with the right roles and get people hired. To address these challenges, we used Amazon SageMaker to initiate and manage training jobs efficiently.

AWS, ML, Artificial Intelligence

From prompt chaos to clarity: How to build a robust AI orchestration layer

Flipboard

JUNE 18, 2025

Managing all that sprawl, especially when attempting to build interoperability in the long run, can become overwhelming. Teneo said in a blog post that once that’s clear, teams must know what they need from their orchestration system and ensure these are the first features they look for. AI agents seem like an inevitability these days.

AI, Data Pipeline, ML

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service

AWS Machine Learning Blog

NOVEMBER 14, 2024

It enables you to privately customize the FM of your choice with your data using techniques such as fine-tuning, prompt engineering, and retrieval augmented generation (RAG) and build agents that run tasks using your enterprise systems and data sources while adhering to security and privacy requirements.

AWS, AI, Machine Learning

Considerations for addressing the core dimensions of responsible AI for Amazon Bedrock applications

AWS Machine Learning Blog

NOVEMBER 15, 2024

Concerns about legal implications, accuracy of AI-generated outputs, data privacy, and broader societal impacts have underscored the importance of responsible AI development. This can be useful when you have requirements for sensitive data handling and user privacy.

AWS, AI, ML

The power of machine learning in your business: A step-by-step guide

Data Science Dojo

DECEMBER 28, 2023

In this blog post, we’ll break down the end-to-end ML process in business, guiding you through each stage with examples and insights that make it easy to grasp. Optimize supply chains like Walmart’s inventory management. Cleaning the data to remove errors and inconsistencies. Gathering more data.

Machine Learning, ML

Using LLMs to fortify cyber defenses: Sophos’s insight on strategies for using LLMs with Amazon Bedrock and Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 26, 2024

We have defined three specialized tasks that are covered later in the blog. It employs a combination of technology, processes, and skilled personnel to maintain the confidentiality, integrity, and availability of information systems and data. We use Anthropic’s Claude 3 Sonnet on Amazon Bedrock to illustrate the use cases.

Machine Learning, ML

The evolution of LLM embeddings: An overview of NLP

Data Science Dojo

MAY 10, 2024

Embeddings provide a way to present complex data in a way that is understandable by machines. In this blog, we will focus on these embeddings in LLM and explore how they have evolved over time within the world of NLP, each transformation being a result of technological advancement and progress.

Supervised Learning, Clustering, ML

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. A preprocessor script is a capability of SageMaker Model Monitor to preprocess SageMaker endpoint data capture before creating metrics for model quality.

ML, AWS, Data Scientist

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts. The model is finally deployed to production.

AWS, ML, Machine Learning

A guide to Amazon Bedrock Model Distillation (preview)

AWS Machine Learning Blog

DECEMBER 4, 2024

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI. No data synthesis techniques are applied.

AWS, AI, ML

Build and deploy a UI for your generative AI applications with AWS and Python

AWS Machine Learning Blog

NOVEMBER 6, 2024

However, as exciting as these advancements are, data scientists often face challenges when it comes to developing UIs and to prototyping and interacting with their business users. Streamlit allows data scientists to create interactive web applications using Python, using their existing skills and knowledge. Choose Manage model access.

AWS, Python, AI

Accelerate digital pathology slide annotation workflows on AWS using H-optimus-0

AWS Machine Learning Blog

JANUARY 31, 2025

The power of FMs lies in their ability to learn robust and generalizable data embeddings that can be effectively transferred and fine-tuned for a wide variety of downstream tasks, ranging from automated disease detection and tissue characterization to quantitative biomarker analysis and pathological subtyping.

AWS, Supervised Learning, ML

Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

As enterprises increasingly embrace generative AI, they face challenges in managing the associated costs. This limitation has added complexity to cost management for generative AI initiatives. anthropic.claude-3-sonnet-20240229-v1:0", "inferenceProfileId": "us-1.anthropic.claude-3-sonnet-20240229-v1:0",

AWS, AI, Deep Learning

How Travelers Insurance classified emails with Amazon Bedrock and prompt engineering

AWS Machine Learning Blog

JANUARY 31, 2025

This is a guest blog post co-written with Jordan Knight, Sara Reynolds, George Lee from Travelers. Increasingly, FMs are completing tasks that were previously solved by supervised learning, which is a subset of machine learning (ML) that involves training algorithms using a labeled dataset. The PDF is split into individual pages.

Supervised Learning, Data Scientist, AWS, ML

WordFinder app: Harnessing generative AI on AWS for aphasia communication

AWS Machine Learning Blog

MAY 2, 2025

Secure access using Route 53 and Amplify: The journey begins with the user accessing the WordFinder app through a domain managed by Amazon Route 53, a highly available and scalable cloud DNS web service. Amazon Rekognition analyzes the image, identifying objects present and returning labels with confidence scores.

AWS, AI, Machine Learning

Build custom generative AI applications powered by Amazon Bedrock

AWS Machine Learning Blog

AUGUST 6, 2024

With last month’s blog, I started a series of posts that highlight the key factors that are driving customers to choose Amazon Bedrock. Trained on massive datasets, these models can rapidly comprehend data and generate relevant responses across diverse domains, from summarizing content to answering questions.

AI, AWS, Artificial Intelligence
