Blog - Data Science Current

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

databricks

JUNE 11, 2025


Analytics, Data Science, AI

8 Ways to Scale your Data Science Workloads

KDnuggets

JULY 22, 2025

Every data scientist has been there: downsampling a dataset because it won’t fit into memory or hacking together a way to let a business user interact with a machine learning model. Machine Learning in your Spreadsheets: BQML training and prediction from a Google Sheet. Many data conversations start and end in a spreadsheet.

Data Science, Natural Language Processing, Machine Learning

Generative AI: A Self-Study Roadmap

KDnuggets

JULY 11, 2025

For developers and data practitioners, this shift presents both opportunity and challenge. Traditional machine learning systems excel at classification, prediction, and optimization—they analyze existing data to make decisions about new inputs. This difference shapes everything about how you work with these systems.

AI, Machine Learning

Webinars

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Airflow Best Practices for ETL/ELT Pipelines


Forget Streamlit: Create an Interactive Data Science Dashboard in Excel in Minutes

KDnuggets

JUNE 19, 2025

By Shamima Sultana on June 19, 2025 in Data Science. While Python-based tools like Streamlit are popular for creating data dashboards, Excel remains one of the most accessible and powerful platforms for building interactive data visualizations. Data labels on top of columns.

Data Science, Natural Language Processing, Machine Learning

How to Combine Streamlit, Pandas, and Plotly for Interactive Data Apps

KDnuggets

JUNE 27, 2025

By Vinod Chugani on June 27, 2025 in Data Science. Creating interactive web-based data dashboards in Python is easier than ever when you combine the strengths of Streamlit, Pandas, and Plotly. … unique()) # Filter data filtered_df = df[(df[Region].isin(regions)) … unique(), default=df[Region].unique())

Natural Language Processing, Data Science, Machine Learning

The Power of RLVR: Training a Leading SQL Reasoning Model on Databricks

databricks

JULY 30, 2025


SQL, Data Science, Artificial Intelligence

The 7 Most Useful Jupyter Notebook Extensions for Data Scientists

KDnuggets

JUNE 18, 2025

By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on June 18, 2025 in Data Science. Jupyter Notebook has become one of the first platforms data scientists learn to use, as it allows for easier data manipulation compared to standard programming IDEs.

Data Scientist, Natural Language Processing, Data Science, Machine Learning

Amazon Q Apps supports customization and governance of generative AI-powered apps

AWS Machine Learning Blog

DECEMBER 12, 2024

We are excited to announce new features that allow creation of more powerful apps, while giving more governance control using Amazon Q Apps, a capability within Amazon Q Business that allows you to create generative AI-powered apps based on your organization's data. The next feature we discuss is custom labels.

AI, AWS

Generate financial industry-specific insights using generative AI and in-context fine-tuning

AWS Machine Learning Blog

NOVEMBER 12, 2024

In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) in-context sample data with features and labels in the prompt.
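
As a rough illustration of what "in-context sample data with features and labels in the prompt" can look like, here is a minimal Python sketch that formats a few labeled rows into a few-shot prompt. The column names, the sample rows, and the `build_prompt` and `format_row` helpers are hypothetical and are not taken from the AWS post; the resulting string would be sent to whichever LLM you use.

```python
from typing import Mapping, Sequence


def format_row(features: Mapping[str, object]) -> str:
    """Render one table row as 'name=value' pairs in insertion order."""
    return ", ".join(f"{name}={value}" for name, value in features.items())


def build_prompt(
    examples: Sequence[tuple[Mapping[str, object], str]],
    query: Mapping[str, object],
    instruction: str,
) -> str:
    """Assemble an instruction, labeled sample rows, and one unlabeled row."""
    lines = [instruction, ""]
    for features, label in examples:
        lines.append(f"Row: {format_row(features)}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final row has no label; the model is expected to complete it.
    lines.append(f"Row: {format_row(query)}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical sample data, invented for illustration only.
examples = [
    ({"revenue_growth_pct": 12.0, "debt_to_equity": 0.4}, "low risk"),
    ({"revenue_growth_pct": -8.5, "debt_to_equity": 2.1}, "high risk"),
]
query = {"revenue_growth_pct": 3.2, "debt_to_equity": 1.0}

prompt = build_prompt(examples, query, "Classify the credit risk of each row.")
print(prompt)
```

Keeping the labeled rows and the unlabeled row in the same textual format is the point of the sketch: the model sees the pattern and is asked to continue it.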

SQL, AWS, AI

Build a reverse image search engine with Amazon Titan Multimodal Embeddings in Amazon Bedrock and AWS managed services

AWS Machine Learning Blog

NOVEMBER 13, 2024

In the context of generative AI, significant progress has been made in developing multimodal embedding models that can embed various data modalities (such as text, image, video, and audio data) into a shared vector space. Alternatively, you could directly upload the dataset to an S3 bucket by using the AWS Management Console.

AWS, Database, K-nearest Neighbors, AI

Build Interactive Machine Learning Apps with Gradio

Flipboard

JULY 8, 2025

Publish AI, ML & data-science insights to a global community of data professionals. In this blog, we’ll take a fun, hands-on approach to learning the key Gradio components by building a text-to-speech (TTS) web application that you can run on an AI PC or Intel® Tiber™ AI Cloud and share with others.

Machine Learning, Data Science, Python

Enhance speech synthesis and video generation models with RLHF using audio and video segmentation in Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 21, 2024

This setup enables the model to learn from human-labeled data, refining its ability to produce content that aligns with natural human expectations. We guide you through deploying the necessary infrastructure using AWS CloudFormation, creating an internal labeling workforce, and setting up your first labeling job.

AWS, AI, Natural Language Processing

Summary of DAIS 2025 Announcements Through the Lens of Games

databricks

JULY 15, 2025


Data Engineering, Data Engineer

The Lifecycle of Feature Engineering: From Raw Data to Model-Ready Inputs

Flipboard

JULY 16, 2025

By Jayita Gulati on July 16, 2025 in Machine Learning. In data science and machine learning, raw data is rarely suitable for direct consumption by algorithms. Transforming this data into meaningful, structured inputs that models can learn from is an essential step; this process is known as feature engineering.

Machine Learning, Natural Language Processing, Data Science

Benefits of Using LiteLLM for Your LLM Apps

KDnuggets

JULY 23, 2025

It's also possible to provide custom label tags to help attribute costs to certain usage or departments. For data privacy, you are responsible for your own privacy as a user deploying LiteLLM yourself, but this approach is more secure since the data never leaves your controlled environment except when sent to the LLM providers.

Natural Language Processing, Data Science, Python, Machine Learning

Muvera: Making multi-vector retrieval as fast as single-vector search

Hacker News

JUNE 26, 2025

… “How tall is Mt Everest?”), the goal of IR is to find information relevant to the query from a very large collection of data (e.g., … MUVERA: A solution with fixed dimensional encodings. MUVERA offers an elegant solution by reducing multi-vector similarity search to single-vector MIPS to make retrieval over complex multi-vector data much faster.

Algorithm, Natural Language Processing, Data Mining

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

AWS Machine Learning Blog

OCTOBER 31, 2024

By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation.

AWS, Natural Language Processing, ML

What Is Agentic AI? A Gateway to Building Smarter and Autonomous Agents

Data Science Dojo

APRIL 25, 2025

In this blog, we will break down what agentic AI is, how it works, where it's being used, and what it means for the future. It takes in data, makes sense of it, and uses that information to plan its next move. For example, a single AI agent can monitor thousands of network endpoints or manage customer service chats around the world.

AI, Supervised Learning, Algorithm

Power Your LLM Training and Evaluation with the New SageMaker AI Generative AI Tools

AWS Machine Learning Blog

JUNE 24, 2025

These pairs act as demonstration data for Supervised Fine-Tuning (SFT), teaching models how to respond to similar inputs accurately. In this blog post, we’ll walk you through how to set up these templates in SageMaker to create high-quality datasets for training your large language models. Choose Create labeling job.

AI, AWS, Machine Learning

The IKEA of Data: How to Bring Modular Thinking to Your Data Architecture (and Why It Works)

IBM Data Science in Practice

MAY 19, 2025

Those dreaded (rather liked) 3-letter acronyms: IoT. A few years ago, I found myself thinking about how messy IoT data could get, fast. I ended up comparing it to a supermarket: different aisles, different types of data, all needing their own shelf space and labeling system. Today's data ecosystems are even more complex.

Data Lakes, SQL, Data Science, Data Engineering

Accelerating ML experimentation with enhanced security: AWS PrivateLink support for Amazon SageMaker with MLflow

AWS Machine Learning Blog

DECEMBER 9, 2024

In the initial stages of an ML project, data scientists collaborate closely, sharing experimental results to address business challenges. MLflow, a popular open-source tool, helps data scientists organize, track, and analyze ML and generative AI experiments, making it easier to reproduce and compare results.

AWS, ML, Data Scientist

Setting Up a Machine Learning Pipeline on Google Cloud Platform

Flipboard

JULY 25, 2025

By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist, on July 25, 2025 in Data Engineering. Machine learning has become an integral part of many companies, and businesses that don't utilize it risk being left behind. Download the data and store it somewhere for now.

Machine Learning, Natural Language Processing, Data Science

Scaling de-duplication in WorldCat: Balancing AI innovation with cataloging care | OCLC

Flipboard

JUNE 23, 2025

But with bibliographic data pouring in faster than ever, we need to address the challenge of keeping records accurate, connected, and accessible at speed. At OCLC, we’ve invested resources into a hybrid approach, leveraging AI to process vast amounts of data while ensuring catalogers and OCLC experts remain at the center of decision-making.

AI, Machine Learning

Discover insights from Gmail using the Gmail connector for Amazon Q Business

AWS Machine Learning Blog

OCTOBER 31, 2024

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

AWS, AI, ML

Multilingual content processing using Amazon Bedrock and Amazon A2I

AWS Machine Learning Blog

NOVEMBER 13, 2024

These large language models (LLMs) are trained on a vast amount of data from various domains and languages. Amazon Augmented AI (Amazon A2I) simplifies the creation of workflows for human review, managing the heavy lifting associated with developing these systems or overseeing a large reviewer workforce.

AWS, Machine Learning, ML

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing

AWS Machine Learning Blog

NOVEMBER 14, 2024

We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts. The model is finally deployed to production.

AWS, ML, Machine Learning

Mistral launches customizable content moderation API

Dataconomy

NOVEMBER 8, 2024

This API, which already powers Mistral’s Le Chat chatbot, is designed to classify and manage undesirable text across a variety of safety standards and specific applications. Mistral AI has announced the release of its new content moderation API.

AI, Artificial Intelligence

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

AWS Machine Learning Blog

NOVEMBER 14, 2024

Implementing a cost allocation strategy early is critical for managing your expenses and future optimization activities that will reduce your spend. Implement a tagging strategy: a tag is a label you assign to an AWS resource. Tags consist of a customer-defined key and an optional value to help manage, search for, and filter resources.
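
To make the key/value idea concrete, the following Python sketch models tags as plain dictionaries and filters a list of resources by tag. It is a toy under stated assumptions: the resource records, the tag keys, and the `filter_by_tag` helper are made up for illustration, and nothing here calls an AWS API.

```python
from typing import Optional

# Hypothetical resource records; each carries a dict of tag key -> value.
# In this sketch an empty string stands for a tag key with no value.
resources = [
    {"id": "endpoint-a", "tags": {"team": "fraud", "env": "prod"}},
    {"id": "notebook-b", "tags": {"team": "search", "env": "dev"}},
    {"id": "training-c", "tags": {"team": "fraud", "experimental": ""}},
]


def filter_by_tag(items: list, key: str, value: Optional[str] = None) -> list:
    """Return ids of resources that carry `key`, optionally matching `value`."""
    matches = []
    for item in items:
        tags = item["tags"]
        if key not in tags:
            continue
        # With no value given, the presence of the key is enough.
        if value is None or tags[key] == value:
            matches.append(item["id"])
    return matches


fraud_resources = filter_by_tag(resources, "team", "fraud")
tagged_experimental = filter_by_tag(resources, "experimental")
print(fraud_resources)
print(tagged_experimental)
```

Grouping costs by a tag such as `team` follows the same pattern: select the resources that carry the key, then aggregate their spend per value.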

ML, AWS, Machine Learning

How Indeed builds and deploys fine-tuned LLMs on Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 11, 2024

Since our founding nearly two decades ago, machine learning (ML) and artificial intelligence (AI) have been at the heart of building data-driven products that better match job seekers with the right roles and get people hired. To address these challenges, we used Amazon SageMaker to initiate and manage training jobs efficiently.

AWS, ML, Artificial Intelligence

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. A preprocessor script is a capability of SageMaker Model Monitor to preprocess SageMaker endpoint data capture before creating metrics for model quality.

ML, AWS, Data Scientist

Considerations for addressing the core dimensions of responsible AI for Amazon Bedrock applications

AWS Machine Learning Blog

NOVEMBER 15, 2024

Concerns about legal implications, accuracy of AI-generated outputs, data privacy, and broader societal impacts have underscored the importance of responsible AI development. This can be useful when you have requirements for sensitive data handling and user privacy.

AWS, AI, ML

Build Observable Data Flywheels for Production with Iguazio’s MLRun and NVIDIA NeMo Microservices

Iguazio

JUNE 11, 2025

We are proud to announce a new integration between MLRun, the open-source AI orchestration framework, and NVIDIA NeMo microservices, by extending NVIDIA Data Flywheel Blueprint. Read the blog for more details, or go straight to the blueprint to try it out for yourself. What is an AI Data Flywheel? What is MLRun?

ML, AI

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service

AWS Machine Learning Blog

NOVEMBER 14, 2024

It enables you to privately customize the FM of your choice with your data using techniques such as fine-tuning, prompt engineering, and retrieval augmented generation (RAG) and build agents that run tasks using your enterprise systems and data sources while adhering to security and privacy requirements.

AWS, AI, Machine Learning

Build and deploy a UI for your generative AI applications with AWS and Python

AWS Machine Learning Blog

NOVEMBER 6, 2024

However, as exciting as these advancements are, data scientists often face challenges when it comes to developing UIs and to prototyping and interacting with their business users. Streamlit allows data scientists to create interactive web applications using Python, using their existing skills and knowledge. Choose Manage model access.

AWS, Python, AI

A guide to Amazon Bedrock Model Distillation (preview)

AWS Machine Learning Blog

DECEMBER 4, 2024

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI. No data synthesis techniques are applied.

AWS, AI, ML

How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai

The MLOps Blog

DECEMBER 26, 2024

The experiment tracker can handle large amounts of data, making it well-suited for quick iteration and extensive evaluations of LLM-based applications. While LLMs are powerful, they rely solely on their pre-trained knowledge and lack the ability to fetch current data. Our usage in this blog should be well within the free-tier limits.

Database, Python, Clustering, Machine Learning

Integrate foundation models into your code with Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 6, 2024

These powerful models, trained on vast amounts of data, can generate human-like text, answer questions, and even engage in creative writing tasks. Enter Amazon Bedrock, a fully managed service that provides developers with seamless access to cutting-edge FMs through simple APIs. He is passionate about cloud and machine learning.

AWS, Python, Machine Learning

Object Detection and Visual Grounding with Qwen 2.5

PyImageSearch

JUNE 9, 2025

VL Models; Prompt Structure; Task-Specific Instruction; Object or Feature Specification; Contextual Clues or Relationships; Output Requirements; Model Response Format; Bounding Box Coordinates (bbox_2d or point_2d); Primary Label (label), Sub-Labels, and Descriptions; Hands-on with Qwen 2.5 … model series excels (i.e.,

Deep Learning, Artificial Intelligence

How Travelers Insurance classified emails with Amazon Bedrock and prompt engineering

AWS Machine Learning Blog

JANUARY 31, 2025

This is a guest blog post co-written with Jordan Knight, Sara Reynolds, George Lee from Travelers. Increasingly, FMs are completing tasks that were previously solved by supervised learning, which is a subset of machine learning (ML) that involves training algorithms using a labeled dataset. The PDF is split into individual pages.

Supervised Learning, Data Scientist, AWS, ML

Android Earthquake Alerts: A global system for early warning

Hacker News

JULY 22, 2025

The system then quickly analyzes data from many phones to confirm that an earthquake is happening and estimate its location and magnitude. To receive alerts, users must have Wi-Fi and/or cellular data connectivity, and both Android Earthquake Alerts and location settings enabled.

Data Mining, Natural Language Processing

Fraud detection empowered by federated learning with the Flower framework on Amazon SageMaker AI

AWS Machine Learning Blog

JULY 11, 2025

Traditional ML models often rely on centralized data aggregation, which raises concerns about data security and regulatory constraints. Traditional fraud models often rely on isolated data, leading to overfitting and poor real-world performance. Data privacy laws like GDPR and CCPA further limit collaboration.

AWS, ML, AI

Life beyond the leaderboard

DrivenData Labs

MAY 12, 2025

They want to benchmark the level of performance that can be achieved with their data. An ensemble of the top solutions was able to push the state-of-the-art on unseen data, reducing error by 30% compared with the National Centers for Environmental Information (NCEI) benchmark model. But what happens next?

Algorithm, Machine Learning, Deep Learning

Create a data labeling project with Amazon SageMaker Ground Truth Plus

AWS Machine Learning Blog

OCTOBER 15, 2024

Amazon SageMaker Ground Truth is a powerful data labeling service offered by AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. Each batch is made up of data objects to be labeled.

AWS, ML, Machine Learning

Classifiers in Machine Learning

Pickl AI

APRIL 13, 2025

Summary: A classifier in machine learning categorizes data into predefined classes using algorithms like Logistic Regression and Decision Trees. Introduction: Machine Learning has revolutionized how we process and analyse data, enabling systems to learn patterns and make predictions.

Machine Learning, Decision Trees, K-nearest Neighbors
