Managing ML projects without MLflow is challenging. MLflow Projects enable reproducibility and portability by standardizing the structure of ML code. A project contains: Source code: the Python scripts or notebooks for training and evaluation. MLflow supports scalability and works with popular ML libraries.
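As a rough illustration of that standardized structure, an MLflow project is typically described by an MLproject file at the repository root; the project name, parameter, and script names below are hypothetical:

```yaml
name: churn-training            # hypothetical project name
conda_env: conda.yaml           # environment spec packaged with the code
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```

Given such a file, `mlflow run .` (or `mlflow run <git-url>`) can reproduce the run on another machine with the declared environment and parameters.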
Amazon SageMaker has redesigned its Python SDK to provide a unified object-oriented interface that makes it straightforward to interact with SageMaker services. We show you how to use the ModelTrainer class to train your ML models, which includes executing distributed training using a custom script or container.
This year, generative AI and machine learning (ML) will again be in focus, with exciting keynote announcements and a variety of sessions showcasing insights from AWS experts, customer stories, and hands-on experiences with AWS services. Visit the session catalog to learn about all our generative AI and ML sessions.
Run it once to generate the model file: python model/train_model.py
In Part 1 of this series, we introduced the newly launched ModelTrainer class in the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 model. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference.
eugeneyan · Evaluating Long-Context Question & Answer Systems [ llm eval survey ] · 28 min read. While evaluating Q&A systems is straightforward with short paragraphs, complexity increases as documents grow larger. Helpfulness: how relevant, comprehensive, and useful the response is for the user.
We’re excited to announce the release of SageMaker Core, a new Python SDK from Amazon SageMaker designed to offer an object-oriented approach for managing the machine learning (ML) lifecycle. With SageMaker Core, managing ML workloads on SageMaker becomes simpler and more efficient.
The market size for multilingual content extraction and the gathering of relevant insights from unstructured documents (such as images, forms, and receipts) for information processing is rapidly increasing. These languages might not be supported out of the box by existing document extraction software.
Snowpark ML is transforming the way that organizations implement AI solutions. Snowpark allows ML models and code to run on Snowflake warehouses. By “bringing the code to the data,” we’ve seen ML applications run anywhere from 4–100x faster than other architectures. df = session.table("BBC_ARTICLES").filter(col("CLASS") == ...)
As a global leader in agriculture, Syngenta has led the charge in using data science and machine learning (ML) to elevate customer experiences with an unwavering commitment to innovation. Efficient metadata storage with Amazon DynamoDB – To support quick and efficient data retrieval, document metadata is stored in Amazon DynamoDB.
For this example, we enter the following: You are an expert financial analyst with years of experience in summarizing complex financial documents. For this post, we use the following prompt: Summarize the following financial document for {{company_name}} with ticker symbol {{ticker_symbol}}: Please provide a brief summary that includes 1.
You can try out the models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. To learn more, refer to the API documentation. Both models support a context window of 32,000 tokens, which is roughly 50 pages of text.
Prerequisites Before diving in, you should have: Basic AI/ML understanding: concepts like language models, embeddings, and model inference. Software engineering skills: familiarity with Python, virtual environments, and package installation. Python libraries: comfort importing and using packages and file I/O.
In looking back, I often find new principles that have accompanied me while learning ML. Luckily, in our domain, doing ML research and engineering, quick wit is not the superpower that gets you far. You want to train ML models.
You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards , making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.
AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). In this post, we focus on processing a large collection of documents into raw text files and storing them in Amazon S3.
For businesses, RAG offers a powerful way to use internal knowledge by connecting company documentation to a generative AI model. When an employee asks a question, the RAG system retrieves relevant information from the company’s internal documents and uses this context to generate an accurate, company-specific response.
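The retrieve-then-generate flow described above can be sketched with a toy retriever: score internal documents against the question, pick the best match, and splice it into the prompt as context. This is a minimal illustration using bag-of-words cosine similarity; the documents, question, and function names are all hypothetical, and a real RAG system would use learned embeddings and a vector database.

```python
# Toy RAG retrieval step: rank documents by bag-of-words cosine
# similarity to the question, then build a context-grounded prompt.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    q = Counter(question.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:k]

docs = [
    "Vacation policy: employees accrue 1.5 days of paid leave per month.",
    "Expense policy: meals on business trips are reimbursed up to $50 per day.",
]
question = "How many vacation days do employees get?"
context = retrieve(question, docs)[0]
prompt = f"Answer using only this context:\n{context}\nQuestion: {question}"
```

The generated `prompt` is what would be sent to the generative model, so its answer stays grounded in the company's own documents.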
In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. However, the potential doesn’t end there.
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Signatures is a feature within Amazon Textract that offers the ability to automatically detect signatures on any document.
This significant improvement showcases how the fine-tuning process can equip these powerful multimodal AI systems with specialized skills for excelling at understanding and answering natural language questions about complex, document-based visual information. For a detailed walkthrough on fine-tuning the Meta Llama 3.2 Vision models, refer to the full post.
Amazon SageMaker provides a number of options for users who are looking for a solution to host their machine learning (ML) models. For that use case, SageMaker provides SageMaker single model endpoints (SMEs), which allow you to deploy a single ML model against a logical endpoint.
Organizations across industries want to categorize and extract insights from high volumes of documents of different formats. Manually processing these documents to classify and extract information remains expensive, error prone, and difficult to scale. Categorizing documents is an important first step in IDP systems.
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. Queries is a feature that enables you to extract specific pieces of information from varying, complex documents using natural language.
For modern companies that deal with enormous volumes of documents such as contracts, invoices, resumes, and reports, efficiently processing and retrieving pertinent data is critical to maintaining a competitive edge. What if there was a way to process documents intelligently and make them searchable with high accuracy?
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and effortlessly build, train, and deploy machine learning (ML) models at any scale. Deploy traditional models to SageMaker endpoints In the following examples, we showcase how to use ModelBuilder to deploy traditional ML models.
Summary: Python for Data Science is crucial for efficiently analysing large datasets. With numerous resources available, mastering Python opens up exciting career opportunities. Introduction Python for Data Science has emerged as a pivotal tool in the data-driven world. As the global Python market is projected to reach USD 100.6
GraphStorm is a low-code enterprise graph machine learning (ML) framework that provides ML practitioners a simple way of building, training, and deploying graph ML solutions on industry-scale graph data, such as graphs with billions of edges after adding reverse edges. Evaluation summary: total evaluations: 11; average evaluation time: 1.90 seconds.
Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. In the following sections, we show how to get started with document summarization by deploying Falcon 7B on SageMaker JumpStart.
Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges.
Today, we’re introducing the new capability to chat with your document with zero setup in Knowledge Bases for Amazon Bedrock. With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data.
These applications are all enabled by a strong ecosystem of open-source Python packages for working with audio data. In this post, we provide an overview of open-source Python packages for extracting features from speech audio data. Now that we all know a little bit about speech waveforms, back to Python!
Summary: Python is a versatile, beginner-friendly programming language known for its simple syntax, vast libraries, and cross-platform compatibility. With continuous updates and strong community support, Python remains a top choice for developers. Learn Python with Pickl.AI.
This allows SageMaker Studio users to perform petabyte-scale interactive data preparation, exploration, and machine learning (ML) directly within their familiar Studio notebooks, without the need to manage the underlying compute infrastructure. In this post, we build a Docker image that includes Python 3.11.
The following example shows how prompt optimization converts a typical prompt for a summarization task on Anthropic's Claude Haiku into a well-structured prompt for an Amazon Nova model, with sections that begin with special markdown tags such as ## Task, ### Summarization Instructions, and ### Document to Summarize.
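The sectioned shape described above can be sketched as a small prompt builder. The section headings follow the markdown tags named in the post (## Task, ### Summarization Instructions, ### Document to Summarize); the instruction text itself is illustrative, not the optimizer's actual output.

```python
# Build a structured summarization prompt with markdown-tagged sections.
def build_prompt(document: str) -> str:
    return "\n".join([
        "## Task",
        "Summarize the document below.",
        "### Summarization Instructions",
        "- Keep the summary under three sentences.",
        "- Preserve key figures and dates.",
        "### Document to Summarize",
        document,
    ])

print(build_prompt("Q3 revenue rose 12% year over year."))
```

Structuring the prompt this way separates the task, the constraints, and the input, which is the pattern the optimized Nova prompt follows.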
These diagrams serve as essential communication tools for stakeholders, documentation of compliance requirements, and blueprints for implementation teams. Set up your environment Before you can start creating diagrams, you need to set up your environment with Amazon Q CLI, the AWS Diagram MCP server, and AWS Documentation MCP server.
Such data often lacks the specialized knowledge contained in internal documents available in modern businesses, which is typically needed to get accurate answers in domains such as pharmaceutical research, financial investigation, and customer support. For example, imagine that you are planning next year’s strategy of an investment company.
Hugging Face Spaces is a platform for deploying and sharing machine learning (ML) applications with the community. It offers an interactive interface, enabling users to explore ML models directly in their browser without the need for local setup. app.py: This file will contain the main app logic.
This content builds on posts such as Deploy a Slack gateway for Amazon Bedrock by adding integrations to Amazon Bedrock Knowledge Bases and Amazon Bedrock Guardrails, and the Bolt for Python library to simplify Slack message acknowledgement and authentication requirements. Chunks are vectorized and stored in a vector database.
The second approach is using SageMaker JumpStart, a machine learning (ML) hub, with foundation models (FMs), built-in algorithms, and pre-built ML solutions. This resource includes integration examples, API documentation, and programming samples. You can deploy pre-trained models using either the Amazon SageMaker console or SDK.
It efficiently manages the distribution of automated reports and handles stakeholder communications, providing properly formatted emails containing portfolio information and document summaries that reach their intended recipients. Note that additional documents can be incorporated to enhance your data assistant agent's capabilities.
Amazon SageMaker AI provides a fully managed service for deploying these machine learning (ML) models with multiple inference options, allowing organizations to optimize for cost, latency, and throughput. In this application, we install or update a few libraries for running Llama.cpp in Python.
In this post, we dive into how organizations can use Amazon SageMaker AI, a fully managed service that allows you to build, train, and deploy ML models at scale, and can build AI agents using CrewAI, a popular agentic framework, and open source models like DeepSeek-R1. You will need access to a JupyterLab IDE with Python 3.9, 3.10, or 3.11.
VoxelGPT offers seamless integration of natural language queries with practical Python code. Search documentation, API specifications, and tutorials : VoxelGPT provides access to the complete collection of FiftyOne documentation, assisting users in quickly finding answers to FiftyOne-related questions.
The embedding NIM is optimized for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8,192 tokens) and dynamic embedding sizes (Matryoshka embeddings). The reranking NIM is optimized for providing a logit score that represents how relevant a document is to a given query.
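"Dynamic embedding size" with Matryoshka-style embeddings means a vector trained this way can be truncated to a smaller prefix and re-normalized, trading some accuracy for storage and speed. Here is a minimal sketch of that truncation step; the 4-dimensional vector is toy data, not output from the NIM.

```python
# Truncate a Matryoshka-style embedding to a smaller prefix and
# re-normalize it to unit length so cosine similarity still works.
import math

def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

full = [0.5, 0.5, 0.5, 0.5]          # pretend full-size embedding
small = truncate_embedding(full, 2)   # keep only the first 2 dims
```

Because Matryoshka training packs the most information into the leading dimensions, the truncated prefix remains a usable embedding at a fraction of the storage cost.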