Books and Data Preparation - Data Science Current

30 Best Data Science Books to Read in 2023

Analytics Vidhya

FEBRUARY 28, 2023

To achieve maximum efficiency, companies strive to use data at every stage of their operations.

Data Science

Data Science Data Preparation Big Data

Data Preparation for Analysis: Towards Creating your Tableau Dashboard - Part 1

Analytics Vidhya

MAY 17, 2021

This article was published as a part of the Data Science Blogathon. Introduction: Visual analytics can tell users the story of the data.

Data Preparation

Data Preparation Tableau Data Science Analytics

Why There’s No Better Time to Learn LLM Development

Towards AI

NOVEMBER 5, 2024

To make learning LLM development more accessible, we’ve released the second edition of Building LLMs for Production as an e-book on Towards AI Academy, at a lower price than on Amazon. The core concepts discussed in the book are becoming a foundation for practitioners and companies working with LLMs. What’s New?

Data Preparation

Data Preparation Machine Learning AI

5 Top Large Language Models & Generative AI Books

Towards AI

AUGUST 6, 2024

Master LLMs & Generative AI Through These Five Books: This article reviews five key books that explore the rapidly evolving fields of large language models (LLMs) and generative AI, providing essential insights into these transformative technologies. Author(s): Youssef Hosni. Originally published on Towards AI.

Natural Language Processing

Natural Language Processing AI AWS

Monetizing Analytics Features: Why Data Visualizations Will Never Be Enough

Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.

Data Visualization

4 Ways to Handle Insufficient Data In Machine Learning!

Analytics Vidhya

JUNE 13, 2021

This article was published as a part of the Data Science Blogathon. Agenda: Introduction; Machine Learning pipeline; Problems with data; Why do we...

Machine Learning

Machine Learning Data Science Analytics

Data science revolution 101 – Unleashing the power of data in the digital age

Data Science Dojo

JUNE 7, 2023

The primary aim of data science is to make sense of the vast amounts of data generated daily by combining statistical analysis, programming, and data visualization. The field is divided into three primary areas: data preparation, data modeling, and data visualization.

Data Science

Data Science Data Visualization Data Scientist Machine Learning

Implementing Approximate Nearest Neighbor Search with KD-Trees

PyImageSearch

DECEMBER 23, 2024

We will start by setting up libraries and data preparation. Setup and Data Preparation: For implementing a similar-word search, we will use the gensim library to load pre-trained word embedding vectors.

K-nearest Neighbors

K-nearest Neighbors Algorithm Deep Learning
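The KD-tree similar-word search in the entry above can be illustrated with a minimal, standard-library sketch. This is not the PyImageSearch tutorial's code. The toy 2-D vectors stand in for the gensim embeddings, which are not loaded here, and `build_kdtree` and `nearest` are hypothetical names. The approximation shown, shrinking the pruning radius by a factor `1 + eps`, is one common way to trade accuracy for speed. It is not necessarily the approach the tutorial uses.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Node:
    label: str
    point: Vector
    axis: int                       # coordinate this node splits on
    left: Optional["Node"] = None   # points sorted before the median on `axis`
    right: Optional["Node"] = None  # points sorted after the median on `axis`


def build_kdtree(items: Sequence[Tuple[str, Vector]], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    ordered = sorted(items, key=lambda item: item[1][axis])
    mid = len(ordered) // 2
    label, point = ordered[mid]
    return Node(label, point, axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Vector, eps: float = 0.0,
            best: Optional[Tuple[float, str]] = None) -> Optional[Tuple[float, str]]:
    """Return (distance, label) of a near neighbour of `query`.

    eps == 0 gives the exact nearest neighbour. With eps > 0 the search prunes
    more aggressively, and the returned distance is at most (1 + eps) times
    the true nearest distance.
    """
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, eps, best)
    # The far subtree can only hold a closer point if the splitting plane is
    # nearer than the current best; eps shrinks that radius to skip branches.
    if abs(diff) * (1.0 + eps) < best[0]:
        best = nearest(far, query, eps, best)
    return best


# Hypothetical 2-D "embeddings"; real word vectors have hundreds of dimensions.
words = {
    "king": (0.90, 0.80),
    "queen": (0.88, 0.82),
    "apple": (0.10, 0.20),
    "pear": (0.12, 0.25),
}
tree = build_kdtree(list(words.items()))
print(nearest(tree, (0.11, 0.22))[1])  # apple
```

Sorting at every level keeps the sketch short at the cost of O(n log² n) construction. A production index would more likely rely on an existing library than on a hand-rolled tree.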

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 15, 2024

Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This saves time and improves performance because you don’t need to work with the entire dataset during preparation.

ML

ML Data Preparation AWS

Introducing our New Book: Implementing MLOps in the Enterprise

Iguazio

DECEMBER 14, 2023

With practical code examples and specific tool recommendations, the book empowers readers to implement the concepts effectively. After reading the book, ML practitioners and leaders will know how to deploy their ML models to production and scale their AI initiatives, while overcoming the challenges many other businesses are facing.

ML

ML Data Science Data Preparation

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

AWS Machine Learning Blog

NOVEMBER 1, 2024

We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models.

Data Preparation

Data Preparation Machine Learning ML

Streamline RAG applications with intelligent metadata filtering using Amazon Bedrock

Flipboard

NOVEMBER 20, 2024

Knowledge base – You need a knowledge base created in Amazon Bedrock with ingested data and metadata. For detailed instructions on setting up a knowledge base, including data preparation, metadata creation, and step-by-step guidance, refer to "Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy."

AWS

AWS Natural Language Processing Machine Learning

10 Best Data Engineering Books [Beginners to Advanced]

Pickl AI

AUGUST 1, 2023

Aspiring and experienced Data Engineers alike can benefit from a curated list of books covering essential concepts and practical techniques. These 10 Best Data Engineering Books for beginners encompass a range of topics, from foundational principles to advanced data processing methods. What is Data Engineering?

Data Engineering

Data Engineering Data Engineer

Unlock proprietary data with Snorkel Flow and Amazon SageMaker

Snorkel AI

DECEMBER 2, 2024

At its core, Snorkel Flow empowers data scientists and domain experts to encode their knowledge into labeling functions, which are then used to generate high-quality training datasets. This approach not only enhances the efficiency of data preparation but also improves the accuracy and relevance of AI models.

AWS

AWS Machine Learning Data Preparation

Best practices for Meta Llama 3.2 multimodal fine-tuning on Amazon Bedrock

AWS Machine Learning Blog

MAY 1, 2025

Best practices for data preparation: The quality and structure of your training data fundamentally determine the success of fine-tuning. Our experiments revealed several critical insights for preparing effective multimodal datasets. Data structure: You should use a single image per example rather than multiple images.

AWS

AWS ML AI

Access Snowflake data using OAuth-based authentication in Amazon SageMaker Data Wrangler

Flipboard

MARCH 22, 2023

Snowflake is an AWS Partner with multiple AWS accreditations, including AWS competencies in machine learning (ML), retail, and data and analytics. You can import data from multiple data sources, such as Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Redshift, Amazon EMR, and Snowflake.

AWS

AWS Data Preparation Azure Data Scientist

Experience the new and improved Amazon SageMaker Studio

AWS Machine Learning Blog

DECEMBER 1, 2023

Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. She is also the author of a book on computer vision. In his spare time, he loves traveling and writing.

ML

ML Machine Learning

Multimodality in LLMs: Understanding its Power and Impact

Data Science Dojo

JULY 31, 2024

In the context of Artificial Intelligence (AI), a modality refers to a specific type or form of data that can be processed and understood by AI models. Images: This involves visual data, including photographs, drawings, and any kind of visual representation in digital form. How It Works

AI

AI Supervised Learning Analytics

On the implementation of digital tools

Dataconomy

OCTOBER 15, 2024

The challenges related to PDF data: Several projects highlighted challenges in capturing PDF data. While accounting teams typically book summarized versions, users needed line-item details for analytics. Future trends: Emerging trends are reshaping the data analytics landscape.

Data Modeling

Data Modeling Data Models Analytics

Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler

AWS Machine Learning Blog

AUGUST 22, 2023

We’re excited to announce Amazon SageMaker Data Wrangler support for Amazon S3 Access Points. In this post, we walk you through importing data from, and exporting data to, an S3 access point in SageMaker Data Wrangler. He wrote a book on AWS FinOps, and enjoys reading and building solutions.

AWS

AWS Data Science Data Preparation Artificial Intelligence

Introduction to Power BI Datamarts

ODSC - Open Data Science

JUNE 12, 2023

The Datamarts capability opens endless possibilities for organizations to achieve their data analytics goals on the Power BI platform. This article is an excerpt from the book Expert Data Modeling with Power BI, Third Edition by Soheil Bakhshi, a completely updated and revised edition of the bestselling guide to Power BI and data modeling.

Power BI

Power BI Data Warehouse ETL Data Preparation

#54 Things are never boring with RAG! Vector Store, Vector Search, Knowledge Base, and more!

Towards AI

DECEMBER 19, 2024

Data preparation using Roboflow, loading and configuring the PaliGemma2 model (including optional LoRA/QLoRA), and data loader creation are explained. Finally, it offers best practices for fine-tuning, emphasizing data quality, parameter optimization, and transfer learning techniques.

Database

Database AI Data Preparation

Supervised vs Unsupervised Learning: Key Differences

How to Learn Machine Learning

MARCH 25, 2025

It groups similar data points or identifies outliers without prior guidance. Type of Data Used in Each Approach Supervised learning depends on data that has been organized and labeled. This data preparation process ensures that every example in the dataset has an input and a known output.

Supervised Learning

Supervised Learning Machine Learning Algorithm

Building a RAG chatbot with LangChain, Chroma, Hugging Face, and Arcee Conductor

Julien Simon

MARCH 31, 2025

Data Preparation: The first step in building the RAG chatbot is to prepare the data. In this case, the data consists of PDF documents, which can be research articles or any other PDF files of your choice. It's recommended to use a virtual environment to manage dependencies and avoid conflicts with other projects.

Machine Learning

Machine Learning Python Data Preparation

How Fastweb fine-tuned the Mistral model using Amazon SageMaker HyperPod as a first step to build an Italian large language model

AWS Machine Learning Blog

DECEMBER 18, 2024

This strategic decision was driven by several factors. Efficient data preparation: Building a high-quality pre-training dataset is a complex task, involving assembling and preprocessing text data from various sources, including web sources and partner companies. The team opted for fine-tuning on AWS.

Clustering

Clustering AWS AI

Accelerate client success management through email classification with Hugging Face on Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 12, 2023

In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail. Data preparation: Scalable Capital uses a CRM tool for managing and storing email data. Relevant email contents consist of the subject, body, and custodian banks.

Data Science

Data Science Data Scientist AWS ML

Approximate Nearest Neighbor with Locality Sensitive Hashing (LSH)

PyImageSearch

JANUARY 27, 2025

We will start by setting up libraries and data preparation. Setup and Data Preparation: For implementing a similar-word search, we will use the gensim library to load pre-trained word embedding vectors.

K-nearest Neighbors

K-nearest Neighbors Algorithm Data Preparation Database
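The entry above applies locality sensitive hashing to approximate similar-word search. A minimal sketch of one well-known LSH family, random-hyperplane hashing for cosine similarity, is shown below. It is an assumption that this matches the tutorial's variant. The class and function names, the 3-D toy vectors and the defaults (`n_bits=8`, `n_tables=4`) are all hypothetical. Each table hashes a vector to one bit per random hyperplane. Only vectors that share a bucket with the query in at least one table are then ranked by exact cosine similarity.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def random_hyperplanes(dim: int, n_bits: int, rng: random.Random) -> List[List[float]]:
    """Draw n_bits random hyperplane normals with Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Vector, planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: 1 if vec lies on its positive side, else 0."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def cosine(a: Vector, b: Vector) -> float:
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class LSHIndex:
    """Random-hyperplane LSH with several hash tables to improve recall."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.tables = [(random_hyperplanes(dim, n_bits, rng), defaultdict(list))
                       for _ in range(n_tables)]
        self.vectors: Dict[str, Vector] = {}

    def add(self, key: str, vec: Vector) -> None:
        self.vectors[key] = vec
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(key)

    def query(self, vec: Vector, k: int = 1) -> List[str]:
        # Gather candidates that share a bucket with the query in any table...
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))
        # ...then rank only those candidates by exact cosine similarity.
        ranked = sorted(candidates, key=lambda key: cosine(vec, self.vectors[key]), reverse=True)
        return ranked[:k]


index = LSHIndex(dim=3)
toy_vectors = {"cat": (0.9, 0.1, 0.3), "dog": (0.8, 0.2, 0.35), "car": (-0.2, 0.9, -0.5)}
for word, vec in toy_vectors.items():
    index.add(word, vec)
print(index.query((0.9, 0.1, 0.3)))  # ['cat']
```

More bits per table make buckets smaller and queries faster but can miss true neighbours. More tables recover some of that recall at the cost of memory.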

Exploring data using AI chat at Domo with Amazon Bedrock

AWS Machine Learning Blog

SEPTEMBER 9, 2024

The next step is to provide them with a more intuitive, conversational interface to their data, empowering them to generate meaningful visualizations and reports through natural language interactions. Outside of work, he enjoys playing lawn tennis and reading books.

AI

AI AWS ML

Snorkel Flow 2023.R3 release: PaLM integration, streamlined onboarding, and enhanced user experience

Snorkel AI

NOVEMBER 1, 2023

When Vertex Model Monitoring detects data drift, input feature values are submitted to Snorkel Flow, enabling ML teams to adapt labeling functions quickly, retrain the model, and then deploy the new model with Vertex AI. See what Snorkel can do to accelerate your data science and machine learning teams. Book a demo today.

ML

ML Machine Learning

From First Principles: Building Function Calling by Fine-tuning NanoGPT

Towards AI

APRIL 12, 2025

You say, "Book me a flight to San Francisco," and instead of just writing a response, the AI actually starts the booking process. This isn't science fiction; it's function calling, and it's changing how we interact with AI. Most people see these intelligent systems as black boxes, magically responding to commands.

Data Preparation

Data Preparation Machine Learning AI

Predictive Maintenance Using Isolation Forest

PyImageSearch

OCTOBER 21, 2024

We will start by setting up libraries and data preparation. Setup and Data Preparation: For this purpose, we will use the Pump Sensor Dataset, which contains readings from 52 sensors that capture various parameters (e.g., temperature, pressure, vibration) used to detect potential failures or issues.

Algorithm

Algorithm Deep Learning Data Preparation

Serverless Machine Learning in AWS: Lambda + Step Functions Guide

How to Learn Machine Learning

APRIL 16, 2025

For example, services like S3, API Gateway, and Kinesis can trigger processes as soon as new data is detected. AWS Lambda functions perform data preparation tasks such as cleaning and transforming data before moving on to the inference stage.

Machine Learning

Machine Learning AWS ML
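The Lambda data-preparation step described in the entry above might look roughly like this minimal sketch. The event shape (a `records` list), the field names in `REQUIRED_FIELDS` and the cleaning rules are all hypothetical. Only the `lambda_handler(event, context)` handler signature follows the usual Python Lambda convention. In a Step Functions workflow, the returned dictionary would typically become the input to the next (inference) state.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record schema; a real pipeline would define its own.
REQUIRED_FIELDS = ("id", "feature_a", "feature_b")


def clean_record(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a normalised copy of `raw`, or None if it cannot be used."""
    if any(raw.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # missing or empty required value
    try:
        return {
            "id": str(raw["id"]).strip(),
            "feature_a": float(raw["feature_a"]),
            "feature_b": float(raw["feature_b"]),
        }
    except (TypeError, ValueError):
        return None  # value that cannot be cast to a number


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Clean a batch of records before it is passed on to an inference step."""
    raw_records: List[Dict[str, Any]] = event.get("records", [])
    seen = set()
    cleaned = []
    for raw in raw_records:
        record = clean_record(raw)
        if record is None or record["id"] in seen:
            continue  # drop invalid rows and duplicate ids
        seen.add(record["id"])
        cleaned.append(record)
    return {"records": cleaned, "dropped": len(raw_records) - len(cleaned)}


if __name__ == "__main__":
    sample_event = {"records": [
        {"id": " a1 ", "feature_a": "1.5", "feature_b": 2},
        {"id": "a1", "feature_a": 3, "feature_b": 4},      # duplicate id
        {"id": "b2", "feature_a": "n/a", "feature_b": 1},  # not numeric
        {"id": "c3", "feature_a": 1},                      # missing field
    ]}
    print(lambda_handler(sample_event, None))  # one clean record, dropped=3
```

This sketch keeps the first record seen for each id. Whether to keep the first or last duplicate is a pipeline-specific choice.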

Multimodality in LLMs: Understanding its Power and Impact

Data Science Dojo

JULY 31, 2024

In the context of Artificial Intelligence (AI), a modality refers to a specific type or form of data that can be processed and understood by AI models. Primary modalities commonly involved in AI include: Text: This includes any form of written language, such as articles, books, social media posts, and other textual data.

AI

AI Supervised Learning Data Preparation

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

You marked your calendars, you booked your hotel, and you even purchased the airfare. In this code talk, learn how to prepare data at scale using built-in data preparation assistance, co-edit the same notebook in real time, and automate conversion of notebook code to production-ready jobs. We’ll see you there!

AWS

AWS ML AI

How Clearwater Analytics is revolutionizing investment management with generative AI and Amazon SageMaker JumpStart

Flipboard

DECEMBER 13, 2024

This assistant framework is built upon three pillars. Knowledge awareness: Using RAG, CWIC compiles and delivers comprehensive knowledge that is crucial for customers, from intricate calculations of book value to period-end reconciliation processes. Pre-trained model teardown: Remove the pre-trained model to free up resources.

Analytics

Analytics AI

Everything New Coming to ODSC East 2025

ODSC - Open Data Science

DECEMBER 16, 2024

You'll gain immediate, practical skills in Python, data preparation, machine learning modeling, and retrieval-augmented generation (RAG), all leading up to AI Agents. Each course features focused, interactive sessions with hands-on notebooks and exercises, along with dedicated office hours. Learn more about the AI Mini Bootcamp here.

Machine Learning

Machine Learning Data Preparation Artificial Intelligence

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

AWS Machine Learning Blog

AUGUST 14, 2023

Often, to get an NLP application working for production use cases, we end up having to think about data preparation and cleaning. This is covered by Haystack indexing pipelines, which allow you to design your own data preparation steps that ultimately write your documents to the database of your choice.

AWS

AWS Database AI

Snorkel Flow 2023.R3 release: PaLM integration, streamlined onboarding, and enhanced user experience

Snorkel AI

NOVEMBER 1, 2023

When Vertex Model Monitoring detects data drift, input feature values are submitted to Snorkel Flow, enabling ML teams to adapt labeling functions quickly, retrain the model, and then deploy the new model with Vertex AI. Book a demo today. Revamped Snorkel Flow SDK: Also included in the 2023.R3 ... See which Snorkel option is right for you.

Data Scientist

Data Scientist ML Data Preparation

15 Fan-Favorite Speakers & Instructors Returning for ODSC East 2025

ODSC - Open Data Science

MARCH 18, 2025

This session covers key CV concepts, real-world use cases, and step-by-step guidance on data preparation, model selection, and fine-tuning. This session debunks common misconceptions, covering key topics like proper data types, chaining, aggregation, and debugging.

Data Science

Data Science Machine Learning Data Scientist

A review of purpose-built accelerators for financial services

AWS Machine Learning Blog

SEPTEMBER 11, 2024

Market participants who are receiving either live or historical data feeds need to ingest this data and perform one or more steps, such as parse the message out of a binary protocol, rebuild the limit order book (LOB), or combine multiple feeds into a single normalized format.

AWS

AWS ML Clustering

Agentic AI and AI‑ready data: Transforming consumer‑facing applications

Dataconomy

MAY 14, 2025

From ordering groceries to booking travel, consumers will increasingly rely on AI agents to handle interactions that once required direct human effort. In short, it's analytics-grade data prepared for AI. The good news is that the playing field is still relatively open.

AI

AI Data Warehouse Data Pipeline

Transcribe and generate subtitles for YouTube videos with Node.js

AssemblyAI

JUNE 24, 2024

Do Kaggle's intro and intermediate ML courses to learn more about data preparation with Pandas. Useful books referenced: Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, and Machine Learning Yearning by Andrew Ng. Implement some algorithms from scratch in Python to better understand concepts.

Machine Learning

Machine Learning Python ML

Build well-architected IDP solutions with a custom lens – Part 2: Security

AWS Machine Learning Blog

NOVEMBER 22, 2023

Only involving necessary people to do case validation or augmentation tasks reduces the risk of document mishandling and human error when dealing with sensitive data. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. Suyin Wang is an AI/ML Specialist Solutions Architect at AWS.

AWS

AWS ML Machine Learning
