Introduction: Data science has taken over every economic sector in recent times. To achieve maximum efficiency, every company strives to use various data at every stage of its operations.
However, certain technical skills are considered essential for a data scientist to possess. These skills include programming languages such as Python and R, statistics and probability, machine learning, data visualization, and data modeling.
Data preparation for LLM fine-tuning: proper data preparation is key to achieving high-quality results when fine-tuning LLMs for specific purposes. Importance of quality data in fine-tuning: data quality is paramount in the fine-tuning process.
Summary: This guide explores Artificial Intelligence Using Python, from essential libraries like NumPy and Pandas to advanced techniques in machine learning and deep learning. It equips you to build and deploy intelligent systems confidently and efficiently.
More than 170 tech teams used the latest cloud, machine learning and artificial intelligence technologies to build 33 solutions. The fundamental objective is to build a manufacturer-agnostic database, leveraging generative AI’s ability to standardize sensor outputs, synchronize data, and facilitate precise corrections.
This post presents and compares options and recommended practices on how to manage Python packages and virtual environments in Amazon SageMaker Studio notebooks. Studio provides all the tools you need to take your models from data preparation to experimentation to production while boosting your productivity. Define a Dockerfile.
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. This will land on a data flow page.
Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. Data preparation: SageMaker Ground Truth employs a human workforce made up of Northpower volunteers to annotate a set of 10,000 images.
These methods can help businesses to make sense of their data and to identify trends and patterns that would otherwise be invisible. In recent years, there has been a growing interest in the use of artificialintelligence (AI) for data analysis. It is similar to TensorFlow, but it is designed to be more Pythonic.
This session covers the technical process, from data preparation to model customization techniques, training strategies, deployment considerations, and post-customization evaluation. Explore how this powerful tool streamlines the entire ML lifecycle, from data preparation to model deployment.
Machine learning practitioners often work with data from the very beginning and across the full stack, so they see a lot of workflow/pipeline development, data wrangling, and data preparation.
Fine tuning: now that your SageMaker HyperPod cluster is deployed, you can start preparing to execute your fine-tuning job. Data preparation: the foundation of successful language model fine-tuning lies in properly structured and prepared training data. The following is the Python code for the get_model.py
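As a minimal, hypothetical sketch (not the article's actual get_model.py), structured fine-tuning data is often serialized as JSON Lines, one prompt/completion pair per line; the example records below are invented for illustration:

```python
import json

# Hypothetical training examples; real fine-tuning data comes from your own corpus.
examples = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": "Merci"},
]

def to_jsonl(records):
    """Serialize records as JSON Lines, a common format for fine-tuning datasets."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

jsonl = to_jsonl(examples)
print(jsonl)
```

Each line is an independent JSON object, so the file can be streamed and validated record by record during data preparation.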
Identifying Traditional Nigerian Textiles using Artificial Intelligence on Android Devices (Part 1): Nigeria is a country blessed by God with three major ethnic groups (Yoruba, Hausa, and Ibo), each with its own cultural traditions in dress, marriage, food, and more.
By Carolyn Saplicki, IBM Data Scientist: Industries are constantly seeking innovative solutions to maximize efficiency, minimize downtime, and reduce costs. One groundbreaking technology that has emerged as a game-changer is asset performance management (APM) artificial intelligence (AI).
We create a custom training container that downloads data directly from the Snowflake table into the training instance rather than first downloading the data into an S3 bucket, with the following additions: the Snowflake Connector for Python to download the data from the Snowflake table to the training instance.
FM-powered artificial intelligence (AI) assistants have limitations, such as providing outdated information or struggling with context outside their training data. This feature empowers you to rapidly synthesize this information without the hassle of data preparation or any management overhead.
Michael Dziedzic on Unsplash: I am often asked by prospective clients to explain the artificial intelligence (AI) software process, and I have recently been asked by managers with extensive software development and data science experience who wanted to implement MLOps. Norvig, Artificial Intelligence: A Modern Approach, 4th ed.
MATLAB is a popular programming tool for a wide range of applications, such as data processing, parallel computing, automation, simulation, machine learning, and artificial intelligence. Verify your python3 installation by running the python -V or python --version command on your terminal.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. See the following notebook for the complete code sample.
Tapping into these schemas and pulling out machine learning-ready features can be nontrivial: one needs to know where the data entity of interest lives (e.g., customers), what its relations are and how they’re connected, and then write SQL, Python, or other code to join and aggregate to a granularity of interest.
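As an illustrative sketch of that join-and-aggregate step (the tables and column names below are invented, not from any real schema), in pandas it might look like:

```python
import pandas as pd

# Hypothetical entity tables: customers and their orders.
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["EU", "US"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]})

# Join the related entities, then aggregate to the customer granularity.
features = (
    orders.merge(customers, on="customer_id")
          .groupby(["customer_id", "region"], as_index=False)["amount"]
          .sum()
)
print(features)
```

The same operation could equally be written as a SQL JOIN plus GROUP BY; the point is choosing the granularity (here, one row per customer) before feature extraction.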
It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. Specifically, we demonstrate how you can customize SageMaker Distribution for geospatial workflows by extending it with open-source geospatial Python libraries.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
SageMaker Studio allows data scientists, ML engineers, and data engineers to prepare data, build, train, and deploy ML models on one web interface. Finally, we deploy the ONNX model along with custom inference code written in Python to Azure Functions using the Azure CLI.
In the rapidly expanding field of artificial intelligence (AI), machine learning tools play an instrumental role. Its seamless integration capabilities make it highly compatible with numerous other Python libraries, which is why Scikit Learn is favored by many in the field for tackling sophisticated machine learning problems.
In this piece, we explore practical ways to define data standards, ethically scrape and clean your datasets, and cut out the noise, whether you're pretraining from scratch or fine-tuning a base model. Nericarcasci is working on LEO, a Python-based tool that acts like a conductor for AI. 👉 Read the post here!
With the addition of forecasting, you can now access end-to-end ML capabilities for a broad set of model types—including regression, multi-class classification, computer vision (CV), natural language processing (NLP), and generative artificial intelligence (AI)—within the unified user-friendly platform of SageMaker Canvas.
GenASL is a generative artificial intelligence (AI)-powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken and written language and sign language. This instance will be used for various tasks such as video processing and data preparation.
Fine tuning embedding models using SageMaker: SageMaker is a fully managed machine learning service that simplifies the entire machine learning workflow, from data preparation and model training to deployment and monitoring. A Python script serves as the entry point; it creates an S3 client with boto3.client('s3') and gets the region name from boto3.Session().
The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. Airflow for workflow orchestration: Airflow schedules and manages complex workflows, defining tasks and dependencies in Python code.
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes, with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.
We create an automated model build pipeline that includes steps for data preparation, model training, model evaluation, and registration of the trained model in the SageMaker Model Registry. About the Authors: Dr. Romina Sharifpour is a Senior Machine Learning and Artificial Intelligence Solutions Architect at Amazon Web Services (AWS).
Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. The training data used for this pipeline is made available through PrestoDB and read into Pandas through the PrestoDB Python client.
Created by the author with DALL E-3: Google Earth Engine for machine learning has just gotten a new facelift. With all the advancements in the world of artificial intelligence, Google Earth Engine was not going to be left behind, as it is an important tool for spatial analysis.
It details the necessary setup, data preparation requirements, the step-by-step fine-tuning workflow, methods for leveraging the resulting custom models, and illustrative examples of potential use cases. For Python development, the official mistralai library needs to be installed.
Introduction: Data Science and Artificial Intelligence (AI) are at the forefront of technological innovation, fundamentally transforming industries and everyday life. Enhanced data visualisation aids in better communication of insights. Domain knowledge is crucial for effective data application in industries.
You can use this notebook job step to easily run notebooks as jobs with just a few lines of code using the Amazon SageMaker Python SDK. Data scientists currently use SageMaker Studio to interactively develop their Jupyter notebooks and then use SageMaker notebook jobs to run these notebooks as scheduled jobs.
If you are prompted to choose a Kernel, choose the Python 3 (Data Science 3.0) kernel and choose Select. Import the required Python library and set the roles and the S3 buckets. In your Studio notebook, open the spam_detector.ipynb notebook. You now run the data preparation step in the notebook.
Harnessing the power of big data has become increasingly critical for businesses looking to gain a competitive edge. From deriving insights to powering generative artificial intelligence (AI)-driven applications, the ability to efficiently process and analyze large datasets is a vital capability.
The answer: they craft predictive models that illuminate the future. Data collection and cleaning: data scientists kick off their journey by embarking on a digital excavation, unearthing raw data from the digital landscape, then interpreting that data to uncover actionable insights guiding business decisions.
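A tiny, hypothetical sketch of that cleaning step (the column names and values are invented), using pandas to drop missing values and exact duplicates:

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and missing values.
raw = pd.DataFrame({
    "user": ["a", "a", "b", None],
    "clicks": [3, 3, None, 5],
})

clean = (
    raw.dropna()              # drop rows with any missing value
       .drop_duplicates()     # remove exact duplicate rows
       .reset_index(drop=True)
)
print(clean)
```

Real cleaning pipelines go further (type coercion, outlier handling, imputation), but these two operations are a common first pass.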
One is a scripting language such as Python, and the other is a query language like SQL (Structured Query Language) for SQL databases. Python is a high-level, procedural, and object-oriented language; it is also a vast language in itself, and trying to cover the whole of Python is one of the worst mistakes we can make in the data science journey.
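A minimal sketch of the two languages working together, using Python's built-in sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

# Python drives the workflow; SQL does the set-based querying.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("widget", 10.0), ("widget", 15.0), ("gadget", 7.5)],
)

rows = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(rows)
conn.close()
```

In practice the database would be an external system rather than in-memory SQLite, but the division of labor is the same: SQL aggregates close to the data, Python orchestrates and post-processes.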
Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.
The container image contains code to invoke the SageMaker Serverless Inference endpoints, and the necessary Python libraries to run the Lambda function, such as NumPy, pandas, and scikit-learn. For more information, refer to Granting Data Catalog permissions using the named resource method. We have completed the data preparation step.
Solution overview: To efficiently train and serve thousands of ML models, we can use the following SageMaker features: SageMaker Processing – SageMaker Processing is a fully managed data preparation service that enables you to perform data processing and model evaluation tasks on your input data.