Blog - Data Science Current

Comprehensive Guide: Top Computer Vision Resources All in One Blog

Mlearning.ai

JANUARY 27, 2023

Save this blog for comprehensive resources for computer vision. Source: Appen. Working in computer vision and deep learning is fantastic because every few months someone comes up with something crazy that completely changes your perspective on what is feasible. A dataset is a group of samples (in this case, photos or videos).

Deep Learning, Python, Data Scientist

Introducing Snorkel’s Foundation Model Data Platform

Snorkel AI

JUNE 12, 2023

In 2007, Google researchers published a paper on a class of statistical language models they dubbed “large language models”, which they reported as achieving a new state of the art in performance. They used a very standard model and a decoding algorithm so simple they named it “Stupid Backoff” [1]. The key differentiator?

AI, Algorithm, Azure

Efficient continual pre-training LLMs for financial domains

AWS Machine Learning Blog

MARCH 28, 2024

Large language models (LLMs) are generally trained on large publicly available datasets that are domain agnostic. For example, Meta’s Llama models are trained on datasets such as CommonCrawl, C4, Wikipedia, and ArXiv. These datasets encompass a broad range of topics and domains.

AWS, Machine Learning, Data Quality

Revolutionize LLM with Llama 2 fine-tuning

Data Science Dojo

OCTOBER 1, 2023

With the introduction of LLaMA v1, we witnessed a surge in customized models like Alpaca, Vicuna, and WizardLM. This surge motivated various businesses to launch their own foundational models, such as OpenLLaMA, Falcon, and XGen, with licenses suitable for commercial purposes.

Machine Learning, AI

Create high-quality datasets with Amazon SageMaker Ground Truth and FiftyOne

AWS Machine Learning Blog

MAY 5, 2023

Voxel51 is the company behind FiftyOne, the open-source toolkit for building high-quality datasets and computer vision models. To create this app, they need a high-quality dataset containing clothing images, labeled with different categories. You want to make things as easy as possible for the end-user.

Machine Learning, AWS, ML

Conformer-1: A robust speech recognition model trained on 650K hours of data

AssemblyAI

MARCH 15, 2023

1 – Efficient Conformer encoder model architecture. In an effort to further improve our model’s accuracy on noisy audio, we implemented a modified version of Sparse Attention [5], a pruning method for achieving sparsity of the model’s weights in order to achieve regularization.

Create a data labeling project with Amazon SageMaker Ground Truth Plus

AWS Machine Learning Blog

OCTOBER 15, 2024

Amazon SageMaker Ground Truth is a powerful data labeling service offered by AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. Each batch is made up of data objects to be labeled.

AWS, ML, Machine Learning

How to Mask PII Before LLM Training

Iguazio

SEPTEMBER 26, 2023

In this post, we share an open source solution that can help identify and mask PII: the PII Recognizer. LLMs are trained on large datasets of text and code. If this data contains PII, it becomes part of the models’ training dataset. Presidio is used as a small model registry.

AI

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

Nowadays, most of our customers are excited about large language models (LLMs) and are thinking about how generative AI could transform their business. However, bringing such solutions and models into business-as-usual operations is not an easy task. Our approach applies to both open-source and proprietary models equally.

AI, ML

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Journey to AI blog

MAY 9, 2023

Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. Building a model requires specialized, hard-to-find skills, and each new task requires repeating the process. These large models have lowered the cost and labor involved in automation.

AI, Data Quality, Data Lakes

Automate Amazon Rekognition Custom Labels model training and deployment using AWS Step Functions

AWS Machine Learning Blog

MARCH 22, 2023

With Amazon Rekognition Custom Labels, you can have Amazon Rekognition train a custom model for object detection or image classification specific to your business needs. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to accurately make decisions.

AWS, Machine Learning, ML

Improving your LLMs with RLHF on Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 22, 2023

Reinforcement Learning from Human Feedback (RLHF) is recognized as the industry standard technique for ensuring large language models (LLMs) produce content that is truthful, harmless, and helpful. Gone are the days when you need unnatural prompt engineering to get base models, such as GPT-3, to solve your tasks. a written email).

Machine Learning, AWS, Computer Science

IBM watsonx.ai: Open source, pre-trained foundation models make AI and automation easier than ever before

IBM Journey to AI blog

JUNE 14, 2023

And then you need highly specialized, expensive and difficult to find skills to work the magic of training an AI model. But that’s all changing thanks to pre-trained, open source foundation models. And those massive large-scale datasets contain some of the darker corners of the internet.

AI, Natural Language Processing, Data Lakes

Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges

Google Research AI blog

MARCH 30, 2023

The key to both is a deeper understanding of ML data — how to engineer training datasets that produce high quality models and test datasets that deliver accurate indicators of how close we are to solving the target problem. Despite the importance of data, ML research to date has been dominated by a focus on models.

ML, Algorithm, Data Quality

Generative AI that’s tailored for your business needs with watsonx.ai

IBM Journey to AI blog

SEPTEMBER 28, 2023

An AI and data platform, such as watsonx, can help empower businesses to leverage foundation models and accelerate the pace of generative AI adoption across their organization. These enhancements have been guided by IBM’s fundamental strategic considerations that AI should be open, trusted, targeted and empowering.

AI, Algorithm, Artificial Intelligence

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

This post details how Purina used Amazon Rekognition Custom Labels, AWS Step Functions, and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. Solution overview: Predicting animal breeds from an image needs custom ML models.

AWS, ML, Machine Learning

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.

AWS, ML, Data Pipeline

Automate PDF pre-labeling for Amazon Comprehend

AWS Machine Learning Blog

DECEMBER 14, 2023

Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you first prepare training data by manually annotating entities in documents. The first technique is fuzzy matching.

AWS, Natural Language Processing, Machine Learning

23 Best Free NLP Datasets for Machine Learning

Iguazio

SEPTEMBER 20, 2023

To help with these efforts, we’ve compiled a list of the top NLP datasets for NLP projects that data scientists and data professionals can use for training their models. This list is a starting point for training your NLP models.

Machine Learning, Database, Data Scientist

Unleashing the potential of GANs: a look at popular Open-Source GAN Models

Defined.ai blog

FEBRUARY 20, 2023

Generative adversarial networks, or GANs, are a type of machine learning algorithm that can be used to generate synthetic datasets. We’ll discuss some of the available open-source GANs below, including TensorFlow GAN, Pix2pix, and CycleGAN, which are available on popular code hosting platforms such as GitHub.

Machine Learning, Algorithm, Database

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

Flipboard

JUNE 26, 2023

In today’s digital world, data is generated by a large number of disparate sources and growing at an exponential rate. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data. It’s commonly referred to as a data harmonization or deduplication problem.

AWS, ML, ETL

ODSC’s AI Weekly Recap: Week of March 8th

ODSC - Open Data Science

MARCH 8, 2024

In a blog post released today, OpenAI fired back at Elon Musk’s lawsuit and moved to dismiss his claims about the company’s motives. According to a report, Apple is hoping to push forward its efforts in generative AI in a bid to catch up with competitor Microsoft. MetaVoice-1B is a 1.2B

Artificial Intelligence, AI

The Pros and Cons of using free datasets for Aspect-Based Sentiment Analysis

Defined.ai blog

MARCH 23, 2023

Once the sentiments for each aspect have been classified, it’s important to evaluate the accuracy of the results by comparing the predicted sentiments with human-generated labels, or by using other evaluation metrics such as precision, recall, and F1 score. More info about the SemEval datasets can be found here. Evaluating the results.
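
The metrics named in this excerpt can be illustrated with a minimal sketch. It assumes aspect-level sentiment labels are stored as two parallel lists (human-generated and predicted); the function name and the sample labels are hypothetical and are not taken from the original post or from the SemEval datasets.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, positive_label):
    """Compute precision, recall and F1 for one label treated as the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == positive_label and g == positive_label:
            counts["tp"] += 1  # predicted positive, truly positive
        elif p == positive_label:
            counts["fp"] += 1  # predicted positive, actually another label
        elif g == positive_label:
            counts["fn"] += 1  # truly positive, but missed by the prediction

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for five (sentence, aspect) pairs
gold = ["positive", "negative", "positive", "neutral", "positive"]
predicted = ["positive", "positive", "positive", "neutral", "negative"]

p, r, f = precision_recall_f1(gold, predicted, "positive")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Repeating the call for each sentiment label and averaging the results would give a macro-averaged score, which is one common way to summarize performance when label frequencies are uneven.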

Natural Language Processing, Algorithm, Artificial Intelligence

Gen AI 101: Prompt Engineering (Part 3)

phData

AUGUST 13, 2024

Anyone who has interacted with ChatGPT has most likely performed prompt engineering, tweaking the question (prompt) sent to the model to return a quirky, funny, or more accurate response. Additionally, they had the added challenge of selecting a large language model that best fits the application’s use case.

AI, Data Science, Algorithm

Crossing the demo-to-production chasm with Snorkel Custom

Snorkel AI

APRIL 11, 2024

Instead, LLMs have to be tuned for enterprises’ unique use cases, and success here is all about the quality of the labeled, curated data this relies on. Today, we help some of the world’s most sophisticated enterprises label and develop their data for tuning LLMs with our flagship platform, Snorkel Flow.

AI

Foundational vision models and visual prompt engineering for autonomous driving applications

AWS Machine Learning Blog

NOVEMBER 15, 2023

Prompt engineering has become an essential skill for anyone working with large language models (LLMs) to generate high-quality and relevant texts. Visual prompts can include bounding boxes or masks that guide vision models in generating relevant and accurate outputs. It is pre-trained on a massive dataset of 11 million images and 1.1

Machine Learning, ML

Overcoming 12 Challenges in Building Production-Ready RAG-based LLM Applications

Data Science Dojo

MARCH 29, 2024

Large Language Models are growing smarter, transforming how we interact with technology. Understanding RAG: RAG is a framework that retrieves data from external sources and incorporates it into the LLM’s decision-making process. This allows the model to access real-time information and address knowledge gaps.

Database, Clustering, SQL, Machine Learning

Modular functions design for Advanced Driver Assistance Systems (ADAS) on AWS

AWS Machine Learning Blog

FEBRUARY 23, 2023

End-to-end training: This approach involves training a DNN model that takes raw sensor data as input and outputs the driving command. Automation levels: The SAE International (formerly the Society of Automotive Engineers) J3016 standard defines six levels of driving automation and is the most cited source for driving automation.

AWS, ML, Machine Learning

Generative AI Terminology — An Evolving Taxonomy To Get You Started

Towards AI

JANUARY 30, 2024

The 12 groups are as follows: Types of Models; Common LLM Terms; LLM Lifecycle Stages; LLM Evaluations; LLM Architecture; Retrieval Augmented Generation (RAG); LLM Agents; LMM Architecture; Cost & Efficiency; LLM Security; Deployment & Inference; A list of providers supporting LLMOps. Like the generative AI space, this taxonomy is also evolving.

AI, Natural Language Processing, Supervised Learning

Meet the winners of the Tick Tick Bloom: Harmful Algal Bloom Detection Challenge

DrivenData Labs

APRIL 13, 2023

Image source: NASA Landsat Image Gallery. While there are established methods for using satellite imagery to detect cyanobacteria in larger water bodies like oceans, detection in small inland lakes and reservoirs remains a challenge. Labels were based on "in situ" samples that were collected manually by many organizations across the U.S.

Data Scientist, Decision Trees, Algorithm, Data Quality

Use machine learning to detect anomalies and predict downtime with Amazon Timestream and Amazon Lookout for Equipment

AWS Machine Learning Blog

DECEMBER 29, 2022

Without access to data scientists for model training, or ML specialists to deploy solutions at the local level, adoption has seemed out of reach for teams on the factory floor. To get started, we first collect a historical dataset from your factory sensor readings, ingest the data, and train the model. Solution overview.

Machine Learning, AWS, Database

How to use foundation models and trusted governance to manage AI workflow risk

IBM Journey to AI blog

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Foundation models: The power of curated datasets. Foundation models, also known as “transformers,” are modern, large-scale AI models trained on large amounts of raw, unlabeled data.

AI, Data Warehouse, ML

Train and deploy ML models in a multicloud environment using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 20, 2023

In these scenarios, as you start to embrace generative AI, large language models (LLMs) and machine learning (ML) technologies as a core part of your business, you may be looking for options to take advantage of AWS AI and ML capabilities outside of AWS in a multicloud environment.

ML, Azure, AWS

Tutorial: Build an Active Learning Pipeline using Data Engine

DagsHub

AUGUST 15, 2023

Most of the time, tooling for an active learning pipeline needs to be either custom written, or cobbled together from several different open source tools. In this tutorial, we will learn about Data Engine and see how we can use it to create an active learning pipeline for an image segmentation model using the COCO 1K.

Data Engineer, Data Engineering

This AI newsletter is all you need #61

Towards AI

AUGUST 22, 2023

What happened this week in AI, by Louie: In recent months we have continued to see large language model (LLM) advancements and a gradual introduction of novel techniques, but we haven’t yet seen competition directly aiming to displace GPT-4 as the most advanced (and training compute-intensive) model.

AI, Azure, ML

Customized model monitoring for near real-time batch inference with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 28, 2024

Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. SageMaker Model Monitor monitors the quality of SageMaker ML models in production.

ML, AWS, AI

Visual captions: Using large language models to augment video conferences with dynamic visuals

Google Research AI blog

JUNE 6, 2023

We fine-tuned a large language model to proactively suggest relevant visuals in open-vocabulary conversations using a dataset we curated for this purpose. We open sourced Visual Captions as part of the ARChat project, which is designed for rapid prototyping of augmented communication with real-time transcription.

Deep Learning

eSentire delivers private and secure generative AI interactions to customers with Amazon SageMaker

AWS Machine Learning Blog

JUNE 21, 2024

eSentire’s AI Investigator enables users to complete complex queries using natural language by joining multiple sources of data from each customer’s own security telemetry and eSentire’s asset, vulnerability, and threat data mesh. Therefore, eSentire decided to build their own LLM using Llama 1 and Llama 2 foundational models.

AWS, AI, Natural Language Processing

Graph Convolutional Networks for NLP Using Comet

Heartbeat

JUNE 6, 2023

We will use the Cora dataset, which consists of academic papers and their classification labels. Real-time model analysis allows your team to track, monitor, and adjust models already in production. Load and Preprocess Cora Dataset: Next, we load the dataset for our text classification project.

Natural Language Processing, Deep Learning, Machine Learning

We employed ChatGPT as an ML Engineer. This is what we learned

Towards AI

FEBRUARY 21, 2023

While many see ChatGPT as a leap forward technologically, its true spark wasn’t based on a dramatic technological step change (GPT-3, the model it is based on, has been around for almost 3 years), but instead on the fact that it was an AI application perfectly calibrated towards individual human interactions.

ML, Machine Learning

End-to-End Deep Learning Project with PyTorch & Comet ML

Heartbeat

MARCH 28, 2023

Let’s start with PyTorch. Two frameworks are generally used for deep learning: TensorFlow and PyTorch. Comet ML is a machine learning platform that allows you to manage, visualize, compare, and optimize models. We’re going to use Comet ML to track our hyperparameters and to monitor our model.

Deep Learning, ML

Build an ML Inference Data Pipeline using SageMaker and Apache Airflow

Mlearning.ai

APRIL 6, 2023

Automate and streamline our ML inference pipeline with SageMaker and Airflow. Building an inference data pipeline on large datasets is a challenge many companies face. SageMaker Batch Job: allows you to run batch inference on large datasets and generate predictions in batch mode using machine learning (ML) models hosted in SageMaker.

Data Pipeline, ML, AWS

What Is a Transformer Model?

Hacker News

MARCH 25, 2022

So, What’s a Transformer Model? A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. First described in a 2017 paper from Google, transformers are among the newest and one of the most powerful classes of models invented to date.

Machine Learning, AI
