ML @ CMU


Beyond the Mud: Datasets, Benchmarks, and Methods for Computer Vision in Off-Road Racing

ML @ CMU

TL;DR: Off-the-shelf text spotting and re-identification models fail in basic off-road racing settings, and even more so during muddy events. Making matters worse, there aren’t any public datasets to evaluate or improve models in this domain. To this end, we introduce datasets, benchmarks, and methods for the challenging off-road racing setting. In the dynamic world of sports analytics, machine learning (ML) systems play a pivotal role, transforming vast arrays of visual data into actionable insights.
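
To make the failure mode concrete, here is a minimal sketch of how one might score an off-the-shelf text spotter on racer-number images. This is not the paper’s pipeline; the EasyOCR reader, file paths, and ground-truth label format below are illustrative assumptions.

```python
# Minimal sketch (not the paper's pipeline): score an off-the-shelf text
# spotter on racer images. Paths and the label format are hypothetical.
import json
from pathlib import Path

import easyocr  # off-the-shelf scene-text detection + recognition

reader = easyocr.Reader(["en"])  # loads pretrained detection/recognition models

def spot_numbers(image_path):
    """Return all text strings the spotter finds in one image."""
    results = reader.readtext(str(image_path))  # list of (bbox, text, confidence)
    return [text for _, text, conf in results if conf > 0.3]

# Hypothetical ground truth: {"img_0001.jpg": "417", ...}
gt = json.loads(Path("labels.json").read_text())

hits = 0
for name, number in gt.items():
    predictions = spot_numbers(Path("images") / name)
    hits += any(number in p for p in predictions)

print(f"exact-number recall: {hits / len(gt):.2%}")
```

On clean imagery this kind of recall number can look reasonable; the article’s point is that it collapses on muddy, occluded off-road racing photos, which is what the new datasets and benchmarks measure.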

NLPositionality: Characterizing Design Biases of Datasets and Models

ML @ CMU

TL;DR: Design biases in NLP systems, such as performance differences across populations, often stem from their creators’ positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality are often unobserved.


On Noisy Evaluation in Federated Hyperparameter Tuning

ML @ CMU

Evaluating models in federated networks is challenging due to factors such as client subsampling, data heterogeneity, and privacy. These factors introduce noise that can affect hyperparameter tuning algorithms and lead to suboptimal model selection. Hyperparameter tuning is critical to the success of cross-device federated learning applications. Unfortunately, federated networks face issues of scale, heterogeneity, and privacy, which introduce noise into the tuning process.
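
As a toy illustration of the problem (not the paper’s setup), the sketch below simulates how evaluation noise, a stand-in for client subsampling and heterogeneity, can flip which hyperparameter configuration looks best. The configurations, scores, and noise level are made-up placeholders.

```python
# Toy illustration (not the paper's setup): noisy validation scores can make a
# worse hyperparameter configuration look best.
import numpy as np

rng = np.random.default_rng(0)

true_scores = {"lr=0.1": 0.80, "lr=0.01": 0.78, "lr=0.001": 0.70}  # hypothetical
noise_std = 0.05   # stand-in for client-subsampling / heterogeneity noise
n_trials = 1000

wrong_picks = 0
for _ in range(n_trials):
    noisy = {cfg: s + rng.normal(0, noise_std) for cfg, s in true_scores.items()}
    best = max(noisy, key=noisy.get)
    wrong_picks += best != "lr=0.1"

print(f"suboptimal selection rate under noise: {wrong_picks / n_trials:.1%}")
```

Even with a modest noise level, the truly best configuration loses a noticeable fraction of comparisons, which is the kind of selection error the article studies.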


Creative Robot Tool Use with Large Language Models

ML @ CMU

TL;DR: We introduce RoboTool, which enables robots to use tools creatively with large language models, solving long-horizon hybrid discrete-continuous planning problems under environment- and embodiment-related constraints. Tool use is an essential hallmark of advanced intelligence. Some animals can use tools to achieve goals that are infeasible without tools.
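
At a high level, the idea of constraining an LLM planner to a robot’s available skills can be sketched as below. This is a hedged illustration rather than RoboTool’s actual prompting or API: `query_llm`, the skill names, and the plan format are hypothetical placeholders.

```python
# Hedged sketch of the general idea (not RoboTool's actual prompts or API):
# ask an LLM for a tool-use plan, then keep only steps the robot can execute.
from typing import List

SKILLS = {"move_to", "grasp", "push", "place"}  # hypothetical embodiment constraints

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def plan_with_tools(task: str, objects: List[str]) -> List[str]:
    prompt = (
        f"Task: {task}\nObjects: {', '.join(objects)}\n"
        f"Allowed skills: {', '.join(sorted(SKILLS))}\n"
        "Propose one skill call per line, e.g. grasp(box)."
    )
    plan = query_llm(prompt).splitlines()
    # Discard steps that violate the embodiment constraint (unknown skills).
    return [step for step in plan if step.split("(")[0].strip() in SKILLS]
```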


Peer Reviews of Peer Reviews: A Randomized Controlled Trial and Other Experiments

ML @ CMU

Alexander Goldberg, Ivan Stelmakh, Kyunghyun Cho, Alice Oh, Alekh Agarwal, Danielle Belgrave, and Nihar Shah. Is it possible to reliably evaluate the quality of peer reviews? We study peer reviewing of peer reviews, driven by two primary motivations: (i) Incentivizing reviewers to provide high-quality reviews is an important open problem. The ability to reliably assess the quality of reviews can help design such incentive mechanisms.


Supporting Human-AI Collaboration in Auditing LLMs with LLMs

ML @ CMU

Illustration: a human and a large language model working together to find failure cases in a (not necessarily different) large language model. Overview: In the era of ChatGPT, where people increasingly take assistance from a large language model (LLM) in day-to-day tasks, rigorously auditing these models is of utmost importance.
