ML @ CMU

RLHF 101: A Technical Tutorial on Reinforcement Learning from Human Feedback

ML @ CMU

Reinforcement Learning from Human Feedback (RLHF) is a popular technique for aligning AI systems with human preferences by training them on feedback from people rather than relying solely on predefined reward functions. Instead of coding every desirable behavior manually (which is often infeasible in complex tasks), RLHF allows models, especially large language models (LLMs), to learn from examples of what humans consider good or bad outputs.
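To make the core idea concrete, here is a minimal sketch of RLHF's reward-modeling step: fitting a scalar reward so that human-preferred responses score higher than rejected ones, via a Bradley-Terry pairwise loss. The embeddings and preference data below are toy stand-ins, not real LLM outputs.

```python
# Minimal sketch of RLHF's reward-modeling step: fit a scalar reward so that
# human-preferred responses score higher (Bradley-Terry pairwise loss).
# Toy setup: responses are random feature vectors, not real LLM outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim = 16
reward_model = nn.Linear(dim, 1)          # scores a response embedding
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Fake preference data: each pair is (chosen, rejected) response embeddings.
chosen = torch.randn(64, dim) + 0.5       # "preferred" responses, shifted
rejected = torch.randn(64, dim)

for step in range(100):
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # -log sigmoid(r_chosen - r_rejected): push preferred scores above rejected
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final pairwise loss: {loss.item():.3f}")
```

In full RLHF, this learned reward then drives a policy-optimization stage (e.g., PPO) on the LLM itself; the sketch covers only the preference-fitting step.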

Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning

ML @ CMU

Machine unlearning is a promising approach to mitigating undesirable memorization of training data in ML models. In this post, we discuss our work (which appeared at ICLR 2025) demonstrating that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of benign relearning attacks: with access to only a small and potentially loosely related set of data, we find that we can jog the memory of unlearned models and reverse the effects of unlearning.
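As a deliberately simplified illustration (not the paper's actual attack on LLMs), the sketch below reproduces the failure mode on a tiny logistic model: "unlearning" by gradient-based suppression drives the forgotten prediction down, but fine-tuning on a benign example with overlapping features jogs it back up.

```python
# Toy illustration (not the paper's actual attack on LLMs): "unlearning" via
# gradient-based suppression lowers the model's belief in a memorized fact,
# but fine-tuning on a benign, loosely related example whose features overlap
# with the forgotten one restores it.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.zeros(4, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)

forget_x = torch.tensor([1.0, 1.0, 0.0, 0.0])    # sensitive example
related_x = torch.tensor([1.0, 0.9, 0.1, 0.0])   # benign, loosely related

def step(x, target):
    loss = F.binary_cross_entropy_with_logits(w @ x, torch.tensor(target))
    opt.zero_grad()
    loss.backward()
    opt.step()

for _ in range(50):
    step(forget_x, 1.0)                          # memorize the fact
print("memorized:", torch.sigmoid(w @ forget_x).item())   # near 1.0

for _ in range(50):
    step(forget_x, 0.0)                          # "unlearn" by suppression
print("unlearned:", torch.sigmoid(w @ forget_x).item())   # near 0.0

for _ in range(50):
    step(related_x, 1.0)                         # benign relearning
print("restored:", torch.sigmoid(w @ forget_x).item())    # climbs back up
```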

Carnegie Mellon University at ICLR 2025

ML @ CMU

CMU researchers are presenting 143 papers at the Thirteenth International Conference on Learning Representations (ICLR 2025), held from April 24 – 28 at the Singapore EXPO. The post gives a quick overview of the areas our researchers are working on and our most frequent collaborator institutions, followed by a table of contents covering oral, spotlight, and poster papers across areas such as Alignment, Fairness, Safety, Privacy, and Societal Considerations and Applications to Computer Vision, Audio, Language, and Other Modalities.

Allie: A Human-Aligned Chess Bot

ML @ CMU

Play against Allie on lichess! In 1948, Alan Turing designed what might be the first chess-playing AI, a paper program that Turing himself acted as the computer for. Since then, chess has been a testbed for nearly every generation of AI advancement. After decades of improvement, today’s top chess engines like Stockfish and AlphaZero have far surpassed the capabilities of even the strongest human grandmasters.

LLM Unlearning Benchmarks are Weak Measures of Progress

ML @ CMU

TL;DR: “Machine unlearning” aims to remove data from models without retraining the model from scratch. Unfortunately, state-of-the-art benchmarks for evaluating unlearning in LLMs are flawed, especially because they test “forget queries” and “retain queries” separately, without examining potential dependencies between forget and retain data.
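A hypothetical toy example of the flaw: a model can pass independently scored forget and retain checks while a retain-style query that depends on the forgotten fact still leaks it. All queries and facts below are made up for illustration and do not come from any real benchmark.

```python
# Sketch of the benchmark flaw with hypothetical queries: forget and retain
# sets are scored independently, so a model can pass both checks while a
# dependent retain-style query still leaks the forgotten fact.
def model(query, unlearned=True):
    kb = {
        "Where was Alice born?": "Paris",           # forget query
        "What is Alice's profession?": "chemist",   # retain query
        # Dependent query: its answer reveals the forgotten birthplace.
        "Which city is Alice's chemistry lab in?": "Paris",
    }
    if unlearned and query == "Where was Alice born?":
        return "I don't know"
    return kb[query]

forget_ok = model("Where was Alice born?") == "I don't know"
retain_ok = model("What is Alice's profession?") == "chemist"
print("benchmark passes:", forget_ok and retain_ok)                 # True
print("leak via dependency:", model("Which city is Alice's chemistry lab in?"))
```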

Copilot Arena: A Platform for Code

ML @ CMU

Figure 1. Copilot Arena is a VS Code extension that collects human preferences over code directly from developers. As model capabilities improve, large language models (LLMs) are increasingly integrated into user environments and workflows. In particular, software developers code with LLM-powered tools in integrated development environments such as VS Code, IntelliJ, or Eclipse.
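One plausible way (not necessarily Copilot Arena's exact method) to turn such pairwise developer votes into a model leaderboard is Elo-style rating updates, sketched below with simulated votes and hypothetical model names.

```python
# Hedged sketch: aggregate pairwise completion votes into a model ranking
# with Elo-style updates. Model names and votes are made up for illustration.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
K = 16  # update step size

def record_vote(winner, loser):
    # Expected score for the winner under the logistic (Elo) model.
    exp_w = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - exp_w)
    ratings[loser] -= K * (1 - exp_w)

# Simulated developer votes: which model's completion was preferred.
votes = ([("model_a", "model_b")] * 5
         + [("model_a", "model_c")] * 3
         + [("model_c", "model_b")] * 2)
for winner, loser in votes:
    record_vote(winner, loser)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))  # leaderboard order
```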

Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem

ML @ CMU

Figure 1: Training models to optimize test-time compute and learn how to discover correct responses, as opposed to the traditional learning paradigm of learning what answer to output. Thus far, the major strategy for improving large language models (LLMs) has been to use ever more high-quality data for supervised fine-tuning (SFT) or reinforcement learning (RL).
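As a minimal sketch of what spending test-time compute can buy, the toy below contrasts a single sampled answer with best-of-n sampling scored by a verifier. The task, sampler, and verifier are stand-ins, not an actual LLM pipeline.

```python
# Toy contrast between "learning what answer to output" (one shot) and
# spending test-time compute (best-of-n against a verifier/reward).
# Task: guess a hidden integer; random proposals stand in for LLM samples.
import random

random.seed(0)
target = 42

def sample_answer():
    return random.randint(0, 100)      # stand-in for sampling an LLM response

def verifier_score(answer):
    return -abs(answer - target)       # stand-in for a learned verifier/reward

single = sample_answer()               # one-shot answer
best_of_n = max((sample_answer() for _ in range(32)), key=verifier_score)

print("single sample:", single)
print("best of 32:   ", best_of_n)     # more compute, closer to correct
```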
