News Junction

Making AI models ‘forget’ undesirable data hurts their performance

Published July 30, 2024

So-called “unlearning” techniques are used to make a generative AI model forget specific and undesirable info it picked up from training data, like sensitive private data or copyrighted material.

But current unlearning techniques are a double-edged sword: They could make a model like OpenAI’s GPT-4o or Meta’s Llama 3.1 405B much less capable of answering basic questions.

That’s according to a new study co-authored by researchers at the University of Washington (UW), Princeton, the University of Chicago, USC and Google, which found that the most popular unlearning techniques today tend to degrade models — often to the point where they’re unusable.

“Our evaluation suggests that currently feasible unlearning methods are not yet ready for meaningful usage or deployment in real-world scenarios,” Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, told TechCrunch. “Currently, there are no efficient methods that enable a model to forget specific data without considerable loss of utility.”

How models learn

Generative AI models have no real intelligence. They’re statistical systems that predict words, images, speech, music, videos and other data. Fed an enormous number of examples (e.g. movies, voice recordings, essays and so on), AI models learn how likely data is to occur based on patterns, including the context of any surrounding data.

Given an email ending in the fragment “Looking forward…”, for example, a model trained to autocomplete messages might suggest “… to hearing back,” following the pattern of all the emails it’s ingested. There’s no intentionality there; the model isn’t looking forward to anything. It’s simply making an informed guess.
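To make the "informed guess" idea concrete, here is a deliberately tiny sketch: a bigram model that only counts which word follows which in a handful of made-up emails, then picks the most frequent continuation. The email snippets and function names are hypothetical. Real generative models are large neural networks over subword tokens, not count tables, so this illustrates only the underlying statistical principle.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Convert the follow-counts for `prev` into a probability distribution."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


def autocomplete(counts, prompt, n_words=3):
    """Greedily append the most probable next word, up to n_words times."""
    words = prompt.lower().split()
    for _ in range(n_words):
        probs = next_word_probs(counts, words[-1])
        if not probs:
            break  # this word was never followed by anything in training
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Hypothetical training data: a few email sign-offs.
emails = [
    "looking forward to hearing back",
    "looking forward to hearing from you",
    "looking forward to the meeting",
    "thanks and looking forward to hearing back",
]
model = train_bigram(emails)
print(next_word_probs(model, "forward"))       # {'to': 1.0}
print(autocomplete(model, "Looking forward"))  # looking forward to hearing back
```

The model produces "to hearing back" only because that sequence is the most frequent one in its data; nothing in the code represents intent.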

Most models, including flagships like GPT-4o, are trained on data sourced from public websites and data sets around the web. Most vendors developing such models argue that fair use shields their practice of scraping data and using it for training without informing, compensating or even crediting the data’s owners.

But not every copyright holder agrees. And many — from authors to publishers to record labels — have filed lawsuits against vendors to force a change.

The copyright dilemma is one of the reasons unlearning techniques have gained a lot of attention lately. Google, in partnership with several academic institutions, last year launched a competition seeking to spur the creation of new unlearning approaches.

Unlearning could also provide a way to remove sensitive info from existing models, like medical records or compromising photos, in response to a request or government order. (Thanks to the way they’re trained, models tend to sweep up lots of private information, from phone numbers to more problematic examples.) Over the past few years, some vendors have rolled out tools to allow data owners to ask that their data be removed from training sets. But these opt-out tools only apply to future models, not models trained before they rolled out; unlearning would be a much more thorough approach to data deletion.

Regardless, unlearning isn’t as easy as hitting “Delete.”

The art of forgetting

Unlearning techniques today rely on algorithms designed to “steer” models away from the data to be unlearned. The idea is to influence the model’s predictions so that it never — or only very rarely — outputs certain data.
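The article does not spell out how the tested algorithms "steer" a model. One widely used baseline in the unlearning literature is gradient ascent: instead of lowering the training loss on the data to be forgotten, it deliberately raises it. The sketch below applies that idea to a single hypothetical next-token distribution (the tokens and logits are invented), so the probability of the "forgotten" continuation falls. It is an illustration of the general idea, not a reproduction of any specific algorithm evaluated in the study.

```python
import math


def softmax(logits):
    """Turn a dict of token -> logit into token -> probability."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(z - top) for tok, z in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def gradient_ascent_unlearn(logits, forget_token, lr=1.0, steps=50):
    """Raise the cross-entropy loss -log p(forget_token) by gradient ascent.

    d(-log p(t)) / d z_w = p(w) - [w == t], so ascent ADDS lr * gradient:
    the forget token's logit goes down and every other logit goes up.
    """
    logits = dict(logits)  # leave the caller's dict untouched
    for _ in range(steps):
        probs = softmax(logits)
        for tok in logits:
            grad = probs[tok] - (1.0 if tok == forget_token else 0.0)
            logits[tok] += lr * grad
    return logits


# Hypothetical logits for the token after "...said Aunt".
toy_logits = {"Petunia": 4.0, "Marge": 1.0, "Sally": 0.5}
before = softmax(toy_logits)["Petunia"]
unlearned = gradient_ascent_unlearn(toy_logits, "Petunia")
after = softmax(unlearned)["Petunia"]
print(f"p(Petunia): {before:.3f} -> {after:.3f}")
```

Because the other tokens' logits rise as the forgotten one falls, even this toy shows that unlearning shifts more than the single targeted prediction.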

To see how effective these unlearning algorithms could be, Shi and her collaborators devised a benchmark and selected eight different open algorithms to test. Called MUSE (Machine Unlearning Six-way Evaluation), the benchmark aims to probe an algorithm’s ability not only to prevent a model from spitting out training data verbatim (a phenomenon known as regurgitation) but also to eliminate the model’s knowledge of that data along with any evidence that it was originally trained on the data.

Scoring well on MUSE requires making a model forget two things: books from the Harry Potter series and news articles.

For example, given a snippet from Harry Potter and the Chamber of Secrets (“‘There’s more in the frying pan,’ said Aunt…”), MUSE tests whether an unlearned model can recite the whole sentence (“‘There’s more in the frying pan,’ said Aunt Petunia, turning eyes on her massive son”), answer questions about the scene (e.g. “What does Aunt Petunia tell her son?”, with the answer “More in the frying pan”) or otherwise indicate it’s been trained on text from the book.
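One simple way to score the verbatim part of such a test is to compare the model's continuation against the real one. The sketch below uses an F1 score built on the longest common subsequence of words, in the spirit of ROUGE-L; treat it as an illustrative assumption rather than MUSE's exact scoring code. Both "model outputs" are invented for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def overlap_f1(candidate, reference):
    """LCS-based F1 over whitespace tokens: 1.0 = verbatim, 0.0 = no shared words."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "Petunia, turning eyes on her massive son"
# Hypothetical continuations of "'There's more in the frying pan,' said Aunt..."
before_unlearning = "Petunia, turning eyes on her massive son"
after_unlearning = "Marge, who had just arrived for tea"
print(overlap_f1(before_unlearning, reference))  # 1.0
print(overlap_f1(after_unlearning, reference))   # 0.0
```

A low score here says nothing about whether the model can still answer questions about the scene, which is why the benchmark also probes the model's knowledge and other signs that it was trained on the book.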

MUSE also tests whether the model retained related general knowledge — e.g. that J.K. Rowling is the author of the Harry Potter series — after unlearning, which the researchers refer to as the model’s overall utility. The lower the utility, the more related knowledge the model lost, making the model less able to correctly answer questions.

In their study, the researchers found that the unlearning algorithms they tested did make models forget certain information. But they also hurt the models’ general question-answering capabilities, presenting a trade-off.

“Designing effective unlearning methods for models is challenging because knowledge is intricately entangled in the model,” Shi explained. “For instance, a model may be trained on copyrighted material — Harry Potter books as well as on freely available content from the Harry Potter Wiki. When existing unlearning methods attempt to remove the copyrighted Harry Potter books, they significantly impact the model’s knowledge about the Harry Potter Wiki, too.”
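Shi's point about entanglement can be reproduced in miniature. In the hypothetical linear model below, a passage from the book and a line from a fan wiki share one input feature ("is about Harry Potter"). Running gradient ascent on the book passage alone also lowers the model's confidence in the wiki fact, because the shared feature's weights move too. The feature names, weights and examples are all invented; real language models spread knowledge across vastly more parameters.

```python
import math

TOKENS = ["Petunia", "Marge", "Dudley"]


def predict(weights, features):
    """Softmax over TOKENS, where each logit is a weighted sum of input features."""
    logits = {t: sum(weights[t].get(f, 0.0) * v for f, v in features.items()) for t in TOKENS}
    top = max(logits.values())
    exps = {t: math.exp(z - top) for t, z in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def ascent_step(weights, features, target, lr):
    """One gradient-ascent step on -log p(target | features), updating weights in place."""
    probs = predict(weights, features)
    for t in TOKENS:
        grad = probs[t] - (1.0 if t == target else 0.0)
        for f, v in features.items():
            weights[t][f] = weights[t].get(f, 0.0) + lr * grad * v


# Hypothetical weights: "hp_topic" is shared by both contexts below.
weights = {
    "Petunia": {"hp_topic": 2.0, "book_style": 1.0, "wiki_style": 1.0},
    "Marge": {},
    "Dudley": {},
}
book_passage = {"hp_topic": 1.0, "book_style": 1.0}  # "...said Aunt ___" (to forget)
wiki_fact = {"hp_topic": 1.0, "wiki_style": 1.0}     # "Harry's aunt is ___" (to keep)

wiki_before = predict(weights, wiki_fact)["Petunia"]
for _ in range(10):
    ascent_step(weights, book_passage, "Petunia", lr=1.0)
wiki_after = predict(weights, wiki_fact)["Petunia"]
print(f"p(Petunia | wiki fact): {wiki_before:.3f} -> {wiki_after:.3f}")
```

The wiki-only weight never changes; the drop comes entirely through the shared feature, a toy version of why related knowledge degraded along with the targeted data.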

Are there any solutions to the problem? Not yet — and this highlights the need for additional research, Shi said.

For now, vendors betting on unlearning as a solution to their training data woes appear to be out of luck. Perhaps a technical breakthrough will make unlearning feasible someday. But for the time being, vendors will have to find another way to prevent their models from saying things they shouldn’t.
