Black-box forgetting: A new method for tailoring large AI models


The capabilities of large-scale pre-trained AI models have grown rapidly in recent years, as demonstrated by models such as the vision-language model CLIP and ChatGPT. These generalist models perform reasonably well on tasks across a wide variety of fields, which has paved the way for their widespread adoption by the public. However, such versatility comes at a cost.

Training and operating large-scale models consume extreme amounts of energy and time, which goes against sustainability goals and limits the types of computers they can be deployed on. Moreover, in many practical applications, people want AI models to fulfil specific roles rather than be jacks-of-all-trades. In such cases, a model’s generalist capabilities might be useless and even counter-productive, reducing accuracy. Could there be a way to leverage large-scale pre-trained models more efficiently by having them ‘forget’ unnecessary information?

In a recent paper that will be presented at the Conference on Neural Information Processing Systems (NeurIPS 2024), a research team led by Associate Professor Go Irie from Tokyo University of Science (TUS), Japan, sought to tackle this problem. They developed a methodology dubbed “black-box forgetting,” in which the text prompts presented to a black-box vision-language classifier model are iteratively optimized so that the model selectively ‘forgets’ some of the classes it can recognize. The study's co-authors are Mr. Yusuke Kuwana and Mr. Yuta Goto, both from TUS, and Dr. Takashi Shibata from NEC Corporation.

“In practical applications, the classification of all kinds of object classes is rarely required. For example, in an autonomous driving system, it would be sufficient to recognize limited classes of objects such as cars, pedestrians, and traffic signs. We would not need to recognize food, furniture, or animal species,” explains Dr. Irie. “Retaining the classes that do not need to be recognized may decrease overall classification accuracy, as well as cause operational disadvantages such as the waste of computational resources and the risk of information leakage.”

Although some methods for selective forgetting in pre-trained models do exist, these assume a white-box setting, where the user has access to the internal parameters and architecture of the model. More often than not, users deal with black boxes: they do not have access to the model itself or most of its information, for commercial or ethical reasons. Thus, the researchers had to employ a so-called derivative-free optimization strategy, that is, one that does not require access to the model’s gradients.
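Because only the model's outputs are visible in a black-box setting, any objective has to be computed from those outputs alone. The following is a minimal sketch of what such an objective could look like for a single image. The function name `forgetting_loss` and the specific choice of loss (cross-entropy for retained classes, entropy maximization for forgotten classes) are illustrative assumptions; the article does not spell out the objective used in the study.

```python
import math

def forgetting_loss(probs, label, forget_classes):
    """Loss for one image, computed only from output probabilities.

    probs: class probabilities returned by the black-box model.
    label: index of the true class.
    forget_classes: set of class indices the model should stop recognizing.

    Assumption (not from the article): forgotten classes are pushed
    toward a uniform, uninformative prediction; retained classes use
    ordinary cross-entropy.
    """
    eps = 1e-12  # avoids log(0)
    if label in forget_classes:
        # Negative entropy: lowest when the prediction is uniform.
        return sum(p * math.log(p + eps) for p in probs)
    # Cross-entropy on the true label: lowest when the model is correct.
    return -math.log(probs[label] + eps)


confident = [0.98, 0.01, 0.01]
uniform = [1 / 3, 1 / 3, 1 / 3]

# For a class to forget, a uniform prediction scores better (lower).
print(forgetting_loss(uniform, 0, {0}) < forgetting_loss(confident, 0, {0}))
# For a class to keep, a confident correct prediction scores better.
print(forgetting_loss(confident, 0, set()) < forgetting_loss(uniform, 0, set()))
```

No gradient of the model appears anywhere in this computation, which is why it can be paired with a derivative-free optimizer.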

To this end, they extended a method known as the covariance matrix adaptation evolution strategy (CMA-ES), using the image classifier model CLIP as the target model for this study. This evolutionary algorithm samples candidate prompts to feed to the model, evaluates the results via predefined objective functions, and updates a multivariate distribution based on the calculated values.
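The sample, evaluate, and update loop described above can be illustrated with a deliberately simplified evolution strategy. This sketch is not CMA-ES itself: it keeps only a mean vector and a single shrinking step size, and omits the covariance adaptation that gives CMA-ES its name. The function `simple_es`, its parameters, and the toy objective are all hypothetical stand-ins; in the actual study the objective would involve querying CLIP.

```python
import random

def simple_es(objective, dim, iterations=60, population=20,
              elite=5, sigma=0.5, seed=0):
    """Simplified evolution strategy (NOT full CMA-ES).

    Samples candidates around a mean, scores them with a black-box
    objective, and moves the mean toward the best candidates.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_x, best_f = list(mean), objective(mean)
    for _ in range(iterations):
        # 1. Sample candidate vectors from an isotropic Gaussian.
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        # 2. Evaluate each candidate; only objective values are used.
        scored = sorted(((objective(x), x) for x in candidates),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_f:
            best_f, best_x = scored[0]
        # 3. Update the distribution: new mean = average of the elite.
        top = [x for _, x in scored[:elite]]
        mean = [sum(x[i] for x in top) / elite for i in range(dim)]
        sigma *= 0.95  # fixed decay; CMA-ES adapts this automatically
    return best_x, best_f


def toy_objective(x):
    """Hypothetical stand-in for 'query the model and compute a loss'."""
    return sum((xi - 1.0) ** 2 for xi in x)


best_x, best_f = simple_es(toy_objective, dim=5)
print(best_f < toy_objective([0.0] * 5))
```

The optimizer never asks for a gradient; it only needs to call `objective` on candidate vectors, which is exactly the access a black-box model allows.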

However, the performance of derivative-free optimization techniques deteriorates quickly for large-scale problems. As more classes need to be forgotten, the ‘latent context’ used to optimize the input prompts grows to unmanageable sizes. To address this issue, the research team came up with a new parametrization technique called ‘latent context sharing.’ This approach involves decomposing the latent context derived from prompts into smaller elements, each of which is either ‘unique’ to a prompt token or ‘shared’ between multiple tokens. By optimizing these smaller units rather than large chunks of latent context, the dimensionality of the problem can be greatly reduced, making it much more tractable.
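To see why sharing reduces the size of the search space, consider the sketch below. It assumes, purely for illustration, that each token's latent context is the concatenation of one shared block and one small token-specific block; the function `build_contexts`, the flat parameter layout, and all dimensions are hypothetical and not taken from the paper.

```python
def build_contexts(params, num_tokens, shared_dim, unique_dim):
    """Split a flat parameter vector into per-token latent contexts.

    Assumed layout (illustrative only):
        [shared | unique_0 | unique_1 | ... | unique_{n-1}]
    Each token's context is the shared block followed by its own
    unique block.
    """
    expected = shared_dim + num_tokens * unique_dim
    if len(params) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(params)}")
    shared = params[:shared_dim]
    contexts = []
    for i in range(num_tokens):
        start = shared_dim + i * unique_dim
        contexts.append(shared + params[start:start + unique_dim])
    return contexts


# Hypothetical sizes: 4 tokens, 6 shared dims, 2 unique dims per token.
num_tokens, shared_dim, unique_dim = 4, 6, 2
with_sharing = shared_dim + num_tokens * unique_dim       # 6 + 4*2
without_sharing = num_tokens * (shared_dim + unique_dim)  # 4 * 8
print(with_sharing, without_sharing)

contexts = build_contexts(list(range(with_sharing)),
                          num_tokens, shared_dim, unique_dim)
print(len(contexts), len(contexts[0]))
```

With these made-up sizes the optimizer searches over 14 numbers instead of 32, while every token still receives a full 8-dimensional context. The saving grows with the number of tokens.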

The researchers validated their approach using several benchmark image classification datasets, trying to get CLIP to ‘forget’ 40% of the classes in a given dataset. This marks the first study in which the goal is to have a pre-trained vision-language model fail to recognize specific classes under black-box conditions and, based on reasonable performance baselines, the results were very promising.
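An experiment of this kind has to measure two things at once: whether the model still recognizes the classes it should keep, and whether it now fails on the classes it should forget. The sketch below shows one plausible way to score this. The article does not state which metrics the study used, so the function `forgetting_scores` and the use of a harmonic mean to combine the two numbers are assumptions made for illustration.

```python
def forgetting_scores(predictions, labels, forget_classes):
    """Return (retained accuracy, forgotten error rate, harmonic mean).

    Assumption (not from the article): success means high accuracy on
    retained classes AND a high error rate on forgotten classes.
    """
    kept_total = kept_correct = 0
    forgot_total = forgot_wrong = 0
    for pred, label in zip(predictions, labels):
        if label in forget_classes:
            forgot_total += 1
            forgot_wrong += int(pred != label)
        else:
            kept_total += 1
            kept_correct += int(pred == label)
    acc = kept_correct / kept_total if kept_total else 0.0
    err = forgot_wrong / forgot_total if forgot_total else 0.0
    # The harmonic mean is high only when both numbers are high.
    combined = 2 * acc * err / (acc + err) if (acc + err) else 0.0
    return acc, err, combined


# Made-up example: classes 2 and 3 should be forgotten.
acc, err, combined = forgetting_scores(
    predictions=[0, 1, 2, 0],
    labels=[0, 1, 2, 3],
    forget_classes={2, 3},
)
print(acc, err, round(combined, 3))
```

In this made-up example both retained images are classified correctly, but only one of the two forgotten-class images is misclassified, so the combined score falls well short of 1.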

This innovative method has important implications in the field of artificial intelligence and machine learning. It could help large-scale models perform better in specialized tasks, extending their already astounding applicability. Another use, for example, would be to prevent image generation models from producing undesirable content by having them forget specific visual contexts.

In addition, the proposed method could help tackle privacy issues, which are a rising concern in the field. “If a service provider is asked to remove certain information from a model, this can be accomplished by retraining the model from scratch by removing problematic samples from the training data. However, retraining a large-scale model consumes enormous amounts of energy,” says Dr. Irie. “Selective forgetting, or so-called machine unlearning, may provide an efficient solution to this problem.” In other words, it could help develop solutions for protecting the so-called “Right to be Forgotten,” which is a particularly sensitive topic in healthcare and finance.
