This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
Ben Dickson, software engineer and founder of TechTalks, summarizes: concept whitening is a technique that helps create interpretable deep learning models without incurring performance penalties.
Image: Deep learning concept whitening (Photo: TechTalks)
In tandem with the expansion of deep learning in various domains and applications, there has been a growing interest in developing techniques that try to explain neural networks by examining their results and learned parameters. But these explanations are often erroneous and misleading, and they provide little guidance in fixing possible misconceptions embedded in deep learning models during training.
In a paper published in the peer-reviewed journal Nature Machine Intelligence, scientists at Duke University propose “concept whitening,” a technique that can help steer neural networks toward learning specific concepts without sacrificing performance.
Cynthia Rudin, professor of computer science at Duke University and co-author of the concept whitening paper, had previously warned about the dangers of trusting black-box explanation techniques and shown how such methods can produce erroneous interpretations of neural networks. In an earlier paper, also published in Nature Machine Intelligence, Rudin encouraged the use and development of AI models that are inherently interpretable. Rudin, who is the Ph.D. advisor of Zhi Chen, the concept whitening paper’s lead author, directs Duke University’s Prediction Analysis Lab, which focuses on interpretable machine learning.