Scientists are increasingly treating artificial intelligence systems as if they were biological organisms, adopting techniques similar to those used in medicine and neuroscience to better understand how opaque “black box” models function as they are rapidly deployed in high-stakes environments such as hospitals and churches, according to a report by MIT Technology Review, with researchers acknowledging that even AI experts still lack a clear understanding of what happens inside advanced models during decision-making.
At Anthropic, scientists have developed tools for “mechanistic interpretability” that allow them to trace a model’s internal activity while it performs tasks, a process likened to using MRI scans to study the human brain, while other experiments involve simplified neural networks, such as sparse autoencoders, that resemble biological organoids because their internal processes are easier to observe and analyze; “This is very much a biological type of analysis,” Josh Batson, a research scientist at Anthropic, told Tech Review. “It’s not like math or physics.”
Researchers are also using chain-of-thought monitoring, which involves models explaining their reasoning step by step, a method that has helped identify harmful or misaligned behavior, with OpenAI research scientist Bowen Baker saying, “It’s been pretty wildly successful in terms of actually being able to find the model doing bad things,” even as scientists warn that future AI systems could become so complex—particularly if designed by other AIs—that their behavior may be impossible to fully understand, a concern underscored by reports of real-world harm linked to AI outputs.

