Arizona State University researchers are pushing back [PDF] against the widespread practice of describing AI language models' intermediate text generation as "reasoning" or "thinking," arguing this anthropomorphization creates dangerous misconceptions about how these systems actually work. The research team, led by Subbarao Kambhampati, examined recent "reasoning" models like DeepSeek's R1, which generate lengthy intermediate token sequences before providing final answers to complex problems. Though these models show improved performance and their intermediate outputs often resemble human scratch work, the researchers found little evidence that these tokens represent genuine reasoning processes.
Crucially, the analysis also revealed that models trained on incorrect or semantically meaningless intermediate traces can still maintain or even improve performance compared to those trained on correct reasoning steps. The researchers tested this by training models on deliberately corrupted algorithmic traces and found sustained improvements despite the semantic noise. The paper warns that treating these intermediate outputs as interpretable reasoning traces engenders false confidence in AI capabilities and may mislead both researchers and users about the systems' actual problem-solving mechanisms.
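To make the kind of ablation described above concrete, here is a minimal sketch, assuming a toy dataset format with `prompt`, `trace`, and `answer` fields and a random-token corruption scheme; both the field names and the corruption method are assumptions for illustration, not the paper's actual procedure. The idea is simply to keep the question and final answer intact while replacing the intermediate trace with semantically meaningless tokens of the same length.

```python
import random

def corrupt_trace(example, vocab, rng=None):
    """Return a copy of a training example whose intermediate trace is
    replaced by randomly sampled tokens of the same length, leaving the
    prompt and the final answer untouched."""
    rng = rng or random.Random(0)
    trace_tokens = example["trace"].split()
    noise = [rng.choice(vocab) for _ in trace_tokens]
    return {
        "prompt": example["prompt"],   # unchanged task statement
        "trace": " ".join(noise),      # semantically meaningless filler
        "answer": example["answer"],   # unchanged final answer
    }

if __name__ == "__main__":
    # Toy example; field names, vocabulary, and trace format are hypothetical.
    vocab = ["lorem", "ipsum", "dolor", "sit", "amet"]
    example = {
        "prompt": "What is 17 + 25?",
        "trace": "17 + 25 = 10 + 7 + 20 + 5 = 30 + 12 = 42",
        "answer": "42",
    }
    print(corrupt_trace(example, vocab))
```

A finding that models fine-tuned on such corrupted traces still improve is what undercuts the reading of intermediate tokens as faithful reasoning steps.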
[Read more of this story](https://tech.slashdot.org/story/25/05/29/1411236/researchers-warn-against-treating-ai-outputs-as-human-like-reasoning?utm_source=atom1.0moreanon&utm_medium=feed) at Slashdot.