When tested with a classic psychological assessment, advanced AI models experienced a total breakdown in focus. A new PNAS Nexus study suggests these systems lack the human-like executive control necessary to override automatic responses and maintain complex goals.
A lot of tools like Claude or ChatGPT have internal tools they call when they do math (or use a python script) rather than have the model actually compute anything.
The underlying tech itself can’t do it because you can’t do math by token probability.