When tested with a classic psychological assessment, advanced AI models experienced a total breakdown in focus. A new PNAS Nexus study suggests these systems lack the human-like executive control necessary to override automatic responses and maintain complex goals.
You know the “DeepMind and OpenAi models” is the hint that the LLM model is not the one doing the math. The LLM provides a hypothesis and the DeepMind model provides grounding or feedback on whether the hypothesis even makes sense or works.
It is totally irrelevant that the model calls tools to do the math. That is still a success.