A new AI benchmark tested whether chatbots protect human wellbeing when under pressure to remain engaging. All models scored better when asked to prioritise wellbeing, but 71% of them turned harmful when given simple instructions to disregard it. xAI's Grok 4 and Google's Gemini 2.0 Flash scored lowest on respecting user attention and transparency.
short by Shristi Acharya / 10:24 pm on 25 Nov