OpenAI adds confession system to make ChatGPT admit bad behaviour

short by Shristi Acharya / 03:36 pm on 04 Dec 2025, Thursday

OpenAI has developed a new training framework for AI models, called "confession", to teach AI systems to acknowledge when they have engaged in undesirable behaviour. It responds to the tendency of large language models to be sycophantic or to state hallucinations confidently. A confession is a separate response from the model's main reply and is evaluated solely on its honesty.
