OpenAI adds confession system to make ChatGPT admit bad behaviour
OpenAI has introduced a new training framework for AI models, called "confession", that teaches AI systems to acknowledge when they have engaged in undesirable behaviour. The framework responds to the tendency of large language models to give sycophantic answers or to state hallucinations with confidence. A confession is produced separately from the model's main reply and is evaluated solely on its honesty.