OpenAI has developed a new training framework for AI models, called "confession", that teaches AI systems to acknowledge when they have engaged in undesirable behaviour. It is a response to the tendency of large language models to give sycophantic answers or confidently state hallucinations. Confessions are evaluated solely on their honesty and are separate from the model's main replies.
short by Shristi Acharya / 03:36 pm on 04 Dec