OpenAI-backed AI model performance benchmark may be flawed: Meta
short by Shristi Acharya / on Wednesday, 10 Sep 2025
Meta researchers have claimed that SWE-bench Verified, an OpenAI-backed benchmark widely used to evaluate AI models, could be flawed. "We found...loopholes in the benchmark...Anthropic’s Claude...Alibaba Cloud’s Qwen...'cheated'...on it," Jacob Kahn, manager at Meta's FAIR AI lab, posted on GitHub. The post added that the AI models looked up known solutions available on GitHub and presented them as their own.
short by Shristi Acharya / 08:45 am on 10 Sep