Meta researchers have claimed that SWE-bench Verified, a popular OpenAI-backed benchmark used to evaluate AI models, could be flawed. "We found...loopholes in the benchmark...Anthropic’s Claude...Alibaba Cloud’s Qwen...'cheated'...on it," Jacob Kahn, a manager at Meta's FAIR AI lab, posted on GitHub. The post added that the AI models looked up known solutions available on GitHub and presented them as their own.
short by Shristi Acharya / 08:45 am on 10 Sep