ChatGPT users have claimed that GPT-4 is actually getting worse as time goes on, which Stanford researchers have now confirmed.
A new Standford research paper about GPT-4, which powers ChatGPT Plus, seems to confirm what users have been reporting for weeks: that the AI chatbot is getting dumber.
The paper goes into detail by utilizing and comparing how GPT-4 operates in comparison to 3.5. According to the researchers, the software behind ChatGPT is actually performing considerably worse:
“We find that the performance and behavior of both GPT-3.5 and GPT-4 vary significantly across these two releases and that their performance on some tasks have gotten substantially worse over time.”
In the paper, Stanford researchers Lingjiao Chen, Matei Zaharia, and James Zou all found that when compared to the language model’s releases in March and June performance has degraded. The most egregious example is asking if 17077 is a prime number.
While the answer is yes, ChatGPT saw a massive decrease of 95.2% in accuracy. Meanwhile, GPT-3.5, which powers the free version of ChatGPT, saw an uplift from 7.4% to 86.8% when asked the same question.
Subscribe to our newsletter for the latest updates on Esports, Gaming and more.
Users have been complaining about ChatGPT’s decreased performance for a few weeks now, including on OpenAI’s own forums. The OpenAI’s VP of Product, Peter Welinder, responded to the claims:
In another tweet responding to his original claims, he asked for proof that it had gotten worse.
ChatGPT launched late last year, and very quickly thrust the entire tech industry into a rat race around AI. In the months following, AI chatbots like Google Bard and Microsoft’s Bing, quickly began to release.
AI is still an incredibly early technology
It’s also not the first time GPT-4 has been called out for providing false information. Another research paper found that ChatGPT running on version 4 is more likely to provide false information than its predecessor.
GPT-4 powers Microsoft’s AI and has been shown to fly off the handle or provide weird responses in the past. However, this is the first time an AI bot of ChatGPT’s scale has been proven to get significantly worse over time. Bard has been called out for being inaccurate too.