DeepSeek, a Chinese AI chatbot, has been dominating App Store downloads, but its popularity sits uneasily alongside its record in accuracy tests: it failed 83% of the time, tying for 10th place out of the 11 AI chatbots assessed.

Beyond the disappointing accuracy scores, DeepSeek has been found to insert Chinese government messaging into its responses, even when the subject matter has nothing to do with China. This raises concerns about the chatbot’s impartiality and its potential to spread government-influenced narratives.

The combination of inaccuracy and political messaging could damage DeepSeek’s credibility, particularly in regions where users are already wary of AI-generated content. As chatbots continue to shape how people interact with technology, its shortcomings are a reminder that accuracy and impartiality matter as much as popularity, and they make its App Store success look all the more paradoxical.

NewsGuard’s latest accuracy audit sets out the scale of the problem. DeepSeek failed to deliver accurate answers about news and information topics 83% of the time, placing it in a tie for 10th out of the 11 chatbots assessed, most of them leading Western competitors.

NewsGuard’s key findings break that figure down: 30% of DeepSeek’s responses repeated false information, and a further 53% failed to address the user’s query at all, offering nothing useful or relevant. Together, those two categories make up the 83% fail rate.

That leaves just 17% of DeepSeek’s responses that successfully debunked the false claims put to it. For comparison, the industry average fail rate in the same audit was 62%, so DeepSeek’s 83% places it well behind its peers in delivering accurate and helpful answers.

These findings raise questions about DeepSeek’s trustworthiness, particularly in comparison with its Western counterparts. Unreliable answers and frequent non-responses make it a risky tool for users seeking factual content, especially as AI chatbots become increasingly prevalent.


Chinese Government Positioning

DeepSeek’s responses reveal a consistent pattern that raises concerns about its impartiality: the chatbot often includes Chinese government positions in its answers, even when the questions have nothing to do with China.

A striking example came when DeepSeek was asked about the situation in Syria. Rather than providing a neutral, fact-based response, it included a statement closely aligned with Chinese government rhetoric:

“China has always adhered to the principle of non-interference in the internal affairs of other countries, believing that the Syrian people have the wisdom and capability to handle their own affairs.”

This type of response highlights the chatbot’s tendency to inject political viewpoints into topics where they may not be relevant, potentially influencing users’ understanding of global issues.


Technical Limitations

DeepSeek makes bold claims of rivalling OpenAI’s models, reportedly achieved with a relatively modest $5.6 million in training costs, a figure that pales in comparison to the billions invested in more established AI systems such as OpenAI’s. The recent audit, however, reveals notable gaps between those claims and the chatbot’s actual performance.

The audit found significant gaps in DeepSeek’s knowledge and in the overall quality of its responses. One of the most glaring issues is its training cut-off: the model’s data extends only to October 2023. As a result, the chatbot struggles to address anything that has happened since, making it markedly less effective on questions about ongoing or recent events.

This knowledge gap limits DeepSeek’s ability to engage meaningfully with current global issues, technological advances, or breaking news. While it can be helpful on historical topics and well-established facts, it falls short as soon as a query touches anything recent.

In an era where real-time information is essential, particularly for AI systems used for both personal and professional queries, DeepSeek’s stale training data undermines its reliability and usefulness. For users seeking the latest information or accurate insight on current events, this is a serious limitation, and it leaves the chatbot well behind rivals that are updated regularly to stay relevant.


Misinformation Vulnerability

NewsGuard’s findings revealed that DeepSeek was particularly susceptible to spreading false information when responding to prompts designed by malign actors. Such prompts are crafted specifically to manipulate AI models into generating misleading or inaccurate content, and DeepSeek proved notably vulnerable to them.

Of particular concern to the auditors was the frequency with which DeepSeek provided incorrect responses. Out of the nine instances where the chatbot provided false information, a staggering eight were in direct response to prompts originating from malign actors. This highlights a concerning trend: DeepSeek, along with similar AI tools, can be easily weaponised by individuals with malicious intent to spread false claims at scale.

The potential for AI chatbots like DeepSeek to be misused in this way raises significant alarms about the broader implications for digital misinformation. With the ability to quickly and widely disseminate fabricated information, such tools could be exploited to cause harm, influence public opinion, or even destabilise key areas of society such as politics, health, or social issues.

The audit underscores the necessity for stricter safeguards and more robust detection systems to prevent AI models from being manipulated in such a manner. As AI technologies continue to evolve and integrate more deeply into daily life, addressing these vulnerabilities will be crucial to ensuring that they are used ethically and responsibly.


Industry Context

The assessment of DeepSeek comes at a pivotal moment in the ongoing AI race between China and the United States. As AI technologies continue to evolve and become more integrated into daily life, the stakes have never been higher for both nations. In this competitive landscape, the way these technologies are developed, implemented, and regulated will shape the future of digital information, security, and even geopolitics.

One of the most striking aspects of DeepSeek’s Terms of Use is a clause that places the responsibility for verifying the accuracy of the chatbot’s output squarely on the shoulders of the users. According to the policy, users must “proactively verify the authenticity and accuracy of the output content to avoid spreading false information.” This policy has raised concerns among experts, as it essentially absolves the developers of responsibility for the chatbot’s performance, which could lead to the unintentional spread of misinformation.

NewsGuard has strongly criticised this approach, describing it as a “hands-off” policy that shifts the burden of accountability away from the developers and places it unfairly on end users. This raises important ethical questions about the role of AI developers in ensuring the accuracy of the tools they create and the potential consequences of allowing such systems to function without robust safeguards.

In response to the audit, DeepSeek did not provide any comment, leaving many of these issues unresolved. NewsGuard has made it clear, however, that it will continue to monitor DeepSeek’s performance, including it in its monthly AI audits going forward. To ensure transparency and shed light on broader industry trends, DeepSeek’s results will be anonymised alongside those of the other chatbots, offering a more comprehensive view of the current state of AI tools on the market.

