Tuesday, January 14, 2025

Which AI Models Are Leading the Way in Reducing Hallucinations and Improving Accuracy?

AI models help us in many areas, but they also tend to hallucinate and give us inaccurate information. IBM defines hallucinations in AI chatbots or computer vision tools as outputs that are inaccurate because the model perceives patterns that do not exist. Vectara ran 1,000 short documents through each LLM to detect hallucinations and identified the top 15 large language models with the lowest hallucination rates. According to the data, Zhipu AI’s GLM-4-9B-Chat has the lowest hallucination rate at 1.3%. Google’s Gemini-2.0-Flash-Exp is second, also at 1.3%.

The third-ranked LLM is OpenAI’s o1-mini, with a hallucination rate of 1.4%. GPT-4o is fourth at 1.5%. GPT-4o-mini and GPT-4-Turbo both have hallucination rates of 1.7%. It was observed that smaller, more specialized models are among those with the lowest hallucination rates. OpenAI’s GPT-4 has a hallucination rate of 1.8%, while GPT-3.5-Turbo has a hallucination rate of 1.9%.

AI systems need low hallucination rates to work properly, especially in high-stakes applications in healthcare, finance and law. Developers of smaller models are gradually reducing hallucinations, with Mistral’s 8×7B models cited as reducing hallucinations in AI-generated text.

Vectara’s analysis underscores reducing hallucination rates as critical for reliable AI systems in high-stakes fields.

Model | Hallucination Rate | Factual Consistency Rate | Answer Rate | Average Summary Length (Words)
Zhipu AI GLM-4-9B-Chat | 1.3% | 98.7% | 100.0% | 58.1
Google Gemini-2.0-Flash-Exp | 1.3% | 98.7% | 99.9% | 60
OpenAI-o1-mini | 1.4% | 98.6% | 100.0% | 78.3
GPT-4o | 1.5% | 98.5% | 100.0% | 77.8
GPT-4o-mini | 1.7% | 98.3% | 100.0% | 76.3
GPT-4-Turbo | 1.7% | 98.3% | 100.0% | 86.2
GPT-4 | 1.8% | 98.2% | 100.0% | 81.1
GPT-3.5-Turbo | 1.9% | 98.1% | 99.6% | 84.1
DeepSeek-V2.5 | 2.4% | 97.6% | 100.0% | 83.2
Microsoft Orca-2-13b | 2.5% | 97.5% | 100.0% | 66.2
Microsoft Phi-3.5-MoE-instruct | 2.5% | 97.5% | 96.3% | 69.7
Intel Neural-Chat-7B-v3-3 | 2.6% | 97.4% | 100.0% | 60.7
Qwen2.5-7B-Instruct | 2.8% | 97.2% | 100.0% | 71
AI21 Jamba-1.5-Mini | 2.9% | 97.1% | 95.6% | 74.5
Snowflake-Arctic-Instruct | 3.0% | 97.0% | 100.0% | 68.7
Qwen2.5-32B-Instruct | 3.0% | 97.0% | 100.0% | 67.9
Microsoft Phi-3-mini-128k-instruct | 3.1% | 96.9% | 100.0% | 60.1
OpenAI-o1-preview | 3.3% | 96.7% | 100.0% | 119.3
Google Gemini-1.5-Flash-002 | 3.4% | 96.6% | 99.9% | 59.4
01-AI Yi-1.5-34B-Chat | 3.7% | 96.3% | 100.0% | 83.7
Llama-3.1-405B-Instruct | 3.9% | 96.1% | 99.6% | 85.7
Microsoft Phi-3-mini-4k-instruct | 4.0% | 96.0% | 100.0% | 86.8
Llama-3.3-70B-Instruct | 4.0% | 96.0% | 100.0% | 85.3
Microsoft Phi-3.5-mini-instruct | 4.1% | 95.9% | 100.0% | 75
Mistral-Large2 | 4.1% | 95.9% | 100.0% | 77.4
Llama-3-70B-Chat-hf | 4.1% | 95.9% | 99.2% | 68.5
Qwen2-VL-7B-Instruct | 4.2% | 95.8% | 100.0% | 73.9
Qwen2.5-14B-Instruct | 4.2% | 95.8% | 100.0% | 74.8
Qwen2.5-72B-Instruct | 4.3% | 95.7% | 100.0% | 80
Llama-3.2-90B-Vision-Instruct | 4.3% | 95.7% | 100.0% | 79.8
XAI Grok | 4.6% | 95.4% | 100.0% | 91
Anthropic Claude-3-5-sonnet | 4.6% | 95.4% | 100.0% | 95.9
Qwen2-72B-Instruct | 4.7% | 95.3% | 100.0% | 100.1
Mixtral-8x22B-Instruct-v0.1 | 4.7% | 95.3% | 99.9% | 92
Anthropic Claude-3-5-haiku | 4.9% | 95.1% | 100.0% | 92.9
01-AI Yi-1.5-9B-Chat | 4.9% | 95.1% | 100.0% | 85.7
Cohere Command-R | 4.9% | 95.1% | 100.0% | 68.7
Llama-3.1-70B-Instruct | 5.0% | 95.0% | 100.0% | 79.6
Llama-3.1-8B-Instruct | 5.4% | 94.6% | 100.0% | 71
Cohere Command-R-Plus | 5.4% | 94.6% | 100.0% | 68.4
Llama-3.2-11B-Vision-Instruct | 5.5% | 94.5% | 100.0% | 67.3
Llama-2-70B-Chat-hf | 5.9% | 94.1% | 99.9% | 84.9
IBM Granite-3.0-8B-Instruct | 6.5% | 93.5% | 100.0% | 74.2
Google Gemini-1.5-Pro-002 | 6.6% | 93.7% | 99.9% | 62
Google Gemini-1.5-Flash | 6.6% | 93.4% | 99.9% | 63.3
Microsoft phi-2 | 6.7% | 93.3% | 91.5% | 80.8
Google Gemma-2-2B-it | 7.0% | 93.0% | 100.0% | 62.2
Qwen2.5-3B-Instruct | 7.0% | 93.0% | 100.0% | 70.4
Llama-3-8B-Chat-hf | 7.4% | 92.6% | 99.8% | 79.7
Google Gemini-Pro | 7.7% | 92.3% | 98.4% | 89.5
01-AI Yi-1.5-6B-Chat | 7.9% | 92.1% | 100.0% | 98.9
Llama-3.2-3B-Instruct | 7.9% | 92.1% | 100.0% | 72.2
databricks dbrx-instruct | 8.3% | 91.7% | 100.0% | 85.9
Qwen2-VL-2B-Instruct | 8.3% | 91.7% | 100.0% | 81.8
Cohere Aya Expanse 32B | 8.5% | 91.5% | 99.9% | 81.9
IBM Granite-3.0-2B-Instruct | 8.8% | 91.2% | 100.0% | 81.6
Mistral-7B-Instruct-v0.3 | 9.5% | 90.5% | 100.0% | 98.4
Google Gemini-1.5-Pro | 9.1% | 90.9% | 99.8% | 61.6
Anthropic Claude-3-opus | 10.1% | 89.9% | 95.5% | 92.1
Google Gemma-2-9B-it | 10.1% | 89.9% | 100.0% | 70.2
Llama-2-13B-Chat-hf | 10.5% | 89.5% | 99.8% | 82.1
AllenAI-OLMo-2-13B-Instruct | 10.8% | 89.2% | 100.0% | 82
AllenAI-OLMo-2-7B-Instruct | 11.1% | 88.9% | 100.0% | 112.6
Mistral-Nemo-Instruct | 11.2% | 88.8% | 100.0% | 69.9
Llama-2-7B-Chat-hf | 11.3% | 88.7% | 99.6% | 119.9
Microsoft WizardLM-2-8x22B | 11.7% | 88.3% | 99.9% | 140.8
Cohere Aya Expanse 8B | 12.2% | 87.8% | 99.9% | 83.9
Amazon Titan-Express | 13.5% | 86.5% | 99.5% | 98.4
Google PaLM-2 | 14.1% | 85.9% | 99.8% | 86.6
Google Gemma-7B-it | 14.8% | 85.2% | 100.0% | 113
Qwen2.5-1.5B-Instruct | 15.8% | 84.2% | 100.0% | 70.7
Qwen-QwQ-32B-Preview | 16.1% | 83.9% | 100.0% | 201.5
Anthropic Claude-3-sonnet | 16.3% | 83.7% | 100.0% | 108.5
Google Gemma-1.1-7B-it | 17.0% | 83.0% | 100.0% | 64.3
Anthropic Claude-2 | 17.4% | 82.6% | 99.3% | 87.5
Google Flan-T5-large | 18.3% | 81.7% | 99.3% | 20.9
Mixtral-8x7B-Instruct-v0.1 | 20.1% | 79.9% | 99.9% | 90.7
Llama-3.2-1B-Instruct | 20.7% | 79.3% | 100.0% | 71.5
Apple OpenELM-3B-Instruct | 24.8% | 75.2% | 99.3% | 47.2
Qwen2.5-0.5B-Instruct | 25.2% | 74.8% | 100.0% | 72.6
Google Gemma-1.1-2B-it | 27.8% | 72.2% | 100.0% | 66.8
TII falcon-7B-instruct | 29.9% | 70.1% | 90.0% | 75.5
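
The two rate columns in the table above can be sanity-checked with a short script. This is a minimal sketch that assumes, as nearly every row suggests, that the factual consistency rate is simply 100% minus the hallucination rate; that relationship is inferred from the table rather than stated in the article. The function name `inconsistent_rows` and the tolerance value are illustrative choices.

```python
# Sample rows copied from the table above:
# (model, hallucination rate in %, factual consistency rate in %).
ROWS = [
    ("Zhipu AI GLM-4-9B-Chat", 1.3, 98.7),
    ("GPT-4o", 1.5, 98.5),
    ("Google Gemini-1.5-Pro-002", 6.6, 93.7),
    ("Google Gemini-1.5-Flash", 6.6, 93.4),
    ("TII falcon-7B-instruct", 29.9, 70.1),
]


def inconsistent_rows(rows, tolerance=0.05):
    """Return rows whose two rates do not sum to 100% within the tolerance.

    Assumption: consistency = 100 - hallucination (inferred, not confirmed).
    """
    return [
        (model, hallucination, consistency)
        for model, hallucination, consistency in rows
        if abs(hallucination + consistency - 100.0) > tolerance
    ]


if __name__ == "__main__":
    for model, hallucination, consistency in inconsistent_rows(ROWS):
        total = hallucination + consistency
        print(f"{model}: {hallucination}% + {consistency}% = {total:.1f}%")
```

On this sample only the Gemini-1.5-Pro-002 row is flagged (6.6% + 93.7%), which suggests one of its two figures may be a typo in the source data.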

by Arooj Ahmed via Digital Information World

Monday, January 13, 2025

WhatsApp Beta Tests Personalized AI Chatbots – A Sneak Peek at What’s Coming!

A new WhatsApp Beta for Android update is here and, as per WBI, it includes a feature for creating personalized AI chatbots. In the previous WhatsApp Beta for Android update, WBI spotted WhatsApp testing a dedicated tab for AI-powered chats, which will let users reach AI tools and features conveniently. Now WhatsApp is working on AI chats with personalized AI characters. Soon users will be able to create AI chatbots and make them unique with specific expertise.

Instagram already offered a feature for personalizing your AI, and now WhatsApp is bringing it to its own platform with more functionality. Users will be able to bring their own AI ideas to life by creating a tailored AI with specific qualities. Our guess, however, is that it will not be much different from OpenAI's custom instructions.

In WhatsApp, users will have to add a description of their AI character, stating what it does and how it differs from other AIs. The AI character can be of any type, from entertainment and productivity to a friend or personal assistant. There will also be options and suggestions to choose from for users who don't know what type of AI character they want to create.


The descriptions users provide will be used to shape the character’s personality. WhatsApp will also ask users to state the main objective and role of their AI character. All of these answers will help WhatsApp create an AI chatbot similar to what users have imagined. Right now, this feature is under development and WhatsApp hasn't made any official announcement about it, so we don't know when it will be available to the public.

It is important to note that WhatsApp’s upcoming personalized AI chatbot feature may face delays or never be released, considering Meta’s recent decision to remove its AI personalities from Facebook and Instagram due to backlash.

Meta's privacy practices, especially its use of user data for AI model training, have also long raised concerns. The company has faced criticism for its handling of personal information, with many users wary of how their data is used for AI development. Careful usage of the new WhatsApp feature is recommended, keeping in mind the potential privacy implications of sharing personal data with AI.

by Arooj Ahmed via Digital Information World

Researchers Explore How Personality and Integrity Shape Trust in AI Technology

AI has become an important part of our lives and we cannot escape it no matter how hard we try. But do people trust it, and if they do, what influences that decision? Researchers from the University of Basel conducted a study to find out to what extent people trust AI chatbots and what factors that trust depends on. For the study, the researchers created a text-based AI platform called Conversea and analyzed the interactions between the chatbot and its users.

Many factors make us trust something: our own personality, how others behave toward us, others’ personalities, and specific situations that call for us to trust someone. The ability to trust develops from childhood and helps us decide how open we want to be with someone. The researchers say that the factors which play a role in trusting a person play the same role in our trust in AI systems.

The characteristics most important for trust are integrity and competence; these two help humans evaluate whether an AI is reliable. The study also found that participants do not view an AI in light of the company that created it; they think of it as a separate entity. Whether a chatbot is impersonal or personalized also plays a role in how it is perceived. When a chatbot addressed participants by name and mentioned previous conversations, they rated it as competent and kind.

When an AI chatbot is personalized, users think of it as more human, which is why they tend to share more personal information with it and want to use it more. But the study found no difference in participants' trust between personalized and impersonal chatbots. The study says that integrity is the most important factor for trust to develop, so building integrity into AI chatbots should be prioritized. Many lonely people have started relying on AI chatbots because they seem personalized to them.

The researchers said that AI systems should be reliable above all else. They did not say whether trusting AI is good or bad, but they think that relying too heavily on AI as a friend can isolate us from our social environments. AI chatbots should always accompany advice with its consequences and risks. They should also stop inventing answers and simply tell users when they don't have an answer to a question.

Image: DIW-Aigen

by Arooj Ahmed via Digital Information World

Sunday, January 12, 2025

China’s AI Chatbot Market Sees ByteDance’s Doubao Leading Through Innovation and Accessibility

Doubao is an AI chatbot created by ByteDance, the parent company of Douyin/TikTok. It was released in August 2024 in China and soon became one of the most prominent AI chatbots in the country. It doesn't have a clear profit model, but it still became successful and is now used by millions of people in China. As of November 2024, Doubao had about 60 million monthly active users, far ahead of its competitors. Baidu’s Wen Xiaoyan (ERNIE Bot) has 13 million monthly active users, while Moonshot AI’s Kimi has 12.8 million. Doubao is well liked by users because of its high functionality and user experience.

Just like ChatGPT, Doubao offers users advanced image, video and text processing capabilities. It is a multimodal platform on which users can generate high-quality text, images and videos, and can even ask it to interpret images and generate audio-based content. The platform is versatile, and ByteDance promises to bring more innovations to Doubao. It also helps users with professional and personal needs such as academic research, content creation and entertainment. It resonates well with users, providing them with helpful and deeper interactions.

ByteDance also has a powerful ecosystem that has contributed to Doubao’s success. Douyin, the Chinese equivalent of TikTok, has a massive user base, so the company has integrated Doubao into this digital landscape too. It also offers data-driven personalization, which helps users build their own experiences and connect with Doubao directly.

Doubao gives users highly relevant and contextualized responses, which sets it apart from its competitors. ByteDance also has technological superiority and offers advanced functionality, which is helping Doubao grow rapidly. Right now, Doubao is a leader in the AI chatbot landscape because of all the qualities mentioned above.

Chinese tech giants are also lowering their LLM prices to make them more accessible. For business users, Doubao’s main model is priced 99.3% below the industry average. As affordability is a major factor in adoption in China, this pricing strategy is a good way to spread the AI chatbot to all types of users.


Domestic Ranking | AI Product (Company) | November App MAU | November MAU Monthly Change
1 | Doubao (Douyin/ByteDance) | 59.98M | +16.92%
2 | ERNIE Bot (Baidu) | 12.99M | +3.33%
3 | Kimi (Moonshot AI) | 12.82M | +27.40%
4 | ChatGLM (Zhipu AI) | 6.37M | +22.18%
5 | iFlyTek Spark (iFlyTek) | 5.94M | +4.23%
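
To put the table's figures in perspective, the arithmetic can be sketched in a few lines. This example assumes that "Monthly Change" means percentage growth over October's MAU, which the table does not spell out; the helper names `previous_mau` and `lead_over_runner_up` are hypothetical and exist only for illustration.

```python
# November figures copied from the table above:
# (product, November MAU in millions, monthly change in %).
NOVEMBER = [
    ("Doubao", 59.98, 16.92),
    ("ERNIE Bot", 12.99, 3.33),
    ("Kimi", 12.82, 27.40),
    ("ChatGLM", 6.37, 22.18),
    ("iFlyTek Spark", 5.94, 4.23),
]


def previous_mau(current_mau, change_pct):
    """Back out the prior month's MAU.

    Assumption: change_pct is growth relative to the prior month.
    """
    return current_mau / (1 + change_pct / 100)


def lead_over_runner_up(rows):
    """Ratio of the largest MAU to the second largest."""
    top, second = sorted((mau for _, mau, _ in rows), reverse=True)[:2]
    return top / second


if __name__ == "__main__":
    for product, mau, change in NOVEMBER:
        print(f"{product}: about {previous_mau(mau, change):.2f}M in October")
    print(f"Leader vs. runner-up: {lead_over_runner_up(NOVEMBER):.1f}x")
```

Under that assumption, Doubao's October figure works out to roughly 51.3M, and its November MAU is about 4.6 times that of second-place ERNIE Bot.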

by Arooj Ahmed via Digital Information World

Social Media’s Youngest Fans: The Platforms Kids Can’t Stay Away From

TikTok is the most used social media platform among users of all ages, and a new study published in Academic Pediatrics found that it is also the most popular platform among underage users. Platforms like Snapchat, Instagram and TikTok restrict children under the age of 13, but children are still using them. The study says that many 11- to 15-year-olds in America have at least one social media account, while 6.3% of young children also have secret accounts their parents do not know about. The Children’s Online Privacy Protection Act was made to protect children from harmful content on social media, but many children bypass the apps' age restrictions and get exposed to problematic content. This exposure also affects their mental as well as physical health.

The study used data from the Adolescent Brain Cognitive Development (ABCD) study, which researched about 11,000 children in the US to learn about their cognitive development. Participants came from diverse ethnic, racial, socioeconomic, geographical and demographic backgrounds. The researchers analyzed a dataset of 10,092 participants aged 11 to 15, collected between 2019 and 2021. Participants were given surveys about their social media usage, with questions about how much they use social media, which platforms they prefer and whether they have a secret account. The Social Media Addiction Questionnaire was also included in the survey to measure the harmful effects of prolonged social media usage in children.

The results showed that 69.5% of participants had at least one social media account, even though most platforms require users to be 13 or older. 63.8% of children under 13 also admitted having at least one social media account, and TikTok was the most popular network among them. 68.2% of social media users under 13 used TikTok, while 62.9% used YouTube. Instagram (57.3%) and Snapchat (55.2%) were also among the most used platforms among children under 13.

Study Highlights Adolescent Social Media Habits and Addiction Patterns in the ABCD Dataset

Social Media Addiction Questionnaire* | Never | Very Rarely | Rarely | Sometimes | Often | Very Often
I spend a lot of time thinking about social media apps or planning my use of social media apps. | 31.0% | 22.9% | 20.7% | 18.8% | 4.7% | 1.8%
I feel the need to use social media apps more and more. | 43.2% | 19.1% | 22.8% | 10.9% | 3.0% | 0.9%
I use social media apps so I can forget about my problems. | 47.9% | 14.2% | 12.7% | 16.7% | 5.6% | 2.9%
I've tried to use my social media apps less but I can’t. | 52.9% | 15.1% | 14.9% | 11.2% | 4.0% | 1.8%
I've become stressed or upset if I am not allowed to use my social media apps. | 58.0% | 15.0% | 12.1% | 10.0% | 3.3% | 1.5%
I use social media apps so much that it has had a bad effect on my schoolwork or job. | 66.6% | 13.3% | 9.3% | 7.6% | 2.3% | 0.9%

It is not surprising that many underage children use social media, because these platforms have no robust age verification systems. Children can simply enter an earlier date of birth to access social media apps. The study also found that children under 13 had an average of 3.38 accounts on social networks. Adolescents were more likely than children under 13 to have a secret account hidden from their parents. There was also a gender difference in social media usage among underage children: girls were more likely to use platforms like Snapchat, TikTok and Pinterest, while boys were more likely to use Reddit and YouTube. Girls were also likely to become emotionally dependent on social media and spend significant time there. The researchers also noted that social media usage among underage kids increased during Covid-19 as they became highly dependent on digital communication. The study sheds light on how social media usage among underage children can have serious consequences if platforms do not take measures to enforce age requirements.

by Arooj Ahmed via Digital Information World

What Are AI Companies Hiding? New Report Exposes Transparency Gaps in Top Models

There are many AI models right now, but are AI companies really transparent about the "technical underpinnings" of their large language models (LLMs)? According to a new report from Americans for Responsible Innovation (ARI), an organization that advocates for AI regulation, many AI startups are less open about the technical details of their models than tech giants. Tech giants are not very open either, but they still offer some transparency compared with closed models. ARI reached this conclusion after analyzing AI models from Anthropic, xAI, OpenAI, Google, Meta and 21 other companies.

ARI policy analyst David Robusto said that several factors explain why many companies are not open and transparent about each AI update. Producing detailed documentation for every update takes a lot of time, effort and resources. There is also always a chance that rivals will try to reverse-engineer the work from details in the documentation. Being secretive about the technical details of their models or other tech products gives companies a competitive advantage. That's why they do not find it necessary to disclose all the details about updates.

The report says that third parties and policymakers need technical details to understand how the models work, especially in defense and healthcare. Because some big foundation models are not transparent, decision-making becomes difficult. There should be regulations and industry-wide standards for AI model transparency, including mandatory details that companies must disclose. Without details about LLMs, we cannot compare the models, even with industry benchmarks.

According to the report, Llama 3.2 is the most transparent, with detailed information about training procedures, model architecture and computational requirements. GPT-4o and Gemini 1.5 were also somewhat transparent. The least transparent model was Grok-2. The category where AI models scored lowest overall was technical transparency. The report also found that user-facing documentation was the best-scoring category, with an average score of 3.19 out of 4.0. In systematic risk evaluations, almost all models scored well except Grok-2. All the models scored low on security, as many of the companies didn't provide much information about how they protect their systems.



by Arooj Ahmed via Digital Information World

Saturday, January 11, 2025

Downloading Cracked Software? Beware of the Hidden Malware Stealing Your Info

Many people do not want to pay large amounts for software and tools like Lightroom, Photoshop, AutoCAD and many others, so they use cracked versions from the internet. Although cracked versions appear to cost nothing, they come at a bigger price: malware that steals your sensitive information. Researchers from Trend Micro, a security firm, found that attackers spread fake installers on the internet and on social media platforms like YouTube, and that these installers carry malware that steals sensitive information while evading detection.

Many YouTube videos offer links to cracked versions of the software you want, and clicking a link takes the user to reputable file hosting sites like Mega.nz and Mediafire. But most of the time, the legitimate-looking software installer contains malware that gets into the user’s system when they hit download. This malware is called an infostealer, and it is designed to steal sensitive information from the infected system. All types of sensitive information, such as bank accounts, personal data, credentials and other private information, become easily accessible to attackers, who can exploit the data for fraud and identity theft.

The researchers gave the example of Autodesk Keygen, software that generates serial numbers. When a user searches for it on the web, entries on legitimate websites like OpenSea appear with a shortened link that directs the user to the malicious link.

This raises the question of how this malware avoids detection. The answer is that many threat actors use reputable file hosting services that hide the origin of the malware, and many antivirus programs are unable to detect it. Many malicious files are also 900MB or more in size and password-protected, which keeps the malware from being detected.

How Your Search for Free Software May Lead Straight to Data Theft
Image: Trendmicro

by Arooj Ahmed via Digital Information World