Wednesday, February 25, 2026

The Year of Efficiency: How Agencies Are Implementing AI in 2026 (Survey)

By Scotty Strehlow, Social & Community Manager, Duda

After two years of experimentation, 2026 marks the AI implementation-era for agencies. A new agency survey by Duda, a leading white label website building platform for agencies and SaaS companies, reveals that agencies’ top AI-related priorities for 2026 focus squarely on business performance; integrating the tool into everyday workflows that result in efficiencies, automation and expanded service offerings.

Where Agencies Are Planning to Focus


The survey found that 78% of agencies are prioritizing improved efficiency and higher margins, followed by process automation (64%) and expanding service offerings (44%) all with AI. With 75% viewing productivity as AI’s biggest opportunity, they are looking to embed AI capabilities throughout their organizations; completely altering and improving workflows where possible in order to stay ahead of competition. This can come in many forms, including changing workflows to make use of new AI functions integrated into existing software platforms without the need for changing tools or building new complicated and pricey AI tools from the ground up.

AI also enables deeper strategic work, taking menial tasks off consultants’ hands. Nearly half of respondents said AI could allow them to spend more time on creativity (47%) and consulting and strategy (44%), while 36% see value in generating more content at scale.

“By eliminating repetitive manual tasks, our team now focuses more on strategic thinking, product development, and high-value client work, whilst the system handles data gathering, research compilation, and initial content structuring, enabling us to scale our operations without proportionally increasing headcount,” stated Greg Radford, Head of Agency and Marketing, Constructiv Digital in Duda’s new 2026 Agency Growth Operating Framework ebook.

AI’s Questionable Quality

When it comes to the quality of AI output, there are mixed reviews. While 53% believe AI can drive higher-quality output, at the same time, 64% cite “AI slop,” namely low-quality content generated by AI, as their top concern. In fact, AI slop and fatigue is driving tech companies to hire more writers, editors and chief communications officers to write compelling stories.


Concerns around AI extended beyond content quality. Half of respondents (50%) worry about clients misunderstanding it, while 28% fear downward price pressure and 25% cite ethical concerns. Notably, 31% of respondents are concerned about being replaced by AI, underscoring ongoing anxiety around automation and the future role of agencies.

Traditional Search Remains a Priority


Proclamations of ‘SEO is Dead’ from last summer have long gone silent. While AIO is increasingly important, the survey reveals that traditional SEO remains the top priority for 2026 (43%), outpacing AI search optimization (30%). However, respondents increasingly view AI search as an opportunity in 2026, with 33% citing increased visibility and traffic through AI search as a potential upside. Despite widespread discussion around AI-driven search disruption, only 28% are concerned about decreased visibility or traffic due to AI.

Agent-to-Agent Commerce


Nearly 70% of respondents believe it will be very or somewhat important for brands to optimize for AI agent-to-agent shopping in 2026. Nevertheless, preparedness remains uneven: 27% are taking no steps, 25% are still planning, 22% are enhancing product metadata for discoverability, and 14% are optimizing product listings specifically for AI agents. Agencies still have time to prepare, as reports such as the 2026 IBM Institute for Business Value study reveal that consumers are leveraging genAI in their buying journey, but not diving into agent-to-agent shopping yet.

AI is not a question of ‘if,’ but ‘how’ for agencies in 2026, as they lead the way for strong AI adaptation and implementation for their clients. As they shift value propositions, work smarter and automate routine work, clients can expect to see more strategy, creativity, and smart consulting. However agencies who want to build for a long-term future will look beyond productivity to building new value propositions for their clients; going from efficiency to unlimited possibilities.

The survey was conducted between November and December 2025 among 36 Duda customers, primarily small agencies, "including agencies located in North America (61%), Europe (22%), Asia (14%), and the Middle East (3%)" with 89% employing 10 or fewer staff.

Reviewed by Asim BN.

Read next: 

• How Do Algorithms Work? Experts at Status Labs Weigh In


by Guest Contributor via Digital Information World

‘Probably’ doesn’t mean the same thing to your AI as it does to you

Mayank Kejriwal, University of Southern California

When a human says an event is “probable” or “likely,” people generally have a shared, if fuzzy, understanding of what that means. But when an AI chatbot like ChatGPT uses the same word, it’s not assessing the odds the way we do, my colleagues and I found.

We recently published a study in the journal NPJ Complexity that suggests that, while large language model AIs excel at conversation, they often fail to align with humans when communicating uncertainty. The research focused on words of estimative probability, which include terms like “maybe,” “probably” and “almost certain.”

By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like “impossible,” they diverge sharply on hedge words like “maybe.” For example, a model might use the word “likely” to represent an 80% probability, while a human reader assumes it means closer to 65%.

This could be because humans can interpret words such as “likely” and “probable” based more on contextual cues and personal experiences. In contrast, large language models may be averaging over conflicting usages of those words in their training data, leading to divergences with human interpretations.

Our study also found that large language models are sensitive to gendered language and the specific language used for prompting. When a prompt changed from “he” to “she,” the AI’s probability estimates often became more rigid, reflecting biases embedded in its training data. When a prompt changed from English to Chinese, the AI’s probability estimates often shifted, possibly due to differences between English and Chinese in how people express and understand uncertainty.

Image: Mirella Callage / unsplash

Why it matters

Far from being a linguistic quirk, this misalignment is a fundamental challenge for AI safety and human-AI interaction. As large language models are increasingly used in high-stakes fields like health care, government policy and scientific reporting, the way they communicate risk becomes a matter of public trust.

If an AI assistant helping a doctor, for instance, describes a side effect as “unlikely,” but the model’s internal calculation of “unlikely” is much higher than the doctor’s interpretation, the resulting decision could be flawed.

What other research is being done

Scientists have studied how humans quantify uncertainty since the 1960s, a field pioneered by CIA analysts to improve intelligence reporting. More recently, there has been an explosion in large language model literature seeking to look under the hood of neural networks to better understand their “behaviors” and linguistic patterns.

Our study adds a layer of complexity by treating the interaction between humans and artificial intelligence as a biological-like system where meaning can degrade. It moves beyond simply measuring if an AI is “smart” and instead asks if it is aligned.

Other researchers are currently exploring whether so-called chain-of-thought prompting – asking the AI to show its work – can fix these errors. However, our study found that even advanced reasoning doesn’t always bridge the gap between statistical data and verbal labels.

What’s next

A goal for future AI development is to create models that don’t just predict the next likely word but actually understand the weight of the uncertainty they are conveying. Researchers are calling for more robust consistency metrics to ensure that if a model sees a 10% chance in the data, it chooses the same word every time.

As we move toward a world where AI summarizes scientific papers and manages people’s schedules, making sure that “probably” means “probably” is a vital step in making these systems reliable partners rather than just sophisticated parrots.

The Research Brief is a short take on interesting academic work.The Conversation

Mayank Kejriwal, Research Assistant Professor of Industrial & Systems Engineering, University of Southern California

Disclosure statement: Mayank Kejriwal receives funding from the Defense Advanced Research Projects Agency and the National Institutes of Health. 
Partners: University of Southern California University of Southern California provides funding as a member of The Conversation US.

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Reviewed by Ayaz Khan.

Read next: 

• AI energy use: New tools show which model consumes the most power, and why

‘I think I have AI anxiety’


by External Contributor via Digital Information World

Tuesday, February 24, 2026

‘I think I have AI anxiety’

By Associate Professor Grant Blashki , University of Melbourne

From job fears to an existential dread, artificial intelligence is triggering a new kind of anxiety. Here's why you're not alone in feeling unsettled – and what actually helps

Image: Zach M / Unsplash

A patient said to me the other day, half-smiling but clearly unsettled: “I think I’ve got anxiety about AI.”

They weren’t having a panic attack or describing clinical anxiety. What they were expressing was a persistent sense of unease that many of us are feeling right now.

A sense that the world is changing very quickly, that the systems we live within are being redesigned around us and that most of us don’t feel particularly consulted or prepared for a life increasingly immersed in artificial intelligence (AI).

If you’re feeling that way, you’re not alone.

A recent survey shows that the general public is more concerned about AI than AI experts – particularly around jobs and human connection – while both groups express strong concern about misinformation.

AI anxiety isn’t a single fear. It’s a cluster of related concerns that vary by a person’s stage of life, technical literacy, work and values. However, many of these worries tend to fall into a few common themes.

Fear of economic and identity disruption

Understandably, one of the biggest concerns is job disruption.

AI is increasingly capable of performing tasks that were once the preserve of humans – drafting text, analysing data, writing code, summarising meetings, interpreting images, handling customer interactions and even helping me fix my barbecue with what appears to be impressive competence.

Whether AI will ‘take our jobs’ is a more complex question.

The International Monetary Fund estimates that almost 40 per cent of jobs globally will be affected by AI, with advanced economies facing higher exposure because more work involves cognitive tasks.

While the World Economic Forum’s Future of Jobs Report 2025 projects substantial labour-market churn by 2030, with 170 million jobs created and 92 million displaced, resulting in net growth but major transition pressures.

Although these projections suggest overall job growth, others suggests AI-driven job losses, especially for young people. Regardless, the lived experience for many people is often one of disruption – local, personal and immediate.

For many of us, work provides more than income. It provides identity, purpose and social connection. Anxiety arises not only from fear of unemployment, but from uncertainty about relevance and value in a world increasingly shaped by AI.

Loss of control

A second major concern is what might be called the ‘Big Brother’ effect – the growing role of AI systems in informing, and sometimes making, decisions that affect people’s lives.

These include hiring, credit, insurance, welfare compliance and even healthcare prioritisation.

The worry is not simply that AI systems may be wrong. It’s that decisions may be opaque, difficult to challenge and poorly explained – effectively occurring inside a black box.

Although the Organisation for Economic Co-operation and Development’s (OECD) AI Principles explicitly emphasise human agency and oversight, transparency and accountability as core requirements for trustworthy AI systems, global trends suggest that guardrails are often weakened as companies and nations race for dominance in a competitive AI market.

Misinformation and manipulation

AI has dramatically lowered the cost of producing highly realistic – but entirely false – content. While this can be entertaining at times, it becomes serious when convincing images, audio and video are used to influence people’s decisions.

Deepfake technology is now used for fraud, impersonation and misinformation.

Recent reporting by The Guardian described deepfake scams as occurring on an “industrial scale”, with high-quality fake content accessible to non-experts.

In Australia, this concern has become concrete.

ABC News reported deepfake advertisements impersonating a leading diabetes specialist, promoting unproven supplements and discouraging evidence-based treatment – a clear public-health risk.

When people can no longer reliably distinguish authentic information from synthetic content, trust in institutions, expertise and online information degrades – and anxiety follows.

Privacy and surveillance

Privacy has been gradually eroding for years, and this is amplified by AI’s ability to analyse large volumes of personal data – including behavioural, biometric and location data – often without us fully understanding how that data is used.

Pew Research shows persistent public concern about data misuse, impersonation and loss of privacy associated with AI systems.

This anxiety is not limited to government surveillance; it also reflects unease about corporate data practices, profiling, targeted persuasion and information asymmetries between people and institutions.

AI agents ‘taking over’

For most people, fears of an AI apocalypse are not everyday concerns, but they surface during high-profile stories about AI behaving unpredictably or operating autonomously.

One recent example is Moltbook, a platform marketed as a social network for AI agents, which attracted widespread commentary about some weird and disturbing interactions between AI systems.

Reuters reported that the platform had a major security vulnerability that exposed private messages, thousands of email addresses, and over a million credentials, highlighting basic governance and security failures.

These episodes often attract dystopian interpretations reminiscent of science-fiction narratives. But the more immediate risks tend to be practical rather than dramatic: poor security, weak oversight, unclear responsibility and premature deployment.

Concentration of power

Another source of anxiety is the concentration of AI capability among a small number of firms and countries. Many people worry about a future in which a handful of technology giants hold disproportionate power and wealth.

The OECD has noted that generative AI markets may exhibit a ‘winner takes all or most’ dynamic, reinforcing market power and potentially increasing inequality.

When powerful technologies are perceived as unavoidable and foundational, people reasonably ask who benefits, who bears the risk and how accountability is maintained.

Education integrity under pressure

AI anxiety is particularly evident in education.

The concern is not only academic misconduct, but whether assessment continues to measure understanding, reasoning and learning when high-quality outputs can be generated instantly.

A recent article in The Australian reported widespread student use of AI in higher education, including an experiment in which around 80 per cent of 40 student assignments had a high probability of being AI-generated.

This is a global issue.

The United Nations Educational, Scientific and Cultural Organization (UNESCO) has warned that generative AI is advancing faster than institutional readiness, raising concerns about privacy, equity and the long-term future of education.

At its core, this anxiety reflects concern about the purpose of education itself – not just credentialing, but the development of judgment, critical thinking and intellectual independence.

Authenticity and meaning

Finally, there is a quieter concern about authenticity and meaning.

As generative AI becomes capable of producing fluent writing, images and conversation at scale, some people worry less about being replaced and more about being diminished.

They question whether human creativity, effort and connection will continue to be recognised and valued when machine-generated output is ubiquitous.

Research from the Pew Research Center captures this unease.

Many people express concern that AI may reduce human interaction, weaken social connection and devalue human skills and creativity, even while acknowledging its potential benefits.

These concerns are not anti-technology; they reflect a desire for a future in which human contribution remains visible and meaningful alongside increasingly capable machines.

What actually helps if you’re worried about AI

When people raise anxiety with me – whether it’s about their health, the environment or technology – it usually eases once concerns become specific and actionable. AI is no different.

So, here are some tips:

  1. Name the worry: ‘AI’ is too broad to be useful. Are you worried about your job, misinformation, privacy, education or decision-making without oversight?
  2. Clean up your information diet: AI anxiety is often driven by headlines rather than evidence. Limit sensationalist coverage, be cautious with viral screenshots and rely on a small number of trusted sources.
  3. Build your AI literacy: You don’t need to be technical, but you do need to understand how AI is used in your own field – where it helps, where it fails and how outputs should be checked.
  4. Ask for guardrails: Anxiety rises when accountability is unclear. Ask who is responsible when AI is used, how errors are handled and what safeguards exist. Support regulation – even at your workplace or in your home – that focuses on transparency, safety and fairness.

At this moment in history, AI anxiety is not irrational. It reflects rapid change intersecting with our work, education, relationships and identity.

Neither denial nor panic is helpful. Engagement, understanding and our shared responsibility are.

Note: This article was originally published on the University of Melbourne’s Pursuit research news website and republished on DIW with permission. We have been informed that no AI tools were used in creating the text.

Reviewed by Ayaz Khan.

Read next: AI energy use: New tools show which model consumes the most power, and why


by External Contributor via Digital Information World

AI energy use: New tools show which model consumes the most power, and why

Reviewed by Ayaz Khan.

Software can help developers assess AI model energy use, which is necessary to lower costs and strain on the grid

Image: Brett Sayles / Pexels

AI users and developers can now measure the amount of electricity various AI models consume to complete tasks with open-source software and an online leaderboard developed at the University of Michigan.

Companies can download the software to evaluate private models run on private hardware. And while the software can’t evaluate the energy costs of queries run on proprietary AI models at private data centers, it has allowed U-M engineers to measure the power used by open-weight AI models in which the parameters under the hood are publicly available. The power requirements can be viewed on an online leaderboard, which was updated this month. Their results have revealed trends on how AI energy use varies with model design and implementation.

“If you want to optimize energy efficiency and minimize environmental impact, knowing the energy requirements of the models is critical, but popular benchmarks for assessing AI ignore this aspect of performance,” said Mosharaf Chowdhury, associate professor of computer science and engineering and the corresponding author of a study describing the software.

Tools for informed decision-making

The researchers measured energy use across several different tasks, including chatting, video and image generation, problem solving and coding. For some tasks, the energy requirements of open-weight models can vary by a factor of 300. With the results, Chowdhury’s team has developed tutorials for developers to learn how to measure and lower the energy costs of their models. They gave their latest tutorial at the Neural Information Processing Systems (NeurIPS) Conference in December.

The researchers designed their software with partial funding from the National Science Foundation to help solve AI’s growing energy demands. Between 80% and 90% of the sector’s energy is consumed when a trained model processes a request at remote data centers—what the industry calls inference.

As AI models grow in size and are used more often, they need more power. Data centers in the United States consumed about 4% of the country’s total power in 2024—or about as much as Pakistan uses in a year. Data centers are projected to use twice as much power by 2030, according to a study by the Pew Research Center. But many estimates on AI growth rely on ‘envelope’ calculations, which are made by multiplying the maximum power draw per GPU by the number of GPUs. It’s only an estimate of the highest possible energy cost.

“A lot of people are concerned about AI’s growing energy use, which is fair,” Chowdhury said. “However, many who worry can be overly pessimistic, and those who want more data centers are often overly optimistic. The reality is not black and white, and there’s a lot we don’t know because nobody is making direct measurements of AI power use available. Our tool can provide more accurate data for better decision-making.”

Why do some AI models use more power?

The team’s assessments of open-weight models revealed larger trends in how an AI’s design could affect its energy requirements. A key factor was the number of generated tokens—the basic units of data processed by AI. In LLMs, tokens are pieces of words, so wordier models tend to use more energy than concise models. Problem-solving or reasoning models also use more energy because they generate “chains of thought” that contain 10 to 100 more tokens per request.

But the energy requirement of even a single model can change, depending on how it’s run at the data center. Processing queries in batches, for example, will result in less energy use at the data center overall, although larger batches take longer to run. The choice of software for allocating computer memory to queries can also impact AI’s energy requirements.

“There are many ways to deploy AI and translate what the model wants to do into computations on the hardware,” said Jae-Won Chung, U-M doctoral student in computer science and engineering and the study’s first author. “Our tool can automate the search through that parameter space and find the most efficient set of parameters based on the user’s needs.”

The research was also supported by grants and gifts from VMware, the Mozilla Foundation, Cisco, Ford, GitHub, Salesforce, Google and the Kwanjeong Educational Foundation.

Contact: Kate McAlpine.

This post was originally published by the University of Michigan News and is republished here with permission.

Read next: 

AI chatbots provide less-accurate information to vulnerable users, study

• How shaming unethical brands makes companies improve their behaviour
by External Contributor via Digital Information World

How shaming unethical brands makes companies improve their behaviour

Janet Godsell, Loughborough University and Nikolai Kazantsev, University of Cambridge

Recent investigations have uncovered forced labour in agricultural supply chains, illegal fishing feeding supermarket freezers, deforestation embedded in everyday food products, and unsafe conditions in factories producing “sustainable” fashion. These harms were not visible on labels. They surfaced only when journalists, whistleblowers or activists exposed them.

Image: Atoms / Unsplash

And when they did, something predictable happened. Consumers felt uneasy. Brands issued statements. Promises were made. The point is that the force that set change in motion was not regulation. It was consumers.

Discovering that an ordinary purchase may be tied to exploitation or environmental damage creates a jolt of personal responsibility. In our research, we found that when environmental consequences are clearly linked to people’s own buying choices, many are willing to switch products — especially when credible alternatives exist.

But guilt is private. It nudges personal behaviour. It does not automatically reshape systems. The shift happens when private discomfort becomes public voice.

Consumers are often also the first to make hidden environmental harms visible. They post evidence on social media. They question corporate claims. They compare sustainability promises with independent reporting. They organise petitions, boycotts and review campaigns. By shining a spotlight on the truth, the scrutiny shifts from shoppers to brands.

That shift matters because modern brands depend on trust. Reputation is an asset. When sustainability claims are publicly challenged, credibility is at risk. Research in organisational behaviour shows that firms respond quickly to threats to legitimacy. Reputational damage affects customer loyalty, investor confidence and regulatory attention.

In many high-profile cases, supply chain reforms have followed intense public scrutiny rather than quiet compliance checks. Leaders may not act out of moral awakening — but they do act when inaction becomes costly to their reputation.

Consumers can trigger the emotional chain reaction. They feel guilt. They seek information. They speak collectively. That collective voice generates corporate shame.

Sustainability professor Mike Berners-Lee argues in his book A Climate of Truth that demanding honesty is one of the most powerful climate actions available to citizens. Raising standards of truthfulness in business and media changes incentives. When the gap between what companies say and what they do becomes visible, maintaining that gap becomes harder.

Our research explores how that visibility can be strengthened. The findings were clear. When environmental and social consequences are personalised and traceable, sustainability feels less distant. People see both their own role and the role of particular firms. That dual awareness encourages two responses: behavioural change driven by guilt and corporate accountability driven by shame.

Shame works because it is social. Brands care about how they are seen. When the negative environmental and social effects of supply chains can be publicly connected to named products, corporate narratives become contestable in real time.

Making supply chains socially visible

The technology to improve transparency already exists. Companies track goods through logistics systems, supplier databases and digital product-tagging that collect detailed information about sourcing and production. The barrier is not data collection. It is disclosure.

Environmental indicators — carbon emissions, water use, land conversion risk, labour standards compliance — can be linked to products through QR codes or retail apps. Comparable reporting standards would ensure consistency. Simple digital interfaces would make information accessible. Social sharing tools would allow consumers to compare and discuss findings publicly.

Social media is crucial. It already enables workers, communities and campaigners to challenge corporate messaging. Integrating verified supply chain data into these spaces would shift transparency from crisis response to everyday expectation.

This strategy, with its behaviour change directive, could work more effectively than rules or green marketing campaigns alone.

Regulation is essential but often slow and uneven across borders. Marketing campaigns can highlight selective improvements while leaving deeper practices untouched. Transparency activated by collective consumer voice operates differently. It aligns emotional motivation with reputational consequence.

Consumers are not passive recipients of information. They are catalysts. By feeling the first twinge of guilt, asking harder questions and speaking together, they create the conditions under which companies experience shame. When shame threatens trust and market position, change becomes rational and inevitable.

Shame is uncomfortable. But when directed at opaque systems rather than consumers, it can be powerful. By demanding truth and making supply chains socially visible, consumers can push businesses towards greater transparency — and, ultimately, towards more sustainable practice.


Janet Godsell, Dean and Professor of Operations and Supply Chain Strategy, Loughborough Business School, Loughborough University and Nikolai Kazantsev, Postdoctoral Researcher, Institute for Manufacturing, University of Cambridge

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Edited by Asim BN. Reviewed by Ayaz Khan.

Read next: Study: AI chatbots provide less-accurate information to vulnerable users


by External Contributor via Digital Information World

Saturday, February 21, 2026

Study: AI chatbots provide less-accurate information to vulnerable users

By Media Lab | MIT News

Research from the MIT Center for Constructive Communication finds leading AI models perform worse for users with lower English proficiency, less formal education, and non-US origins.

Large language models (LLMs) have been championed as tools that could democratize access to information worldwide, offering knowledge in a user-friendly interface regardless of a person’s background or location. However, new research from MIT’s Center for Constructive Communication (CCC) suggests these artificial intelligence systems may actually perform worse for the very users who could most benefit from them.

A study conducted by researchers at CCC, which is based at the MIT Media Lab, found that state-of-the-art AI chatbots — including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3 — sometimes provide less-accurate and less-truthful responses to users who have lower English proficiency, less formal education, or who originate from outside the United States. The models also refuse to answer questions at higher rates for these users, and in some cases, respond with condescending or patronizing language.

“We were motivated by the prospect of LLMs helping to address inequitable information accessibility worldwide,” says lead author Elinor Poole-Dayan SM ’25, a technical associate in the MIT Sloan School of Management who led the research as a CCC affiliate and master’s student in media arts and sciences. “But that vision cannot become a reality without ensuring that model biases and harmful tendencies are safely mitigated for all users, regardless of language, nationality, or other demographics.”

A paper describing the work, “LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users,” was presented at the AAAI Conference on Artificial Intelligence in January.

Systematic underperformance across multiple dimensions

For this research, the team tested how the three LLMs responded to questions from two datasets: TruthfulQA and SciQ. TruthfulQA is designed to measure a model’s truthfulness (by relying on common misconceptions and literal truths about the real world), while SciQ contains science exam questions testing factual accuracy. The researchers prepended short user biographies to each question, varying three traits: education level, English proficiency, and country of origin.

Across all three models and both datasets, the researchers found significant drops in accuracy when questions came from users described as having less formal education or being non-native English speakers. The effects were most pronounced for users at the intersection of these categories: those with less formal education who were also non-native English speakers saw the largest declines in response quality.

The research also examined how country of origin affected model performance. Testing users from the United States, Iran, and China with equivalent educational backgrounds, the researchers found that Claude 3 Opus in particular performed significantly worse for users from Iran on both datasets.

“We see the largest drop in accuracy for the user who is both a non-native English speaker and less educated,” says Jad Kabbara, a research scientist at CCC and a co-author on the paper. “These results show that the negative effects of model behavior with respect to these user traits compound in concerning ways, thus suggesting that such models deployed at scale risk spreading harmful behavior or misinformation downstream to those who are least able to identify it.”

Refusals and condescending language

Perhaps most striking were the differences in how often the models refused to answer questions altogether. For example, Claude 3 Opus refused to answer nearly 11 percent of questions for less educated, non-native English-speaking users — compared to just 3.6 percent for the control condition with no user biography.

When the researchers manually analyzed these refusals, they found that Claude responded with condescending, patronizing, or mocking language 43.7 percent of the time for less-educated users, compared to less than 1 percent for highly educated users. In some cases, the model mimicked broken English or adopted an exaggerated dialect.

The model also refused to provide information on certain topics specifically for less-educated users from Iran or Russia, including questions about nuclear power, anatomy, and historical events — even though it answered the same questions correctly for other users.

“This is another indicator suggesting that the alignment process might incentivize models to withhold information from certain users to avoid potentially misinforming them, although the model clearly knows the correct answer and provides it to other users,” says Kabbara.

Echoes of human bias

The findings mirror documented patterns of human sociocognitive bias. Research in the social sciences has shown that native English speakers often perceive non-native speakers as less educated, intelligent, and competent, regardless of their actual expertise. Similar biased perceptions have been documented among teachers evaluating non-native English-speaking students.

“The value of large language models is evident in their extraordinary uptake by individuals and the massive investment flowing into the technology,” says Deb Roy, professor of media arts and sciences, CCC director, and a co-author on the paper. “This study is a reminder of how important it is to continually assess systematic biases that can quietly slip into these systems, creating unfair harms for certain groups without any of us being fully aware.”

The implications are particularly concerning given that personalization features — like ChatGPT’s Memory, which tracks user information across conversations — are becoming increasingly common. Such features risk differentially treating already-marginalized groups.

“LLMs have been marketed as tools that will foster more equitable access to information and revolutionize personalized learning,” says Poole-Dayan. “But our findings suggest they may actually exacerbate existing inequities by systematically providing misinformation or refusing to answer queries to certain users. The people who may rely on these tools the most could receive subpar, false, or even harmful information.”

Reprinted with permission of MIT News.

Image: Tara Winstead / Pexels

Reviewed by Irfan Ahmad.

Read next: Most AI Bots Lack Published Formal Safety and Evaluation Documents, Study Finds
by External Contributor via Digital Information World

Friday, February 20, 2026

Most AI Bots Lack Published Formal Safety and Evaluation Documents, Study Finds

Story: Fred Lewsey.

Reviewed by Ayaz Khan.

An investigation into 30 top AI agents finds just four have published formal safety and evaluation documents relating to the actual bots.

Many of us now use AI chatbots to plan meals and write emails, AI-enhanced web browsers to book travel and buy tickets, and workplace AI to generate invoices and performance reports.

However, a new study of the “AI agent ecosystem” suggests that as these AI bots rapidly become part of everyday life, basic safety disclosure is “dangerously lagging”.

A research team led by the University of Cambridge has found that AI developers share plenty of data on what these agents can do, while withholding evidence of the safety practices needed to assess any risks posed by AI.

The AI Agent Index, a project that includes researchers from MIT, Stanford and the Hebrew University of Jerusalem, investigated the abilities, transparency and safety of thirty “state of the art” AI agents, based on public information and correspondence with developers.

The latest update of the Index is led by Leon Staufer, a researcher studying for an MPhil at Cambridge’s Leverhulme Centre for the Future of Intelligence. It looked at available data for a range of leading chat, browser and workflow AI bots built mainly in the US and China.

The team found a “significant transparency gap”. Developers of just four AI bots in the Index publish agent-specific “system cards”: formal safety and evaluation documents that cover everything from autonomy levels and behaviour to real-world risk analyses.*

Additionally, 25 out of 30 AI agents in the Index do not disclose internal safety results, while 23 out of 30 agents provide no data from third-party testing, despite these being the empirical evidence needed to rigorously assess risk.

Known security incidents or concerns have only been published for five AI agents, while “prompt injection vulnerabilities” – when malicious instructions manipulate the agent into ignoring safeguards – are documented for two of those agents.

Of the five Chinese AI agents analysed for the Index, only one had published any safety frameworks or compliance standards of any kind.

“Many developers tick the AI safety box by focusing on the large language model underneath, while providing little or no disclosure about the safety of the agents built on top,” said Cambridge University’s Leon Staufer, lead author of the Index update.

“Behaviours that are critical to AI safety emerge from the planning, tools, memory, and policies of the agent itself, not just the underlying model, and very few developers share these evaluations.”

Most AI Developers Do Not Publish Safety and Evaluation Documents for Their AI Bots
Image: The 2025 AI Agent Index. For 198 out of 1,350 fields, no public information was found. Missing information is concentrated in 'Ecosystem Interaction' and 'Safety' categories. Only 4 agents provide agent-specific system cards.

In fact, the researchers identify 13 AI agents that exhibit “frontier levels” of autonomy, yet only four of these disclose any safety evaluations of the bot itself.

“Developers publish broad, top-level safety and ethics frameworks that sound reassuring, but are publishing limited empirical evidence needed to actually understand the risks,” Staufer said.

“Developers are much more forthcoming about the capabilities of their AI agent. This transparency asymmetry suggests a weaker form of safety washing.”

The latest annual update provides verified information across 1,350 fields for the thirty prominent AI bots, as available up to the last day of 2025.

Criteria for featuring in the Index included public availability and ease of use, and developers with a market valuation of over US$1 billion. Some 80% of the Index bots were released or had major updates in the last two years.

The Index update shows that – outside of Chinese AI bots – almost all agents depend on a few foundation models (GPT, Claude, Gemini), a significant concentration of platform power behind the AI revolution, as well as potential systemic choke points.

Also read: Generative AI has seven distinct roles in combating misinformation

“This shared dependency creates potential single points of failure,” said Staufer. “A pricing change, service outage, or safety regression in one model could cascade across hundreds of AI agents. It also creates opportunities for safety evaluations and monitoring.”

Many of the least transparent agents are AI-enhanced web browsers designed to carry out tasks on the open web on a user’s behalf: clicking, scrolling, and filling in forms for tasks ranging from buying limited-release tickets to monitoring eBay bids.

Browser agents have the highest rate of missing safety information: 64% of safety-related fields unreported. They also operate at the highest levels of autonomy.**

This is closely followed by enterprise agents, business management AI aimed at reliably automating work tasks, with 63% of safety-related fields missing. Chat agents are missing 43% of safety-related fields in the Index.***

Staufer points out that there are no established standards for how AI agents should behave on the web. Most agents do not disclose their AI nature to end users or third parties by default.****Only three agents support watermarking of generated media to identify it as from AI.

At least six AI agents in the Index explicitly use types of code and IP addresses designed to mimic human browsing behaviour and bypass anti-bot protections.

“Website operators can no longer distinguish between a human visitor, a legitimate agent, and a bot scraping content,” said Staufer. “This has significant implications for everything from online shopping and form-filling to booking services and content scraping.”

The update includes a case study on Perplexity Comet: one of the most autonomous browser-based AI agents in the Index, as well as one of the most high-risk and least transparent.

Comet is marketed on its ability to “work just like a human assistant”. Amazon has already threatened legal action over Comet not identifying itself as an AI agent when interacting with its services.

“Without proper safety disclosures, vulnerabilities may only come to light when they are exploited,” said Staufer.

“For example, browser agents can act directly in the real world by making purchases, filling in forms, or accessing accounts. This means that the consequences of a security flaw can be immediate and far-reaching.”

Staufer points out that last year, security researchers discovered that malicious content on a webpage could hijack a browser agent into executing commands, while other attacks were able to extract users' private data from connected services.

Added Staufer: “The latest AI Agent Index reveals the widening gap between the pace of deployment and the pace of safety evaluation. Most developers share little information about safety, evaluations, and societal impacts.”

“AI agents are getting more autonomous and more capable of acting in the real world, but the transparency and governance frameworks needed to manage that shift are dangerously lagging.”


by External Contributor via Digital Information World