Tuesday, December 30, 2025

AI agents arrived in 2025 – here’s what happened and the challenges ahead in 2026

Image: DIW-Aigen

In artificial intelligence, 2025 marked a decisive shift. Systems once confined to research labs and prototypes began to appear as everyday tools. At the center of this transition was the rise of AI agents – AI systems that can use other software tools and act on their own.

While researchers have studied AI for more than 60 years, and the term “agent” has long been part of the field’s vocabulary, 2025 was the year the concept became concrete for developers and consumers alike.

AI agents moved from theory to infrastructure, reshaping how people interact with large language models, the systems that power chatbots like ChatGPT.

In 2025, the definition of AI agent shifted from the academic framing of systems that perceive, reason and act to AI company Anthropic’s description of large language models that are capable of using software tools and taking autonomous action. While large language models have long excelled at text-based responses, the recent change is their expanding capacity to act, using tools, calling APIs, coordinating with other systems and completing tasks independently.

This shift did not happen overnight. A key inflection point came in late 2024, when Anthropic released the Model Context Protocol. The protocol allowed developers to connect large language models to external tools in a standardized way, effectively giving models the ability to act beyond generating text. With that, the stage was set for 2025 to become the year of AI agents.

AI agents are a whole new ballgame compared with generative AI.

The milestones that defined 2025

The momentum accelerated quickly. In January, the release of Chinese model DeepSeek-R1 as an open-weight model disrupted assumptions about who could build high-performing large language models, briefly rattling markets and intensifying global competition. An open-weight model is an AI model whose training, reflected in values called weights, is publicly available. Throughout 2025, major U.S. labs such as OpenAI, Anthropic, Google and xAI released larger, high-performance models, while Chinese tech companies including Alibaba, Tencent, and DeepSeek expanded the open-model ecosystem to the point where the Chinese models have been downloaded more than American models.

Another turning point came in April, when Google introduced its Agent2Agent protocol. While Anthropic’s Model Context Protocol focused on how agents use tools, Agent2Agent addressed how agents communicate with each other. Crucially, the two protocols were designed to work together. Later in the year, both Anthropic and Google donated their protocols to the open-source software nonprofit Linux Foundation, cementing them as open standards rather than proprietary experiments.

These developments quickly found their way into consumer products. By mid-2025, “agentic browsers” began to appear. Tools such as Perplexity’s Comet, Browser Company’s Dia, OpenAI’s GPT Atlas, Copilot in Microsoft’s Edge, ASI X Inc.’s Fellou, MainFunc.ai’s Genspark, Opera’s Opera Neon and others reframed the browser as an active participant rather than a passive interface. For example, rather than helping you search for vacation details, it plays a part in booking the vacation.

At the same time, workflow builders like n8n and Google’s Antigravity lowered the technical barrier for creating custom agent systems beyond what has already happened with coding agents like Cursor and GitHub Copilot.

New power, new risks

As agents became more capable, their risks became harder to ignore. In November, Anthropic disclosed how its Claude Code agent had been misused to automate parts of a cyberattack. The incident illustrated a broader concern: By automating repetitive, technical work, AI agents can also lower the barrier for malicious activity.

This tension defined much of 2025. AI agents expanded what individuals and organizations could do, but they also amplified existing vulnerabilities. Systems that were once isolated text generators became interconnected, tool-using actors operating with little human oversight.

The business community is gearing up for multiagent systems.

What to watch for in 2026

Looking ahead, several open questions are likely to shape the next phase of AI agents.

One is benchmarks. Traditional benchmarks, which are like a structured exam with a series of questions and standardized scoring, work well for single models, but agents are composite systems made up of models, tools, memory and decision logic. Researchers increasingly want to evaluate not just outcomes, but processes. This would be like asking students to show their work, not just provide an answer.

Progress here will be critical for improving reliability and trust, and ensuring that an AI agent will perform the task at hand. One method is establishing clear definitions around AI agents and AI workflows. Organizations will need to map out exactly where AI will integrate into workflows or introduce new ones.

Another development to watch is governance. In late 2025, the Linux Foundation announced the creation of the Agentic AI Foundation, signaling an effort to establish shared standards and best practices. If successful, it could play a role like the World Wide Web Consortium in shaping an open, interoperable agent ecosystem.

There is also a growing debate over model size. While large, general-purpose models dominate headlines, smaller and more specialized models are often better suited to specific tasks. As agents become configurable consumer and business tools, whether through browsers or workflow management software, the power to choose the right model increasingly shifts to users rather than labs or corporations.

The challenges ahead

Despite the optimism, significant socio-technical challenges remain. Expanding data center infrastructure strains energy grids and affects local communities. In workplaces, agents raise concerns about automation, job displacement and surveillance.

From a security perspective, connecting models to tools and stacking agents together multiplies risks that are already unresolved in standalone large language models. Specifically, AI practitioners are addressing the dangers of indirect prompt injections, where prompts are hidden in open web spaces that are readable by AI agents and result in harmful or unintended actions.

Regulation is another unresolved issue. Compared with Europe and China, the United States has relatively limited oversight of algorithmic systems. As AI agents become embedded across digital life, questions about access, accountability and limits remain largely unanswered.

Meeting these challenges will require more than technical breakthroughs. It demands rigorous engineering practices, careful design and clear documentation of how systems work and fail. Only by treating AI agents as socio-technical systems rather than mere software components, I believe, can we build an AI ecosystem that is both innovative and safe.The Conversation

Thomas Şerban von Davier, Affiliated Faculty Member, Carnegie Mellon Institute for Strategy and Technology, Carnegie Mellon University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Editor's Note: This post might have been created or polished by AI tools.


by External Contributor via Digital Information World

Monday, December 29, 2025

Nobel Laureate Discusses Artificial Intelligence's Role in Critical Thinking Education

Nobel Prize-winning physicist Saul Perlmutter addressed artificial intelligence as a two-edged tool during a December 2025 interview on the "In Good Company" podcast, briefly comparing it to earlier concerns about calculators in education. Perlmutter, who won the Nobel Prize in Physics for discovering the universe's accelerating expansion, discussed how AI intersects with the critical thinking methods he teaches.

The physicist noted that AI can give students the impression they have actually learned the basics before they really have, potentially leading them to rely on it too soon before they know how to do the work themselves. He identified a particular concern with the current generation of AI being very good at being overly confident about what it's saying, which users may accept without scrutiny because it's typed on the screen.

Perlmutter teaches a critical thinking course covering 24 concepts and has asked students to think a bit hard about how to use AI to make it easier to operationalize each concept in their day-to-day lives, and also how to use these concepts to tell whether AI is fooling them or sending them in the right or wrong direction.

The physicist noted that when users know these different tools and approaches to thinking about problems, AI can often help them find the bit of information they need to use these techniques.

Notes: This post was drafted with the assistance of AI tools and reviewed, fact-checked, edited, and published by humans. Image: DIW-Aigen

Read next: AI Video Translation Offers Efficiency Potential but Human Nuance Remains Key
by Ayaz Khan via Digital Information World

AI Video Translation Offers Efficiency Potential but Human Nuance Remains Key

A study evaluated consumer responses to marketing videos translated by a generative AI tool (HeyGen) versus human translators across English–Indonesian and Indonesian–English language pairs. Two online experiments involved participants in Indonesia (Study 1) and the United States and United Kingdom (Study 2), measuring language comprehension, accent neutrality, naturalness, and customer engagement intention.

AI translations were consistently rated as less natural and less accent-neutral than human translations. Language comprehension varied by direction: AI performed worse translating into Indonesian but better into English, reflecting differences in AI training data. Despite these perceptual differences, viewers were equally willing to like, share, or comment on both types of videos.

Research shows AI struggles with tone and accents, though marketing engagement matches human translations. Thoughtful use of emerging technologies requires balancing innovation with responsibility, ensuring progress benefits people without misleading or harming them.

"These insights suggest that AI video translation is not yet a perfect substitute for human translation...", explains UEA in a newsroom post. Adding further, "But it already offers practical value".
According to Jiseon Han, Assistant Professor at University of East Anglia: "For [online] marketers, AI can be a great choice when speed and straightforward messaging matter most, but when it comes to capturing tone, personality, and cultural context, human expertise is still irreplaceable".

The authors note several limitations: findings reflect a single AI tool, specific language pairs, one video per condition, and a single point in time, which restricts generalizability. They suggest future research should explore additional AI tools, languages, and translation contexts to further understand consumer evaluation of AI video translation.

Source: Journal of International Marketing; research led by the University of Jyväskylä with co-authorship from University of East Anglia (UEA).

Notes: This post was drafted with the assistance of AI tools and reviewed, fact-checked, edited, and published by humans.

Read next: Global Survey: 66% Say 2025 Bad Year for Country, 71% Optimistic 2026 Will Be Better
by Asim BN via Digital Information World

Friday, December 26, 2025

Global Survey: 66% Say 2025 Bad Year for Country, 71% Optimistic 2026 Will Be Better

Ipsos surveyed 23,642 adults (under the age of 75) across 30 countries between 27 October and 4 November 2025. The survey found that 50% of respondents said 2025 was a bad year for them and their family. At the national level, 66% of respondents said 2025 was a bad year for their country, with the highest percentages reported in France (85%), South Korea (85%), and Türkiye (80%).

Looking ahead, 71% of respondents expressed optimism that 2026 will be better than 2025. Countries with the highest optimism included Indonesia (90%), Colombia (89%), and Chile (86%), while France (41%), Japan (44%), and Belgium (49%) reported the lowest optimism.

Public pessimism dominated 2025 globally, but strong optimism for 2026 emerges across emerging economies surveyed.

Country % agree % disagree
30-country avg. 71 29
Indonesia 90 10
Colombia 89 11
Chile 86 14
Thailand 86 14
Peru 86 14
India 85 15
Argentina 83 17
South Africa 82 18
Mexico 82 18
Malaysia 82 18
Brazil 80 20
Hungary 77 23
Poland 74 26
Romania 70 30
Canada 70 30
Spain 69 31
Sweden 68 32
Singapore 67 33
Netherlands 67 33
United States 66 34
Australia 66 34
South Korea 65 35
Türkiye 63 37
Ireland 63 37
Great Britain 58 42
Germany 57 43
Italy 57 43
Belgium 49 51
Japan 44 56
France 41 59

On economic expectations, 49% of respondents predicted a stronger global economy in 2026, while 51% expected it to be worse.

The report also notes that in 2020, 90% of average respondents globally said their country had a bad year, reflecting the height of the COVID-19 pandemic. Current optimism levels remain below pre-2022 figures.

Source: Ipsos Predictions 2026 Report

Read next:

• How Schema Markup Is Redefining Brand Visibility in the Age of AI Search, According to Experts at Status Labs

• How ChatGPT could change the face of advertising, without you even knowing about it
by Ayaz Khan via Digital Information World

Wednesday, December 24, 2025

How Schema Markup Is Redefining Brand Visibility in the Age of AI Search, According to Experts at Status Labs


The way brands are discovered, evaluated, and recommended has fundamentally changed. As AI platforms like ChatGPT, Google's Gemini, and Perplexity increasingly mediate the relationship between businesses and their audiences, the technical infrastructure behind digital reputation has become just as important as the content itself. At the center of this shift is schema markup, a structured data framework that serves as a translation layer between your digital presence and the AI systems now shaping public perception.

The Growing Importance of Machine-Readable Branding

When a potential customer, investor, or partner asks an AI assistant about your company, the response depends on whether that AI system can accurately identify, understand, and trust your brand. Unlike traditional search engines that present links for users to evaluate, AI platforms synthesize information and deliver direct answers. This creates a fundamental challenge: if your brand's information isn't structured in ways that AI systems can reliably interpret, you risk being misrepresented, conflated with competitors, or excluded from responses entirely.

According to research from Schema App, Microsoft's Fabrice Canel, Principal Product Manager at Bing, confirmed at SMX Munich in March 2025 that schema markup directly helps Microsoft's large language models understand web content. This represents one of the first official confirmations from a major AI platform that structured data influences how LLMs process and present information.

The implications extend beyond simple visibility. Studies indicate that pages with comprehensive schema implementation are significantly more likely to appear in AI-generated summaries. A benchmark study from Data World found* that LLMs grounded in knowledge graphs achieve 300% higher accuracy compared to those relying solely on unstructured data. For brands, this accuracy translates directly into reputation protection and opportunity capture.

Understanding Schema Markup as Digital Identity Infrastructure

Schema markup uses standardized vocabulary from Schema.org to explicitly label elements on web pages that AI systems prioritize: organizational information, reviews, author credentials, products, and services. Rather than forcing AI models to infer meaning from unstructured text, this structured data provides explicit signals about what your content represents and how different elements relate to each other.

Google's own documentation states that structured data helps search systems understand page content by providing explicit clues about meaning. This guidance has taken on new significance as Google's AI Overviews and Gemini increasingly rely on the Knowledge Graph, which is enriched by schema markup crawled from the web.

The digital reputation management firm Status Labs has emerged as a leading voice on this topic, developing comprehensive frameworks for how businesses should approach structured data in an AI-dominated landscape. Their research indicates that company websites optimized with Organization schema and connected entity markup represent the most controllable authoritative source for AI training data. As Status Labs explains in their detailed analysis of schema markup's role in AI reputation, implementing structured data that signals contextual relationships to AI platforms is essential for preventing entity confusion that damages digital reputation.

The Entity Recognition Challenge

One of the most significant reputation risks in the AI era involves entity recognition, the process by which AI platforms distinguish between concepts sharing identical names. When someone asks an AI assistant about your company, the system must determine whether you're the technology firm based in Austin or the manufacturing company with the same name in Ohio.

Without Organization schema establishing your company as a distinct legal entity with specific founding dates, locations, and verifiable credentials, AI systems may merge information about different organizations into a single, confused representation. This creates scenarios where achievements are attributed to competitors or negative information about unrelated entities appears in responses about your business.

Status Labs has documented cases where proper schema implementation resolved significant entity confusion issues. Their GEO (Generative Engine Optimization) practice focuses specifically on these challenges, helping clients establish clear digital identities that AI systems can accurately recognize and represent.

The "sameAs" property in Organization schema proves particularly valuable here, linking your official website to verified profiles on LinkedIn, Crunchbase, and other authoritative platforms. This creates a network of corroborating signals that AI systems use to validate your identity and distinguish you from similarly named entities.

Performance Data: Schema's Measurable Impact

Research from BrightEdge demonstrates that schema markup improves brand presence and perception in Google's AI Overviews, with higher citation rates observed on pages with robust structured data. A recent analysis** also found that 72% of sites appearing on Google's first page search results use schema markup, indicating a strong correlation between structured data and visibility.

The stakes have increased substantially as AI Overviews reduce traditional organic clicks*** by approximately 34.5% year-over-year. Businesses not appearing in AI-generated summaries face accelerating invisibility as users increasingly accept AI responses without clicking through to websites.

An AccuraCast study**** analyzing over 2,000 prompts across ChatGPT, Google AI Overviews, and Perplexity found that 81% of web pages receiving citations included schema markup. While correlation doesn't prove causation, the data suggests that structured data plays a meaningful role in determining which sources AI platforms reference. Notably, ChatGPT showed particular preference for Person schema, with 70.4% of cited sources including this markup type, reflecting the platform's emphasis on source authority and reliability.

Critical Schema Types for Reputation Management

Different schema types serve distinct reputation management functions. Understanding which to prioritize depends on your specific visibility and protection goals.

Organization Schema consolidates business information into formats that AI platforms trust. This includes legal name, logo, founding date, official addresses, contact information, and social media profiles. Status Labs' detailed analysis outlines how implementing a comprehensive Organization schema across all digital properties creates the foundation for accurate AI representation.

Person Schema prevents the misattribution that damages executive and professional reputation. When multiple individuals share identical names, this markup defines biographical information, professional credentials, affiliations, and accomplishments, distinguishing separate careers and ensuring accurate attribution.

Review and AggregateRating Schema directly impact AI trustworthiness assessments. AI systems weigh verified customer feedback heavily when generating recommendations. Properly structured review markup must match visible page content exactly, as AI platforms detect and penalize mismatched data.

Article and BlogPosting Schema establish content authority and topical expertise. These schemas identify authors, publication dates, and subject matter, helping AI systems attribute information correctly and recognize your organization as an authoritative voice on specific topics.

Building Connected Knowledge Graphs

Basic schema provides value, but connected schema creates compounding advantages. As Search Engine Journal reports, enterprises are increasingly viewing structured data not merely as rich result eligibility criteria but as the foundation for content knowledge graphs.

This approach establishes relationships between entities on your website and links them to external authoritative knowledge bases, including Wikidata, Wikipedia, and Google's Knowledge Graph. When AI systems encounter your content, the connected schema provides comprehensive context about relationships between your products, services, team members, and broader industry concepts.

Status Labs' five-pillar approach to AI reputation management places schema implementation within this comprehensive framework. The methodology optimizes corporate websites as primary authoritative sources while establishing authoritative third-party references and managing review ecosystems with properly structured data.

Platform-Specific Considerations

Different AI platforms process schema markup according to their unique architectures and data sources. Understanding these variations enables targeted optimization.

Google's AI Overviews and Gemini prioritize websites with a comprehensive schema that contributes to Google's Knowledge Graph. Recent data shows that 80% of AI Overview citations come from top-3 organic results, but among those results, pages with well-implemented schema receive preferential selection.

ChatGPT with SearchGPT combines real-time web search with language model capabilities. While ChatGPT doesn't require schema to understand content, research suggests it retrieves information more thoroughly and accurately from pages with structured data. Schema reduces hallucinations by providing factual anchors that ground AI responses.

Perplexity AI explicitly values structured data's role in identifying reliable sources. Pages with robust schema markup appear more frequently in Perplexity's cited sources because the platform prioritizes well-defined, machine-readable information.

Common Implementation Errors

Several schema implementation mistakes can undermine or damage AI reputation rather than enhance it.

Mismatched Data represents the most damaging error. Discrepancies between visible page content and schema markup cause AI systems to question credibility. If your website displays a 4.8-star rating but schema markup shows a different figure, AI platforms may penalize or exclude your pages.

Incomplete Entity Definitions miss opportunities for AI recognition. Implementing Organization schema without comprehensive properties like founding date, leadership, and external profile links reduces AI confidence in your entity definition.

Static Schema on Dynamic Content creates accuracy problems over time. Businesses with changing inventory or pricing need systems that automatically update schema when underlying data changes.

Schema Manipulation backfires as AI detection improves. Adding irrelevant keywords or inaccurate information to structured data triggers penalties that compound over time.

The Strategic Imperative

Schema markup's value compounds as AI systems incorporate structured data into their understanding of the digital landscape. Organizations implementing comprehensive schema today establish authoritative representations that become increasingly difficult for competitors to displace.

This dynamic mirrors earlier digital transformations. Early adopters of mobile optimization gained advantages that persisted for years. With AI platforms already controlling significant information discovery, the window for establishing schema-based authority continues to narrow.

Status Labs' analysis shows that businesses with comprehensive schema markup maintain visibility across current and emerging AI search technologies, while competitors without structured data face accelerating invisibility. As the firm notes, schema markup has evolved from an optional technical enhancement to a foundational requirement for any organization serious about managing how AI systems understand, evaluate, and represent their brand.

Beyond Visibility: Schema as Reputation Protection

Schema markup functions as insurance against reputation damage that occurs when AI systems misunderstand, misidentify, or misrepresent your organization. By explicitly defining your entity with verifiable attributes and establishing connections to authoritative external sources, you reduce the probability of harmful misattribution.

This protective function becomes critical as AI systems increasingly mediate first impressions. When stakeholders query AI platforms about your company, the generated response shapes perceptions before any human visits your website. Accurate, comprehensive schema markup ensures these AI-generated first impressions align with reality.

The businesses and individuals investing in sophisticated schema strategies position themselves for an information environment where reputation depends on machine readability. For those seeking to understand how to implement these strategies effectively, Status Labs' comprehensive guide on schema markup's role in AI reputation provides detailed implementation frameworks and case studies demonstrating measurable impact.

As AI continues reshaping how information is discovered and presented, the organizations that control their structured data narrative will maintain the ability to shape their own story in an increasingly AI-mediated world.

* https://ift.tt/cWlArmB

** https://ift.tt/XGcRgp8

*** https://ift.tt/HVtDyjF

**** https://ift.tt/XHpKScD

by Sponsored Content via Digital Information World

Tuesday, December 23, 2025

How ChatGPT could change the face of advertising, without you even knowing about it

Nessa Keddo, King's College London
Image: DIW-Aigen

Online adverts are sometimes so personal that they feel eerie. Even as a researcher in this area, I’m slightly startled when I get a message asking if my son still needs school shirts a few hours after browsing for clothes for my children.

Personal messaging is part of a strategy used by advertisers to build a more intense relationship with consumers. It often consists of pop-up adverts or follow-up emails reminding us of all the products we have looked at but not yet purchased.

This is a result of AI’s rapidly developing ability to automate the advertising content we are presented with. And that technology is only going to get more sophisticated.

OpenAI, for example, has hinted that advertising may soon be part of the company’s ChatGPT service (which now has 800 million weekly users). And this could really turbocharge the personal relationship with customers that big brands are desperate for.

ChatGPT already uses some advanced personalisation, making search recommendations based on a user’s search history, chats and other connected apps such as a calendar. So if you have a trip to Barcelona marked in your diary, it will provide you – unprompted – with recommendations of where to eat and what to do when you get there.

In October 2025, the company introduced ChatGPT Atlas, a search browser which can automate purchases. For instance, while you search for beach kit for your trip to Barcelona, it may ask: “Would you like me to create a pre-trip beach essentials list?” and then provide links to products for you to buy.

“Agent mode” takes this a step further. If a browser is open on the page of a swimsuit, a chat box will appear where you can ask specific questions. With the browser history saved, you can log back in and ask: “Can you find that swimsuit I was looking at last week and add it to the basket in a size 14?”

Another new feature (only in the US at the moment), “instant checkout”, is a partnership with Shopify and Etsy which allows users to browse and immediately purchase products without leaving the platform. Retailers pay a small fee on sales, which is how OpenAI monetises this service.

However, only around 2% of all ChatGPT searches are shopping-related, so other means of making money are necessary – which is where full-on incorporated advertising may come in.

One app, lots of ads?

OpenAI’s rapid growth requires heavy investment, and its chief financial officer, Sarah Friar, has said the company is “weighing up an ads model”, as well as recruiting advertising specialists from rivals Meta and Google.

But this will take some time to get right. Some ChatGPT users have already been critical of a shopping feature which they said made them feel like they were being sold to. Clearly a re-design is being considered, as the feature was temporarily removed in December 2025.

So there will continue to be experimentation into how AI can be part of what marketers call the “consumer journey” – the process customers go through before they end up buying something.

Some consumers prefer to use customer reviews and their own research or experience. Others appreciate AI recommendations, but studies suggest that overall, some sense of autonomy is essential for people to truly consider themselves happy customers. It has also been shown that audiences dislike aggressive “retargeting”, where they are continuously bombarded with the same adverts.

So the option of ChatGPT automatically providing product recommendations, summaries and even purchasing items on our behalf might seem very tempting to big brands. But most consumers will still prefer a sense of agency when it comes to spending their money.

This may be why advertisers will work on new ways to blur the lines – where internet search results are blended with undeclared brand messaging and product recommendations. This has long been the case on Chinese platforms such as WeChat, which includes e-commerce, gaming, messaging, calling and social networking – but with advertising at its core.

In fact, platforms in the west seem far behind their East Asian counterparts, where users can do most of their day-to-day tasks using just one app. In the future, a similarly centralised approach may be inevitable elsewhere – as will subliminal advertising, with the huge potential for data collection that a single multi-functional app can provide.

Ultimately, transparency will be minimal and advertising will be more difficult to recognise, which could be hard on vulnerable users – and not the kind of ethically responsible AI that many are hoping for.The Conversation

Nessa Keddo, Senior Lecturer in Media, Diversity and Technology, King's College London

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Read next: Shrinking AI memory boosts accuracy


by External Contributor via Digital Information World

Shrinking AI memory boosts accuracy

Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or help save significant amounts of energy.

Shrinking AI memory boosts accuracy
Image: Luke Jones / Unsplash

Experts from University of Edinburgh and NVIDIA found that large language models (LLMs) using memory eight times smaller than an uncompressed LLM scored better on maths, science and coding tests while spending the same amount of time reasoning.

The method can be used in an alternative way to help LLMs respond to more user queries simultaneously, reducing the amount of power needed per task.

As well as energy savings, experts say the improvements could benefit AI systems that are used to solve complicated tasks or in devices that have slow or limited memory, such as smart home devices and wearable technology.

Problem solving

By “thinking” about more complex hypotheses or exploring more hypotheses concurrently, AI models improve their problem-solving abilities. In practice, this is achieved by generating more reasoning threads – a step-by-step logical process used to solve problems – in text form.

The model’s memory – called the KV cache – which stores the portions of the threads generated, can act as a bottleneck, as its size slows down the generation of reasoning thread outputs during inference – the process by which AI models respond to an input prompt, such as answering a user query.

The more threads there are, and the longer they are, the more memory is required. The larger the memory size used, the longer the LLM takes to retrieve the KV cache from the part of the AI device where it is stored.

Memory compression

To overcome this, the team developed a method to compress the models’ memory – called Dynamic Memory Sparsification (DMS). Instead of keeping every token – the units of data that an AI model processes – DMS decides which ones are important enough to keep and which ones can be deleted.

There is a slight delay between the time when the decisions to delete tokens using sparsification are made and when they are removed. This gives the model a chance to pass on any valuable information from the evicted tokens to preserved ones.

In managing which tokens to keep and which to discard, DMS lets the AI model "think” in more depth or explore more possible solutions without needing extra computer power.

Models tested

The researchers tested DMS on different versions of the AI models Llama and Qwen and compared their performance to models without compression.

The models’ performance was assessed using standardised tests. It was found even with memories compressed to one eighth their original size, LLMs fully retain their original accuracy in difficult tasks while accelerating reasoning compared with non-compressed models.

In the standardised maths test AIME 24, which served as the qualifier for the United States Mathematical Olympiad, the compressed models performed twelve points better on average using the same number of KV cache reads to produce an answer.

For GPQA Diamond – a series of complex questions in biology, chemistry and physics authored by PhD-level experts – the models performed over eight points better.

The models were also tested with LiveCode Bench, which measures how well AI models can write code. The compressed models scored on average ten points better than non-compressed models.

In a nutshell, our models can reason faster but with the same quality. Hence, for an equivalent time budget for reasoning, they can explore more and longer reasoning threads. This improves their ability to solve complex problems in maths, science, and coding.

Dr Edoardo Ponti - GAIL Fellow and Lecturer in Natural Language Processing at the University’s School of Informatics

The findings from this work were peer reviewed and were presented at the prestigious AI conference NeurIPS.

Dr Ponti and his team will continue to investigate ways how large AI systems represent and remember information, making them far more efficient and sustainable as part of a 1.5 million euros European Research Council-funded project called AToM-FM.

This article has been republished on DIW with permission from The University of Edinburgh.

Read next:

• Subnational income inequality revealed: Regional successes may hold key to addressing widening gap globally

• Why many Americans avoid negotiating, even when it costs them


by External Contributor via Digital Information World