Saturday, November 1, 2025

AI’s Limits Exposed: New Study Finds Machines Struggle With Real Remote Work

For years, discussion around artificial intelligence has centered on whether machines could eventually replace human jobs. That question has become sharper with the growth of remote work, where tasks are handled entirely online and often require a mix of technical and creative ability. Yet a new study from the Center for AI Safety and Scale AI provides a clearer picture of what AI can actually do in those settings. The findings show that, despite steady progress in reasoning and automation tools, today’s AI systems can complete only a small fraction of real freelance projects at human quality levels.

The study, called the Remote Labor Index (RLI), represents one of the most detailed attempts so far to measure AI’s performance on practical digital work. It focuses on tasks that mirror real online freelancing jobs rather than theoretical tests or benchmark problems. Researchers collected 240 completed projects from professional freelancers working through platforms such as Upwork. Each project included the original brief, all input materials, and the final deliverable that a client had accepted. These projects came from 23 categories of work, including product design, animation, architecture, game development, and data analysis. Together they covered more than 6,000 hours of paid labor valued at about $140,000.

Six advanced AI agents were then tested on the same projects. The systems included Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT agent, and Gemini 2.5 Pro. Human evaluators compared the AI results to the professional standards of the original deliverables. The measure used was called the automation rate, defined as the percentage of projects that an AI completed to a standard that would be acceptable to a reasonable client.

The overall results placed current AI performance close to the bottom of the scale. Manus achieved the best outcome, with a 2.5 percent automation rate. Grok 4 and Sonnet 4.5 followed at 2.1 percent, while GPT-5 and ChatGPT agent reached 1.7 and 1.3 percent. Gemini 2.5 Pro finished last at 0.8 percent. In effect, even the strongest model could only complete two or three projects successfully out of every hundred. These numbers confirm that most paid remote work remains well beyond the reach of today’s AI systems.

To understand why, the study reviewed where and how the models failed. Nearly half of the AI outputs were judged to be of poor quality. About 36 percent were incomplete, and 18 percent contained technical errors such as corrupted or unusable files. Many tasks broke down before completion, with missing visuals, truncated videos, or unfinished code. Others showed inconsistency between design elements, such as an object changing shape between different 3D views. These errors highlight that even powerful models lack the internal verification ability that human workers apply when checking and refining their own results.

The researchers also noted that remote projects typically combine several layers of skill. A single job might involve writing, coding, design choices, and client-level presentation. While current AI models can produce functional text, basic graphics, or snippets of code, they often fail to align all these elements into a coherent, professional output. The lack of integrated quality control leads to results that are close to correct in parts but unsatisfactory as complete deliverables.

Some narrow areas showed stronger AI performance. Tasks involving short audio clips, simple image generation, or data visualization were occasionally completed at human level. In those cases, the systems benefited from established generative tools that already handle single-format media. The study used an additional metric, known as an Elo score, to track relative progress between different models. Although none matched the human baseline, newer models did show measurable improvement compared with earlier versions, suggesting steady if limited advancement.

Economically, the gap between potential and reality remains wide. When translated into market value, the highest-earning model, Manus, produced accepted work worth only $1,720 out of a total pool of nearly $144,000. This indicates that the contribution of current AI tools to freelance productivity is still marginal. The data also show that AI has not yet achieved meaningful cost deflation in remote labor markets, since most tasks still require full human oversight or redo.

For professionals who depend on online freelance income, the study’s conclusions provide some reassurance. Remote workers, especially in design, architecture, and multimedia fields, remain largely irreplaceable at present. The same applies to roles that involve judgment, error correction, and visual or interactive quality checks. However, the results also point to a gradual path of improvement. As AI models gain better multimodal reasoning and tool-use capacity, they may begin to handle larger portions of complex tasks under supervision.

The authors acknowledge that the benchmark does not cover all types of remote jobs. Work involving direct client communication, teamwork, or long-term project management was excluded. Even so, the Remote Labor Index represents the broadest test so far of AI’s real automation capacity in economically meaningful work. Its value lies in offering empirical measurement rather than assumption. By grounding AI evaluation in actual freelance projects, it shifts the conversation from hypothetical capabilities to demonstrated performance.

The findings suggest that the path to full automation of digital labor remains long. While AI can now assist with smaller creative or technical steps, it still struggles with the coordination, judgment, and quality assurance that professional work requires. Future updates to the RLI may help track whether ongoing model improvements translate into practical economic performance. For now, the study confirms that artificial intelligence, though advancing quickly, has yet to match the reliability and completeness of human remote workers.


Image: Yasmina H / Unsplash

Notes: This post was edited/created using GenAI tools.

Read next: Carnegie Mellon Study Finds Advanced AI Becomes More Self-Interested, Undermining Teamwork as It Gets Smarter
by Irfan Ahmad via Digital Information World

How Social Media Loyalty Loops Make Us Copy the Worst Behavior from Our Own Group

Social media has long been blamed for amplifying anger and hostility, but a new line of research suggests that the real source of toxicity may not be our opponents at all. It often begins with the people we already agree with.

A recent study from the University of Haifa found that users who see rude or intolerant posts from their own political side are far more likely to mirror that behavior than when they encounter hate from the opposing side.

Toxicity That Feels Familiar

The researchers analyzed millions of posts shared on X (formerly Twitter) in Israel during 2023, a year marked by deep political division. They wanted to understand how toxic behavior spreads and whether it moves differently inside or across political groups. Their focus wasn’t only on how users express hostility, but also on the kind of toxicity they adopt. They separated two dimensions of harmful speech: impolite style, which covers rude tone or foul language, and intolerant substance, which involves messages that demean social or political groups.

Across more than seven million tweets, one clear pattern emerged. People were significantly more likely to post toxic messages after seeing such behavior from their own side. This “ingroup contagion” proved stronger than any reaction to insults from opponents. When users saw hostility from the other side, they often responded defensively but not as intensely. The strongest predictor of new toxic posts was exposure to toxicity that came wrapped in familiarity.

The Pull of Belonging

The finding reflects a deeper social mechanism. People online do not only communicate as individuals; they perform as representatives of their group. When members of the same political community use harsh language, others interpret that tone as part of the group’s identity. To fit in, they copy it. It’s a form of social mirroring shaped by loyalty, not simply by outrage.

On platforms built around likes, replies, and visibility, such behavior brings quick social rewards. Users who echo the style of their peers can gain approval and attention. Over time, that cycle of validation turns hostility into habit. The researchers call this dynamic a “contagion,” not because people lose control, but because social media design makes imitation effortless.

Where Algorithms Meet Identity

What makes this process powerful is how platforms amplify identity signals. Algorithms that prioritize engagement naturally favor emotional and confrontational content. As posts from one’s own side fill the feed, the distinction between passionate support and open hostility blurs. Even moderate users may start matching the tone they see most often.

Interestingly, the study found that echo chambers (online spaces filled only with like-minded users) were not the worst environments for contagion. People surrounded by uniform opinions already hold firm views and feel less need to prove their loyalty. Toxicity spread faster in mixed networks, where users are exposed to both allies and opponents. The friction of that diversity appeared to intensify imitation within groups.

From Rudeness to Intolerance

The research also revealed a subtle but worrying shift. Exposure to mild forms of impoliteness, such as sarcasm or insults, sometimes led users to post not just rude comments but openly intolerant ones. In other words, small breaches of civility could snowball into expressions that reject or devalue other social groups. What begins as casual frustration can evolve into language that undermines democratic norms of respect and inclusion.

Breaking the Loop

While the study focused on political communication in Israel, the patterns it uncovered apply broadly. Across digital platforms, people tend to model their tone on the behavior of those they identify with. That human impulse is what keeps communities coherent - and what makes them vulnerable to turning sour.

Understanding this dynamic shifts responsibility away from blaming algorithms alone. Social media’s design does encourage hostility, but much of the toxicity that circulates online grows out of ordinary acts of imitation. Each time users echo the anger of their peers, they reinforce the idea that aggression is part of belonging.

The next time a heated post from a familiar account flashes across the screen, it may help to pause before responding in kind. What feels like standing with one’s side might simply be feeding the very cycle that keeps social media meaner than it needs to be.


Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.

Read next: Search That Knows You: Google’s AI Era Keeps Ads Alive in a More Personal Web
by Asim BN via Digital Information World

Search That Knows You: Google’s AI Era Keeps Ads Alive in a More Personal Web

Google is moving search into a new phase that blends artificial intelligence with deeper personal context, but it is not leaving advertising behind. Its experimental AI Mode in Search, which is gradually being tested through Google Labs, reflects how the company now wants search to feel more like a conversation that understands a user’s intent, habits, and even data spread across other Google services. Rather than typing short phrases, people can describe what they want in fuller language, attach images, or speak naturally, and the system reasons through multiple layers of information before delivering a response that ties together web results, live local data, and visual references.

The company’s vice president for Search, Robby Stein, explained in a recent conversation that the goal is to help users reach decisions more quickly by connecting what Google already knows about the world with what it can learn about each individual. Early experiments let people opt in to more tailored experiences, such as personalized restaurant suggestions or shopping advice, based on their previous interactions. Over time, this could expand to include Gmail, Drive, and Calendar, allowing the system to pull relevant details like flight schedules or meeting times to produce answers that match personal circumstances. Google insists these options will remain voluntary, but the move clearly signals a step toward a search engine that functions as a companion rather than a static information tool.

While the shift toward personalization sounds like a departure from traditional search, Google’s advertising model remains firmly in place. Instead of removing ads, the company is exploring how they can evolve inside conversational or multimodal results. The same reasoning systems that interpret complex questions might soon recognize when someone is planning a home renovation or comparing prices and present sponsored suggestions that fit naturally within those contexts. These won’t appear as the familiar boxes of text links but as integrated recommendations that match the flow of an AI-driven conversation. Google calls this an experiment in new ad formats, a cautious attempt to see how commercial information can live within generative interfaces without overwhelming the user experience.

For businesses, the change redefines how visibility works. In Stein’s words, AI often “thinks like a person,” meaning it values reliability and clear context over raw keyword density. The companies that appear in AI recommendations will likely be those already cited in trustworthy sources or reviewed positively across Google’s ecosystem. Traditional signals such as content quality, site clarity, and consistent information remain crucial, but now they feed directly into how AI models interpret authority. As a result, public visibility may depend as much on how accessible information is to algorithms as on how appealing it is to human readers.

What emerges from all this is a vision of search that feels more fluid and intuitive but also more intertwined with personal data and commercial logic. Google’s bet is that people will accept deeper personalization in exchange for convenience and precision. If AI Mode delivers on that promise, the boundary between browsing, asking, and buying could fade into a single experience, one that understands not only what users want but who they are.


Read next: New Study Shows Which Countries Use VPN Most and Least
by Irfan Ahmad via Digital Information World

Friday, October 31, 2025

New Study Shows Which Countries Use VPN Most and Least

The latest Cybernews study examines the download numbers of VPN applications and compares them to the populations of each of the 106 analyzed countries to determine the per capita usage and identify where VPNs are most popular.

Perhaps to little surprise, VPNs are most used in countries with strict internet restrictions, particularly in Arab countries like the United Arab Emirates, Qatar, Oman, Saudi Arabia, and Kuwait.

In these countries, the government has restricted access to VoIP services (Voice over Internet Protocol) like WhatsApp, FaceTime, Skype, etc.

Moreover, because it does not align with Islamic values, the majority of adult and gambling websites are banned as well.

Some journalists also turn to VPNs to express their opinions freely without the fear of repercussions from the government.

That’s why 5 out of the top 10 countries with the highest vpn adoption are Arab nations:

Top 10 countries for VPN adoption rates

  1. UAE – 65.78% average adoption rate (2020 - 2025)
  2. Qatar – 55.43% average adoption rate (2020 - 2025)
  3. Singapore – 38.23% average adoption rate (2020 - 2025)
  4. Nauru – 35.49% average adoption rate (2020 - 2025)
  5. Oman – 31.04% average adoption rate (2020 - 2025)
  6. Saudi Arabia – 28.93% average adoption rate (2020 - 2025)
  7. The Netherlands – 21.77% average adoption rate (2020 - 2025)
  8. The United Kingdom (UK) – 19.63% average adoption rate (2020 - 2025)
  9. Kuwait – 17.88% average adoption rate (2020 - 2025)
  10. Luxembourg – 17.30% average adoption rate (2020 - 2025)
In contrast, the 10 countries with the lowest VPN adoption are predominantly African countries except for China and Vanuatu.

The 10 countries with the lowest VPN adoption are: China, Malawi, Angola, Zambia, Cameroon, Rwanda, Zimbabwe, Kenya, Vanuatu, and Tanzania.

Although China is a nuanced case, and these findings should be taken with a grain of salt, since the methodology of the Cybernews study calculates adoption rates based on the 50 largest provider download numbers from the Google Play Store and Apple App Store.

However, Google Play Store is banned in China, and Apple’s App Store is very restricted, with most VPN apps removed.

In turn, most Chinese citizens seek various workarounds to install VPNs, such as side-loading applications downloaded from official or even untrustworthy websites, or using lesser-known apps that are still not blocked by the Chinese government.

So, even though China ranks as the lowest VPN adopter in the study, this is likely not the case due to the limitations of the methodology.

VPN downloads and adoption statistics of the last 5 years

Across 106 countries, the average VPN adoption rate rose from 6.95% in 2020 to 10.58% in 2024. That’s roughly an 11.4% compound annual growth rate.

Global downloads by year:

  • 2020: 284,591,457
  • 2021: 295,722,780
  • 2022: 487,049,573
  • 2023: 404,248,986
  • 2024: 464,021,602
  • 2025 (H1): 282,101,253

The total download numbers globally show that downloads have increased overall year by year, except for a slight dip in 2023, as it was challenging for growth to keep pace after the 2022 COVID-19 numbers.

Additionally, the first half of 2025 downloads already nearly match those of the entire year 2020, and 2025 is on track to meet or exceed the demand of 2024.

Global adoption rates by year

  • 2020: 6.95%
  • 2021: 6.84%
  • 2022: 10.06%
  • 2023: 10.90%
  • 2024: 12.35%
  • 2025 (H1): 6.89%

Similarly to download numbers, they are steadily growing overall year by year, despite a slight decline in 2021.

A closer look at G7 countries

United States (global rank 21)

Adoption peaked at 19.75% in 2022. It was 18.36% in 2024 and 10.56% in H1 2025.
Downloads are the highest in the world: 63.4 million in 2024 and 36.7 million in H1 2025.

United Kingdom (global rank 8)

Adoption climbed from a low of 15.80% in 2021 to 24.08% in 2024. H1 2025 sits at 15.38%.
Downloads reached 16.6 million in 2024 and 10.7 million in H1 2025.

Germany (global rank 17)

Adoption rose from 6.94% in 2020 to 21.36% in 2024. H1 2025 is 10.77%.
Downloads hit 18.1 million in 2024 and 9.1 million in H1 2025.

Canada (global rank 18)

Adoption peaked at 17.18% in 2024. H1 2025 is 10.76%.
Downloads totaled 6.8 million in 2024 and 4.3 million in H1 2025.

France (global rank 22)

Adoption reached 16.64% in 2024. H1 2025 stands at 10.55%.
Downloads were 11.1 million in 2024 and 7.0 million in H1 2025.

Italy (global rank 73)

Adoption peaked at 7.48% in 2023 and eased to 7.04% in 2024. H1 2025 is 3.91%.
Downloads hit 4.2 million in 2024 and 2.3 million in H1 2025.

Japan (global rank 84)

Adoption remains low. It was 4.63% in 2020, 4.32% in 2024, and 2.60% in H1 2025.
Downloads reached 5.3 million in 2024 and 3.2 million in H1 2025.

Why the Middle East leads

Governments in the Gulf filter content and restrict categories like adult sites, gambling, and some political material.

Many also limit VoIP calling on WhatsApp, Skype, and FaceTime, which makes everyday communication more challenging without a VPN, especially considering that there are many people in the UAE who have their families waiting for weekly calls from abroad.

One more point worth mentioning is that personal VPN use sits in a legal gray area. For the most part, you won’t get in trouble for using a VPN; however, if you use a VPN while committing a crime, then the penalties are increased significantly.

What drives adoption in Europe and Singapore

In Singapore, the Netherlands, the UK, and Luxembourg, the demand for VPNs shifts more towards overcoming content restrictions, such as accessing the widest Netflix libraries.

Security is a consistent concern across all countries, of course.
And privacy is a considerable reason even in countries where the government doesn’t necessarily monitor their citizens as strictly as Arab nations. Yet, many people in free democracies still don’t like the idea that their internet service provider (ISP) can see everything they do online.

The Ukraine-Russia war drove up VPN adoption significantly

War shock moved up the numbers fast. Ukraine’s adoption rate was 6.14% in 2021. It jumped to 18.92% in 2022 and has remained above 10% since then. Russia's adoption rate increased from 4.28% in 2021 to 42.20% in 2022.

A note on the US

The US shows why adoption percentage and raw downloads tell different stories.

The US is outside the global top 20 by per‑capita adoption, yet the total download volume of VPN apps in the US is the largest globally.

People use VPNs to limit ISP tracking, stay secure on public Wi‑Fi, avoid DDoS attacks, and keep access to streaming catalogs while traveling.

While hard content blocks are rare in the US, the internet is generally unrestricted, but privacy, security, and convenience keep demand growing.

Methodology limits worth considering

Cybernews compared app store downloads to population to estimate adoption. It’s a clean way to compare countries of different sizes, but there are limits.

  • Downloads don’t equal unique users. Reinstalls and device changes inflate totals.
  • Store region doesn’t always match where someone lives. That matters in restricted markets.
  • Desktop apps, direct website downloads, and side‑loaded Android APKs aren’t counted.
  • Adoption per capita doesn’t adjust for internet access or smartphone ownership.
  • The dataset covers 50 VPN providers. Niche or regional apps might be missing.
  • The analysis spots patterns, but it doesn’t prove cause.


Read next:

• Android’s AI Shields Outperform iPhone as New Studies Highlight Scam Protection Gap
by Irfan Ahmad via Digital Information World

Carnegie Mellon Study Finds Advanced AI Becomes More Self-Interested, Undermining Teamwork as It Gets Smarter

New research from Carnegie Mellon University suggests that as artificial intelligence develops stronger reasoning skills, it may also become less inclined to cooperate.

The study, conducted by researchers in the School of Computer Science, found that advanced language models capable of deep reasoning tend to favor individual gain over collective benefit, raising concerns about how such systems may behave in social or collaborative environments.

The team examined whether artificial intelligence can balance logic with social intelligence, the ability to make decisions that consider the good of a group. Using a series of economic games traditionally used in behavioral science, they measured how various large language models acted when faced with social dilemmas. The findings revealed a clear pattern: models designed for deliberate reasoning showed consistent declines in cooperative behavior, even when cooperation led to better outcomes for all participants.

The experiments included both reasoning and non-reasoning versions of several popular AI systems, including models from OpenAI, Google, Anthropic, DeepSeek, and Qwen. Each model was assigned tasks in simulated decision games such as the Public Goods, Prisoner’s Dilemma, and Dictator games, which tested their willingness to share resources or punish selfish behavior.

In one experiment, OpenAI’s non-reasoning model GPT-4o chose to share resources nearly all the time, while its reasoning counterpart, o1, did so in only one-fifth of trials. Similar trends appeared across other AI families. When reasoning capabilities were added (using techniques like step-by-step logic or reflective prompting) cooperation consistently dropped. In several cases, the decline exceeded fifty percent.

Beyond individual actions, the researchers also tested how groups of AIs interacted when reasoning and non-reasoning models were mixed together. Here, the results grew even more striking. Groups with more reasoning models earned less overall, as self-interested behavior from the reasoning systems reduced total cooperation. The tendency for these agents to prioritize their own outcomes spread to others, eroding collective performance.

Across ten different models, those equipped with extended reasoning consistently displayed weaker willingness to share, help, or enforce social norms. Although reasoning helped them analyze problems in a structured way, it often came at the cost of empathy-like decision-making. Their logic-driven choices mirrored what the study describes as “spontaneous giving and calculated greed,” a pattern observed in human psychology when deliberate thought overrides intuitive cooperation.

The researchers argue that this emerging behavior points to a gap between cognitive and social intelligence in artificial systems. Current models excel at solving structured problems, but when placed in situations that require trust, reciprocity, or collective coordination, the same logical reasoning that strengthens performance in tests appears to weaken social cohesion.

These results hold implications for how people use AI in real-world decision-making. As reasoning systems are increasingly used to assist in classrooms, businesses, or even policy settings, their tendency to optimize for individual advantage could distort group outcomes. A model that appears rational may encourage users to act in ways that seem efficient but ultimately reduce cooperation and fairness within teams or organizations.


The study also cautions against equating intelligence with social wisdom. The researchers note that while reflective and logical processing improves task performance, it does not necessarily foster prosocial behavior. Without mechanisms that integrate empathy, fairness, or shared benefit into reasoning, AI systems risk amplifying human tendencies toward competition rather than collaboration.

In repeated trials, groups composed mainly of reasoning models earned only a fraction of the total points achieved by groups of non-reasoning ones, despite each agent acting logically within its own frame of reference. This imbalance illustrates how rational individual strategies can collectively produce poorer results... a dynamic familiar in economic theory but now evident in artificial systems as well.

The authors suggest that future AI development should focus on embedding social intelligence alongside reasoning. Rather than simply optimizing for accuracy or speed, models need the ability to interpret cooperation as a rational choice when it benefits collective welfare. In human societies, trust and mutual consideration sustain long-term progress. Extending those same principles to intelligent machines, they argue, will be essential if AI is to contribute meaningfully to shared human goals.

Carnegie Mellon’s study adds to growing evidence that smarter artificial intelligence does not automatically make for better social partners. As reasoning power increases, designers may need to balance logic with compassion to prevent future systems from becoming highly capable yet socially shortsighted.


Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.

Read next: Apple’s Sales Edge Higher as iPhone Demand Stabilizes and Services Lead Growth
by Irfan Ahmad via Digital Information World

WhatsApp Rolls Out Passkey Backups While Building Bridges to Other Messaging Apps

WhatsApp has started adding a new passwordless security option for chat backups while also preparing tools that will connect users across different messaging platforms in Europe. The two updates show how the company is refining both privacy protection and regulatory compliance as its messaging ecosystem becomes more open.

The new passkey backup feature lets users secure their stored messages using their fingerprint, face scan, or device unlock code rather than a long password or encryption key. It is being introduced gradually on iOS and Android, giving users a simpler and safer way to protect archived conversations on iCloud or Google Drive.


Passkeys replace traditional passwords with a system that relies on cryptographic keys unique to each device. When a user enables the feature, the phone generates a private key that never leaves the device and a public key shared with the app’s servers. This separation prevents anyone from stealing the authentication data during a breach, since there’s nothing stored online that can be copied or reused elsewhere. The result is an encrypted backup that can be unlocked instantly through the same biometric system already used to open the app or authenticate payments.

The idea behind this change is not just convenience but consistency. Until now, WhatsApp’s end-to-end encryption covered chats and calls, but backups still required a manually set password or a lengthy recovery key. By integrating passkeys, Meta is extending the same protection standards across the entire messaging cycle, ensuring that stored data remains private without asking users to memorize complex codes.

While the backup upgrade is rolling out globally, the company is simultaneously testing a feature in Europe that could reshape how people communicate across different chat platforms. Under development in recent Android betas, WhatsApp is building interoperability tools that allow users to send and receive messages with people using other messaging apps. The project stems from the European Union’s Digital Markets Act, which requires major platforms to make their core services compatible with competing ones.


Once enabled, the interoperability feature will let WhatsApp users exchange messages, photos, videos, and voice notes with contacts from supported external apps. Users will be able to manage this experience through privacy controls that determine who can add them to third-party chats or group conversations. These settings will include options that limit invitations to known contacts or selected services, giving users precise control over visibility and unwanted requests.

Security remains central to this expansion. WhatsApp will require external messaging providers to demonstrate equivalent encryption standards before connecting their systems. The platform encourages partners to adopt the Signal Protocol, already used for WhatsApp’s internal encryption, though other compatible systems may be approved after technical verification. This ensures that cross-platform communication maintains the same level of privacy expected inside WhatsApp’s own network.

Group chats are also being adapted for this environment. Each participant in a cross-app group will need to enable interoperability, allowing messages and media to move securely between services. Although some native features like stickers or disappearing messages won’t initially carry over, WhatsApp plans to refine these functions after the basic structure is stable.

By pairing passwordless backups with the coming interoperability framework, WhatsApp is reinforcing its dual priorities: stronger personal security and regulatory compliance. Together, they mark a shift from isolated platforms toward a more connected but still encrypted messaging world — one where privacy and openness can coexist within the same ecosystem.

Read next:

• Study Maps the Divide Between AI-Generated Results and Traditional Search Lists

• AI Tools May Improve Reasoning but Distort Self-Perception
by Irfan Ahmad via Digital Information World

Thursday, October 30, 2025

Study Maps the Divide Between AI-Generated Results and Traditional Search Lists

The familiar rhythm of typing a query and scanning a page of ranked links is giving way to something new. Search engines now build answers instead of lists. Generative systems summarize information, cite sources in passing, and present a single text block that feels complete. But how does this shift change what people actually find?

A team from Ruhr University Bochum and the Max Planck Institute for Software Systems set out to measure that difference. Their study compared Google’s traditional search with four AI-driven counterparts... Google AI Overview, Gemini, GPT-4o-Search, and GPT-4o with its built-in search tool. Thousands of questions spanning science, politics, products, and general knowledge were tested across these systems to map how each retrieves, filters, and recombines web information.


The researchers found that AI search engines gather from a wider pool of sources but rarely from the most visited or highly ranked sites. Google’s organic results still lean on established, top-ranked domains, while AI models often pull content from lower-ranked or niche websites. Yet this diversity of origin doesn’t guarantee a richer spread of ideas. When the team analyzed conceptual coverage (how many distinct themes each system produced) AI and traditional search returned similar breadth overall.

Different engines showed clear behavioral patterns. GPT-4o with its search tool relied heavily on internal memory, drawing from fewer external pages. Google AI Overview and Gemini, in contrast, favored fresh, external material and cited far more links. GPT-4o-Search sat between these extremes, retrieving a moderate number of pages but generating longer, more structured responses. Organic search, fixed at ten results per query, remained the most stable reference point.

Over time, those differences deepened. When the researchers repeated their tests two months later, AI outputs had shifted markedly, reflecting how generative systems adapt (or drift) as the web and models evolve. Google’s standard search results changed little. Gemini and GPT-4o-Search adjusted sources and phrasing but kept comparable topic coverage. Google’s AI Overview showed the greatest fluctuation, sometimes rewriting entire responses with new references.

The findings underline how reliance on internal model knowledge affects accuracy and freshness. Engines that search the live web adapt faster to new events, but those that depend mainly on stored understanding struggle with recent developments. In tests on trending queries, retrieval-based systems such as Gemini and GPT-4o-Search performed best, while models like GPT-4o-Tool often missed updates or produced outdated answers.

Beyond the technical contrasts lies a broader issue: how information is framed. Traditional search exposes multiple viewpoints through discrete links, leaving users to weigh relevance and trust. Generative engines compress those perspectives into one narrative, which can subtly alter emphasis and omit ambiguity. The shift streamlines access but narrows visibility.

For researchers, that change demands new metrics. Existing evaluations built for ranked lists — precision, recall, or diversity scoring — cannot capture how synthesized responses balance factual grounding, conciseness, and conceptual range. The study’s authors call for benchmarks that measure not just what AI retrieves, but how it fuses and filters meaning.

Generative search does not yet replace the web’s familiar architecture of exploration. Instead, it reshapes it... trading transparency for convenience, consistency for adaptability. As search engines become storytellers rather than librarians, understanding what shapes their answers becomes as crucial as the answers themselves.

Notes: This post was edited/created using GenAI tools.

Read next: AI Tools May Improve Reasoning but Distort Self-Perception


by Irfan Ahmad via Digital Information World