Mr Branding

Saturday, June 14, 2025

Apple’s AI Critique Faces Pushback Over Flawed Testing Methods

A recent research paper from Apple raised eyebrows in the AI community after suggesting that today’s most advanced language models fail dramatically when faced with complex reasoning tasks. But that conclusion is now being challenged, not because the tasks were too difficult, but because, critics argue, the experiments weren’t fairly designed to begin with.

Alex Lawsen, a researcher at Open Philanthropy, has responded with a counter-study questioning the foundations of Apple’s claims. His assessment, published this week, argues that the models under scrutiny (including Claude, Gemini, and OpenAI’s latest systems) weren’t breaking down due to cognitive limits. Instead, he says they were tripped up by evaluation methods that didn’t account for key technical constraints.

One of the main flashpoints in the debate is the Tower of Hanoi, a well-known puzzle often used to test logical reasoning. Apple’s paper reported that models consistently failed when the puzzle became more complex - typically at eight disks or more. But Lawsen points out a critical issue: the models weren’t failing to solve the puzzle. They were often simply stopping short of writing out the full answer because they were nearing their maximum token limit - a built-in cap on how much text they can output in one go.

In several cases, the models even stated they were cutting themselves off to conserve output space. Rather than interpreting this as a practical limitation, Apple’s evaluation counted it as a failure to reason.
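
The output-length problem is easy to see with arithmetic. The shortest solution to an n-disk Tower of Hanoi takes 2^n - 1 moves, so a full move list roughly doubles with every added disk. The sketch below assumes, purely for illustration, about 10 output tokens per written move and a 64,000-token output cap; neither figure comes from Apple's or Lawsen's paper.

```python
# Minimum number of moves for an n-disk Tower of Hanoi is 2**n - 1.
TOKENS_PER_MOVE = 10   # assumption: rough cost of writing out one move
OUTPUT_CAP = 64_000    # assumption: hypothetical output-token limit


def min_moves(disks: int) -> int:
    """Length of the shortest solution for the given number of disks."""
    return 2 ** disks - 1


def tokens_needed(disks: int) -> int:
    """Estimated tokens required to write out every move."""
    return min_moves(disks) * TOKENS_PER_MOVE


def largest_listable(cap: int = OUTPUT_CAP) -> int:
    """Largest disk count whose full move list still fits under the cap."""
    disks = 0
    while tokens_needed(disks + 1) <= cap:
        disks += 1
    return disks


if __name__ == "__main__":
    for n in (8, 10, 12, 15):
        print(n, min_moves(n), tokens_needed(n))
    print("largest fully listable:", largest_listable())
```

Under these assumed numbers, a complete 12-disk move list fits in the budget and a 13-disk list does not, regardless of how well the model reasons.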

A second issue arose in the so-called River Crossing test, where models are asked to solve a version of the Missionaries and Cannibals puzzle. Apple included setups that were mathematically unsolvable: for example, asking the model to ferry six or more agents using a boat that could only carry three at a time. When the models recognized that the task couldn’t be completed under the given rules and refused to attempt it, they were still marked wrong.
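
Whether a given setup is solvable can be checked mechanically before it is used as a test. The sketch below is a breadth-first search over the textbook Missionaries and Cannibals rules, used here as a stand-in: the exact rules of Apple's variant are not given in this article, and reading "six" as six pairs is an assumption. Only the two banks are checked for safety, not the boat itself.

```python
from collections import deque


def is_solvable(pairs: int, boat_capacity: int) -> bool:
    """Search states (missionaries_left, cannibals_left, boat_on_left)."""

    def safe(m: int, c: int) -> bool:
        # A bank is safe if it holds no missionaries or they are not outnumbered.
        left_ok = m == 0 or m >= c
        right_m, right_c = pairs - m, pairs - c
        right_ok = right_m == 0 or right_m >= right_c
        return left_ok and right_ok

    start = (pairs, pairs, True)
    goal = (0, 0, False)
    seen = {start}
    queue = deque([start])
    while queue:
        m, c, boat_left = queue.popleft()
        if (m, c, boat_left) == goal:
            return True
        sign = -1 if boat_left else 1  # people leave whichever bank has the boat
        for dm in range(boat_capacity + 1):
            for dc in range(boat_capacity + 1 - dm):
                if dm + dc == 0:
                    continue  # the boat cannot cross empty
                nm, nc = m + sign * dm, c + sign * dc
                if not (0 <= nm <= pairs and 0 <= nc <= pairs):
                    continue
                state = (nm, nc, not boat_left)
                if state not in seen and safe(nm, nc):
                    seen.add(state)
                    queue.append(state)
    return False  # every reachable state was explored without reaching the goal


if __name__ == "__main__":
    for pairs, capacity in [(3, 2), (5, 3), (6, 3), (6, 4)]:
        print(pairs, capacity, is_solvable(pairs, capacity))
```

Under these rules the search finds a solution for five pairs with a three-seat boat but none for six, which is the kind of instance a model would be right to call impossible.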

A third problem involved how Apple’s system judged the responses. It relied on automatic scripts to evaluate output strictly against full, exhaustive solutions. If a model produced a correct but partial answer (or took a strategic shortcut), it still received a failing score. No credit was given for recognizing patterns, applying recursive logic, or even identifying the task’s limitations.

To illustrate how these issues can distort results, Lawsen ran a variation of the Hanoi test with a different prompt. Instead of asking the models to list every move, he instructed them to write a small program (in this case, a Lua function) that could solve the puzzle when executed. Freed from the burden of listing hundreds of steps, the models delivered accurate, scalable solutions, even with 15 disks - well beyond the point where Apple’s paper claimed they failed entirely.
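
To show how compact such a program can be, here is a minimal sketch of the standard recursive solution. It is written in Python rather than Lua and is not the code from Lawsen's study; the peg names and the `is_valid_solution` checker are illustrative choices.

```python
def hanoi(disks, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves that solve the puzzle."""
    if disks == 0:
        return
    # Move the smaller stack out of the way, move the largest disk, restack.
    yield from hanoi(disks - 1, source, spare, target)
    yield (disks, source, target)
    yield from hanoi(disks - 1, spare, target, source)


def is_valid_solution(disks, moves):
    """Replay the moves and confirm every step is legal and the goal is reached."""
    pegs = {"A": list(range(disks, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(disks, 0, -1))


if __name__ == "__main__":
    moves = list(hanoi(15))
    print(len(moves), is_valid_solution(15, moves))
```

The program stays a few lines long no matter how many disks are requested, while the move list it generates grows exponentially.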

The implications go beyond academic nitpicking. Apple’s conclusions have already been cited by others as evidence that large AI models lack the kind of reasoning needed for more ambitious tasks. But if Lawsen’s analysis holds up, it suggests the story is more complicated. The models may struggle with long-form answers under tight output limits, but their ability to think through a problem algorithmically remains intact.

Of course, none of this means large reasoning models are problem-free. Even Lawsen acknowledges that designing systems that can reliably generalize across unfamiliar problems remains a long-term challenge. His paper calls for more careful experimentation: tests should check whether puzzles are actually solvable, track when models are being truncated due to token budgets, and consider solutions in multiple formats, from plain text to structured code.

The debate boils down to a deeper question: are we really measuring how well machines think, or just how well they can type within a fixed character limit?

Image: DIW-Aigen

by Irfan Ahmad via Digital Information World

Friday, June 13, 2025

Google Tests Spoken Summaries in Search Results, But You’ll Have to Ask First

Google is experimenting with a new way to deliver search results, one that talks back. A feature called Audio Overviews is now available to users in the US through Google’s Search Labs, offering short spoken summaries for some queries, powered by the company’s Gemini AI model.

Once enabled, the tool introduces an audio clip that sounds like a brief conversation between two computer-generated voices. The voices discuss the topic at hand, aiming to give listeners a broad overview without needing to scroll through multiple websites. It’s not on by default; users have to opt in, and for now, only certain topics trigger the option. But that could change. If past rollouts are anything to go by, this might soon become a default feature, with no option to turn it off.

When it appears, the player sits midway down the page, just below the “People also ask” section. Users are asked to generate the clip manually, and it may take several seconds before playback begins. The result is a back-and-forth between the AI voices, covering key points from the top-ranked search results.

Spoken summaries raise concerns that they may reduce website traffic, continuing a pattern of AI eroding traditional publishers’ visibility.

Playback controls are simple: pause, skip, volume, and variable speed settings are all included. Below the player, Google lists the websites that contributed to the summary. Users can also rate the experience with a thumbs up or down, giving feedback on the audio or the experiment as a whole.

The idea behind Audio Overviews, according to Google, is to help people get a quick sense of unfamiliar topics, especially in situations where reading isn’t convenient - such as when commuting or multitasking. A suggested prompt is “how do noise cancellation headphones work?”, though the feature is already appearing for a growing range of searches.

The same audio format has previously appeared in other Google products, including NotebookLM, the Gemini app, and even Google Docs. This latest rollout to Search reflects the company’s continued shift toward more “multimodal” experiences, blending text, audio, and interactivity in a single interface.

But while the feature works relatively well for straightforward topics, it isn’t flawless. AI-generated summaries have occasionally shown inconsistencies or factual gaps, particularly when drawing from a broader set of online sources. Unlike NotebookLM, where the AI works from a curated document set, the open nature of Search can lead to less reliable interpretations.

There’s also the question of what this means for the wider web. If users rely on spoken summaries for quick answers, fewer may click through to original sources, a trend already affecting publishers as AI tools become more prominent in search.

For now, Audio Overviews remain opt-in and experimental. But given Google’s recent history, that may not be the case for long. Like its earlier text-based AI summaries, which moved from limited trials to default search features within weeks, this voice-driven format may follow a similar path.

by Irfan Ahmad via Digital Information World

Remote Regions in the United States Still Struggle With High Costs and Poor Internet Access

There’s paying a lot. And then there’s paying a lot for something that barely works when you need it.

In Wyoming, the average person gives up about an hour and 25 minutes of their monthly working time just to cover a standard home internet bill. For that, they get download speeds that don’t even reach 110 Mbps. On paper, that number might not seem terrible. But if you’ve tried joining a Zoom call while someone else in the house is streaming or uploading files, you’ll notice just how quickly that speed starts to fall apart.

In Remote US States Staying Connected Still Means Paying More for Less Every Month

Montana’s not far behind. Slightly faster speeds, but nearly the same hit to your paycheck. Alaska’s situation feels familiar: more remote geography, similar results. None of this is new, but when a research group from Spinblitz laid the numbers out side by side (local wages, cost of service, and actual internet speed), the pattern wasn’t subtle.

In the bottom 10 states for internet value, most are rural, many are lower-income, and all of them are handing over too much time or money (often both) for lackluster service. In Iowa, it’s over an hour and a half of wages for a connection that doesn’t quite break 165 Mbps. In South Dakota, you get more speed, close to 190 Mbps, but still lose 1.6 hours of labor just paying for it.

And then there’s Idaho. Less than an hour of work a month, which is better. But when the speed hovers around 140 Mbps, it’s not quite the bargain it looks like at first glance.

New Mexico’s somewhere in the middle. Speeds aren’t the worst, cost’s not the highest. But it’s still sitting in the same awkward group: states where you’re overpaying, one way or another.

Some states (Maine, West Virginia, Arkansas) don’t suffer from the absolute slowest speeds, but the cost-to-speed ratio keeps them stuck near the bottom of the value list. The math shifts slightly state by state, but the equation rarely balances out.

There’s something especially frustrating about the fact that in many of these places, digital infrastructure has been promised, delayed, and debated for years. And while it’s true that stringing fiber across hundreds of miles of remote land costs more than wiring up cities, the end result is the same: folks in less-populated states are shelling out more time just to keep up with the rest.

The issue goes beyond monthly bills and download speeds. At its heart, it’s about whether people can realistically stay connected, to their jobs, their education, their healthcare, without falling behind for reasons that shouldn’t be this common. In many parts of the country, that’s still far from guaranteed.

A full list of states ranked by how much they pay for internet, from high prices and poor service to fair costs and fast speeds.

State	Median Download Speed (Mbps)	Internet Value Index	Affordability (hours of work per month needed to pay for internet)
Wyoming	105.23	73.9	1.42
Montana	111.16	80.9	1.37
Alaska	119.52	90	1.33
Iowa	162.19	100.4	1.62
South Dakota	189.22	117.9	1.6
New Mexico	125.74	122.3	1.03
West Virginia	171.87	135.1	1.27
Maine	200.39	143.5	1.4
Arkansas	158.51	158.2	1
Idaho	140.68	159.6	0.88
Mississippi	177.39	167	1.06
Vermont	142.46	167.1	0.85
Wisconsin	201.29	172.4	1.17
Kentucky	210.34	181.4	1.16
Alabama	208.64	186.2	1.12
Hawaii	225.93	189	1.2
Georgia	188.13	213.5	0.88
Indiana	201	214.9	0.94
Illinois	187.39	218.7	0.86
North Carolina	231.41	222.1	1.04
New Jersey	234	222.2	1.05
Pennsylvania	205.02	223.2	0.92
Michigan	201.51	223.7	0.9
California	226.89	232.2	0.98
Minnesota	183.47	233.9	0.78
Tennessee	230.27	236	0.98
Oregon	195.65	238.6	0.82
North Dakota	210.37	239.7	0.88
Utah	213.17	250	0.85
Connecticut	244.23	251.2	0.97
Delaware	237.42	257.5	0.92
Texas	224.77	258.4	0.87
Colorado	199.92	261.2	0.77
Washington	188.77	263.1	0.72
Maryland	226.61	270.9	0.84
Florida	238.3	273.1	0.87
New Hampshire	234.5	277.9	0.84
South Carolina	224.52	287.8	0.78
Ohio	216.68	288.1	0.75
Oklahoma	188.31	288.6	0.65
Louisiana	198.44	289.4	0.69
New York	226.13	291.8	0.77
Massachusetts	225.76	319.8	0.71
Nebraska	199.43	326.8	0.61
Kansas	210.49	331.6	0.63
Missouri	207.74	337.1	0.62
Arizona	199.23	345.5	0.58
Nevada	228.69	362.7	0.63
Virginia	213.82	385.7	0.55
Rhode Island	257.48	463.6	0.56

Methodology: The study ranks US states by comparing internet speed, monthly cost, and local wages to find where people pay the most for the least. Affordability was measured by how much of a person’s wage goes toward their internet bill. Value came from dividing speed by that affordability score.
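
As a rough check on that last step, the sketch below recomputes the value index for a few rows of the table by dividing median speed by hours of work. It assumes the formula is exactly that ratio with no extra weighting, which the methodology note implies but does not spell out. Small differences from the published index are expected because the hours in the table are rounded to two decimals.

```python
def value_index(median_mbps: float, hours_of_work: float) -> float:
    """Value as described in the methodology: speed divided by affordability."""
    return median_mbps / hours_of_work


# (state, median download Mbps, hours of work, published index) from the table
ROWS = [
    ("Wyoming", 105.23, 1.42, 73.9),
    ("Montana", 111.16, 1.37, 80.9),
    ("Alaska", 119.52, 1.33, 90.0),
    ("Rhode Island", 257.48, 0.56, 463.6),
]


def relative_error(computed: float, published: float) -> float:
    """Absolute difference as a fraction of the published value."""
    return abs(computed - published) / published


if __name__ == "__main__":
    for state, mbps, hours, published in ROWS:
        computed = value_index(mbps, hours)
        print(f"{state}: computed {computed:.1f}, published {published}")
```

For these rows the recomputed values land within about one percent of the published index, which is consistent with the stated method.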

by Irfan Ahmad via Digital Information World

From OpenAI's o3 to Grok-3 Vision: These AI Models Took the Mensa Test, Results May Surprise You

A recent intelligence benchmark has placed today’s most advanced AI models under the same kind of cognitive scrutiny used to assess exceptional human thinkers, and the outcome tells a story of contrasts between raw verbal reasoning and multimodal complexity.

The data comes from the Mensa Norway IQ test, a well-known measure of high-level reasoning, where scores above 130 often mark out genius-level ability. Although the test was designed for people, researchers have begun using it to compare how artificial intelligence systems perform when asked to solve the same kinds of abstract problems humans struggle with.

At the top of the current rankings sits OpenAI’s o3, which scored 133, just shy of the upper boundary of human IQ scales. Not far behind is Gemini Thinking, Google’s language-focused model, which reached 128. These results suggest that, at least in abstract problem solving through words and logic, some AI systems are not just matching human performance but quietly exceeding it.

The upper tier includes OpenAI’s o4-mini with a score of 126, Gemini Pro at 124, and both Claude-4 Opus and Claude-4 Sonnet tied at 118. Even models just below this line, like Grok-3 Think (111), Llama-4 (107), and DeepSeek-R1 (105), are operating within or above the average human range.

But the drop-off begins sharply as models shift from text-only processing to visual capabilities. Systems like Claude-4 Sonnet Vision, GPT-4.5, Grok-3, and deepseek-v3, all scoring 97, sit right at the border of human average. Just below them, Gemini Pro Vision landed at 96, while GPT-4 Omni (Verbal) trailed at 91, despite its verbal focus.

OpenAI’s o4-mini-high reached 90, but the decline continues. Visual variants such as o3-vision and Bing’s AI scored 86, followed by Mistral (85) and Claude-4 Opus Vision (80). Further down the list, models like OpenAI o1-pro Vision (79) and Llama-3 Vision (70) show a widening gap between multimodal ambition and actual performance on reasoning tasks.

At the lowest end sit GPT-4 Omni Vision and Grok-3 Think Vision, managing only 63 and 62 respectively — scores that, in human terms, would reflect severe limitations in pattern recognition and logic.

What becomes clear through this ranking is that text-based reasoning remains AI’s strong suit. Models trained purely on language continue to outperform their multimodal counterparts when faced with symbol-based puzzles and logic problems. While vision-enabled AIs might be better suited for real-world perception, they appear less capable when reasoning is abstracted from context and stripped to logic alone.

These findings underscore a split in the development arc of artificial intelligence. Verbal models are now working at, and sometimes above, human cognitive levels. But giving machines the ability to “see” doesn’t yet mean they understand. At least not in the ways intelligence is traditionally measured.

Category	Mensa Norway IQ Test Score
OpenAI o3	133
Gemini Thinking	128
OpenAI o4-mini	126
Gemini Pro	124
Claude-4-Opus	118
Claude-4-Sonnet	118
Grok-3-Think	111
Llama-4	107
DeepSeek-R1	105
OpenAI o1-pro	102
Average Human	100
Claude-4-Sonnet-Vision	97
gpt-4.5	97
deepseek-v3	97
Grok-3	97
Gemini Pro (Vision)	96
GPT4 Omni (Verbal)	91
OpenAI o4-mini-high	90
OpenAI o3-vision	86
Bing	86
Mistral	85
Claude-4-Opus-Vision	80
OpenAI o1-pro-vision	79
Llama-3 (Vision)	70
GPT4 Omni (Vision)	63
Grok-3-Think-Vision	62
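
The text-versus-vision gap can be read directly off the table by pairing each model with its vision variant. The sketch below does that for the seven models that appear in both forms; the pairing itself is an assumption made here for illustration, not something the source ranking provides.

```python
from statistics import mean

# (model, text score, vision score), copied from the table above
PAIRS = [
    ("OpenAI o3", 133, 86),
    ("Gemini Pro", 124, 96),
    ("Claude-4-Opus", 118, 80),
    ("Claude-4-Sonnet", 118, 97),
    ("Grok-3-Think", 111, 62),
    ("OpenAI o1-pro", 102, 79),
    ("GPT4 Omni", 91, 63),
]


def gaps(pairs):
    """Points lost when moving from the text variant to the vision variant."""
    return {name: text - vision for name, text, vision in pairs}


if __name__ == "__main__":
    result = gaps(PAIRS)
    for name, gap in sorted(result.items(), key=lambda item: -item[1]):
        print(f"{name}: {gap}")
    print("mean gap:", round(mean(result.values()), 1))
```

Every pair loses points in its vision form, with drops ranging from 21 points for Claude-4-Sonnet to 49 for Grok-3-Think.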

H/T: Trackingai.

by Irfan Ahmad via Digital Information World

Thursday, June 12, 2025

Context, Emotion, and Biology: What AI Misses in Language Comprehension

As meaning-makers, we use spoken or signed language to understand our experiences in the world around us. The emergence of generative artificial intelligence such as ChatGPT (using large language models) calls into question the very notion of how to define “meaning.”

One popular characterization of AI tools is that they “understand” what they are doing. Nobel laureate and AI pioneer Geoffrey Hinton said: “What’s really surprised me is how good neural networks are at understanding natural language — that happened much faster than I thought …. And I’m still amazed that they really do understand what they’re saying.”

Hinton repeated this claim in an interview with Adam Smith, chief scientific officer for Nobel Prize Outreach. In it, Hinton stated that “neural nets are much better at processing language than anything ever produced by the Chomskyan school of linguistics.”

Chomskyan linguistics refers to American linguist Noam Chomsky’s theories about the nature of human language and its development. Chomsky proposes that there is a universal grammar innate in humans, which allows for the acquisition of any language from birth.

I’ve been researching how humans understand language since the 1990s, including more than 20 years of studies on the neuroscience of language. This has included measuring brainwave activity as people read or listen to sentences. Given my experience, I have to respectfully disagree with the idea that AI can “understand,” despite the growing popularity of this belief.

Generating text

First, it’s unfortunate that most people conflate text on a screen with natural language. Written text is related to — but not the same thing as — language.

For example, the same language can be represented by vastly different visual symbols. Take Hindi and Urdu. At conversational levels, these are mutually intelligible and therefore considered the same language by linguists. However, they use entirely different writing scripts. The same is true for Serbian and Croatian. Written text is not the same thing as “language.”

Next let’s take a look at the claim that machine learning algorithms “understand” natural language. Linguistic communication mostly happens face-to-face, in a particular environmental context shared between the speaker and listener, alongside cues such as spoken tone and pitch, eye contact and facial and emotional expressions.

The importance of context

There is a lot more to understanding what a person is saying than merely being able to comprehend their words. Even babies, who are not experts in language yet, can comprehend context cues.

Take, for example, the simple sentence: “I’m pregnant,” and its interpretations in different contexts. If uttered by me, at my age, it’s likely my husband would drop dead with disbelief. Compare that level of understanding and response to a teenager telling her boyfriend about an unplanned pregnancy, or a wife telling her husband the news after years of fertility treatments.

In each case, the message recipient ascribes a different sort of meaning — and understanding — to the very same sentence.

In my own recent research, I have shown that even an individual’s emotional state can alter brainwave patterns when processing the meaning of a sentence. Our brains (and thus our thoughts and mental processes) are never without emotional context, as other neuroscientists have also pointed out.

So, while some computer code can respond to human language in the form of text, it does not come close to capturing what humans — and their brains — accomplish in their understanding.

It’s worth remembering that when workers in AI talk about neural networks, they mean computer algorithms, not the actual, biological brain networks that characterize brain structure and function. Imagine constantly confusing the word “flight” (as in birds migrating) versus “flight” (as in airline routes) — this could lead to some serious misunderstandings!

Finally, let’s examine the claim about neural networks processing language better than theories produced by Chomskyan linguistics. This field assumes that all human languages can be understood via grammatical systems (in addition to context), and that these systems are related to some universal grammar.

Chomsky conducted research on syntactic theory as a paper-and-pencil theoretician. He did not conduct experiments on the psychological or neural bases of language comprehension. His ideas in linguistics are absolutely silent on the mechanisms underlying sentence processing and understanding.

What the Chomskyan school of linguistics does do, however, is ask questions about how human infants and toddlers can learn language with such ease, barring any neurobiological deficits or physical trauma.

There are at least 7,000 languages on the planet, and no one gets to pick where they are born. That means the human brain must be ready to comprehend and learn the language of its community at birth.

Regardless of where a child is born, the human brain is capable of acquiring any language. (Unsplash/tommao wang), CC BY

From this fact about language development, Chomsky posited an (abstract) innate module for language learning — not processing. From a neurobiological standpoint, the brain has to be ready to understand language from birth.

While there are plenty of examples of language specialization in infants, the precise neural mechanisms are still unknown, but not unknowable. But objects of study become unknowable when scientific terms are misused or misapplied. And this is precisely the danger: conflating AI with human understanding can lead to dangerous consequences.

This post was originally published on The Conversation.

by Web Desk via Digital Information World

Everyday Habits That Quietly Threaten Your Hearing Health

Many people could be putting their hearing at risk without even knowing it, just by doing the same things they do every day.

From vacuuming the living room to listening to music on a morning commute, common habits are being flagged by sound experts as potential sources of long-term hearing loss. Although the effects are often gradual, the damage can build up over time.

Experts from DECIBEL warn that once hearing is damaged, it doesn’t tend to come back. The Royal National Institute for Deaf People estimates that nearly 18 million people in Britain live with some level of hearing loss, much of it preventable.

Hairdryers and Hoovers

Some of the biggest culprits can be found in almost every home. Hairdryers, often held close to the ear, can hit between 80 and 90 decibels. That’s not far off the noise level of city traffic, and if used daily, it adds up. One suggestion is to reduce how often you wash your hair, or to take breaks when drying, especially if your model doesn’t come with any noise-reducing features.

Vacuum cleaners tend to hover between 70 and 85 decibels. While that might seem harmless, cleaning for long stretches without ear protection can still do harm. Standing further back from the machine, or limiting usage time, can help ease the load on your ears.

Blenders are even louder. Some can briefly reach 100 decibels, the sound level of a motorcycle, and though they’re only used in short bursts, regular exposure can take its toll. Experts advise stepping back from the counter or wearing basic earplugs when using one repeatedly.

Tools That Talk Back

Out in the shed or garden, the noise doesn’t let up. Petrol lawnmowers and electric saws are both known for their roar. In fact, some power tools can go well past 100 decibels. Just 15 minutes of use may be enough to start damaging the tiny hairs inside the inner ear that are vital for hearing.

To reduce the risk, it’s wise to take breaks, avoid working in confined spaces, and wear proper ear protection, not just the foam plugs, but over-the-ear defenders if possible.
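
The 15-minute figure lines up with the NIOSH recommended exposure limit, which allows 85 dBA for eight hours and halves the permitted time for every 3 dB increase. The article does not say which standard its experts used, so treating NIOSH as the basis is an assumption. A minimal sketch of that rule:

```python
REFERENCE_LEVEL_DBA = 85.0   # NIOSH recommended limit for an 8-hour day
REFERENCE_MINUTES = 480.0    # eight hours
EXCHANGE_RATE_DB = 3.0       # permitted time halves for every 3 dB increase


def permissible_minutes(level_dba: float) -> float:
    """Daily exposure time allowed at a steady sound level."""
    doublings = (level_dba - REFERENCE_LEVEL_DBA) / EXCHANGE_RATE_DB
    return REFERENCE_MINUTES / 2 ** doublings


if __name__ == "__main__":
    for level in (85, 88, 94, 100):
        print(level, "dBA:", permissible_minutes(level), "minutes")
```

Under this rule a tool running at 100 dBA uses up the whole daily allowance in 15 minutes, while 85 dBA takes a full working day to do the same.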

Turning It Up Too Loud

Music is often seen as a comfort, but turning the volume up to block out the outside world can backfire. Using headphones at high volumes, especially in noisy environments like trains or gyms, increases the risk of hearing loss. The sound waves are delivered straight into the ear canal, and when the volume exceeds safe levels, they can damage the delicate hair cells that send sound signals to the brain.

There’s also evidence that overexposure to loud music can affect the way nerves transmit those signals, making it harder to understand speech even when no damage appears on a hearing test.

To reduce the harm, listeners are encouraged to use noise-cancelling headphones, take five-minute breaks every hour, and keep volume settings below 60 percent. Many phones give alerts when the sound level climbs too high; it’s worth paying attention to those warnings.

At concerts or clubs, the same rules apply. Earplugs designed for music fans don’t block out the experience, but they do take the edge off. It also helps to stay away from the speakers and step outside for fresh air now and again.

On the Road

Another source of noise that often gets overlooked is driving at speed with the windows down. The wind, tyre noise and engine rumble can easily push sound levels beyond 85 decibels, especially on highways and motorways. Over long journeys, that exposure builds up.

The solution is simple: keep windows shut when driving fast or in heavy traffic, and avoid drowning out the road noise with even louder music. For those on motorbikes or bicycles, helmets with built-in sound protection are a smart investment.

A Word on Cotton Buds

While not a noise issue, the use of cotton buds is another habit that can affect hearing. Many people still clean their ears this way, but it often does more harm than good. Pushing wax deeper inside can cause blockages, while scraping the sensitive skin may lead to infections or even a perforated eardrum.

Ears usually clean themselves, so there’s no need to dig deep. Putting anything too far inside can push wax further in or cause damage, so it’s best to be gentle and stick to cleaning only the outer part.

Small Changes, Big Difference

Protecting your hearing doesn’t mean avoiding all noise, just being smarter about how much and how often you're exposed to it. Simple steps like turning the volume down, stepping away from noisy appliances, or wearing proper protection can make a lasting difference.

Nature, too, offers a quieter rhythm — one that reminds us that not every moment needs to be filled with noise. Learning to listen more carefully, and more gently, could be the best habit of all.

Image: DIW-Aigen

by Irfan Ahmad via Digital Information World

Wednesday, June 11, 2025

Wikipedia Halts AI Summary Trial After Editors Raise Concerns

Wikipedia has paused a trial that used artificial intelligence to write short summaries of its articles, following complaints from volunteer editors.

According to 404 Media, the summaries were part of an experimental feature made available earlier this month to users who had a special browser extension and had chosen to take part. They appeared at the top of articles, but were hidden behind a click and marked with a yellow label reading “unverified”.

However, the test was met with immediate pushback from within the Wikipedia community. Editors said the summaries could mislead readers or contain errors, potentially damaging the website’s credibility.

One of the main concerns was the risk of AI producing inaccurate information — a problem often referred to as "hallucination", where the software invents facts or misrepresents them. Other publishers, including Bloomberg, have faced similar issues. Some have had to correct mistakes or scale back their own AI experiments as a result.

Wikipedia has said the feature is now on hold but hasn't ruled out the use of AI entirely. The Wikimedia Foundation, which oversees the platform, says it is still exploring how AI might help make the site more accessible — but insists any future tools must be accurate and trustworthy.

Image: DIW

by Irfan Ahmad via Digital Information World