Monday, November 10, 2025

Study Finds AI Can Mimic Grammar but Not Feeling in the Way Humans Communicate Online

Artificial intelligence has become fluent in nearly every structured task it touches. It can compose essays, generate code, and even craft marketing slogans with uncanny precision. Yet, when it steps into the messy world of online discussion, it still sounds slightly off.

A new international study reveals that large language models, despite their sophistication, continue to struggle with the one quality that defines human communication: emotion.

Researchers from the University of Zurich, the University of Amsterdam, Duke University, and New York University used a “computational Turing test” to measure how human-like AI text really is. Instead of relying on people’s guesses, they used algorithms to compare linguistic and emotional features across thousands of social media posts from Reddit, Bluesky, and X. The team tested nine open-weight models from families such as Llama, Mistral, DeepSeek, Qwen, and Apertus to see whether machines could truly replicate human tone and spontaneity. The results were clear: AI-generated replies were flagged as artificial in 70 to 80 percent of cases, even after advanced optimization techniques were applied.
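The study’s exact pipeline isn’t reproduced here, but the idea behind a “computational Turing test” can be sketched in a few lines: extract simple linguistic features from each reply and train a classifier to guess which ones are machine-written. The toy replies, feature set, and model below are illustrative assumptions, not the authors’ data or code.

```python
# Minimal sketch of a "computational Turing test": train a classifier on
# simple linguistic features to separate human replies from AI replies.
# Toy data and features are illustrative only, not the study's corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def features(text):
    words = text.split()
    return [
        len(words),                                                # reply length
        np.mean([len(w) for w in words]) if words else 0.0,        # average word length
        len(set(w.lower() for w in words)) / max(len(words), 1),   # lexical variety
        text.count("!") + text.count("?"),                         # expressive punctuation
    ]

human = ["lol no way that's wild!!", "ugh this again?? hard pass",
         "honestly who asked", "ok but that's actually hilarious"]
ai    = ["That is an interesting perspective worth considering.",
         "Thank you for sharing this thoughtful viewpoint.",
         "This raises several important questions about the topic.",
         "I appreciate the nuance in this discussion."]

X = np.array([features(t) for t in human + ai])
y = np.array([0] * len(human) + [1] * len(ai))  # 0 = human, 1 = AI

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```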

The findings highlight a recurring pattern in today’s large models: they can imitate structure but not sentiment. AI posts tend to follow grammatical rules flawlessly yet sound emotionally muted. They respond with balanced politeness where humans lean into sarcasm, irritation, or humor. Toxicity, often a hallmark of authentic online talk, became the most reliable clue separating people from machines. When classifiers analyzed tone, emotionally flat or overly polite replies nearly always turned out to be AI-made.

Across the three platforms, the differences remained visible. The models performed slightly better on X, where short and stylized exchanges dominate, but stumbled on Reddit, where argument, humor, and personal storytelling mix in unpredictable ways. Bluesky fell somewhere in between. The research also found that smaller, non-instruction-tuned models such as Llama 3.1 8B and Mistral 7B reproduced human style more effectively than instruction-tuned variants designed for politeness or safety. Larger models, including Llama 70B, showed no consistent advantage, suggesting that scaling alone doesn’t bring emotional realism.

One of the most striking aspects of the study lies in how emotion and meaning pull in opposite directions. When researchers optimized the models for lower detectability, their semantic similarity with real human replies dropped. In other words, the more human the responses sounded, the less they actually matched what people said. Models trained to be accurate lost some of their conversational rhythm, while those fine-tuned for style began drifting from factual or contextually correct replies.
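One way to picture that drop in semantic similarity is to compare each generated reply against the real human reply it stands in for and score how closely the meanings overlap. The snippet below uses TF-IDF cosine similarity as a stand-in metric; the example replies and the choice of metric are assumptions for illustration, not the study’s own measurement.

```python
# Sketch of the trade-off: a reply that stays close to the human reply's
# meaning scores higher in similarity, while a stylised reply can drift.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

human_reply    = "honestly this update broke everything, I'm so annoyed"
accurate_model = "This update appears to have broken several features, which is frustrating."
stylised_model = "lmao classic, they really shipped that huh"

m = TfidfVectorizer().fit_transform([human_reply, accurate_model, stylised_model])

print("accurate-but-stiff reply similarity:", cosine_similarity(m[0], m[1])[0, 0])
print("stylised-but-drifting reply similarity:", cosine_similarity(m[0], m[2])[0, 0])
```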

Attempts to close this gap through prompt design and fine-tuning didn’t produce the expected improvements. Complex strategies such as persona prompting, contextual retrieval, and fine-tuning often made text more uniform and easier to identify as machine-generated. Simple adjustments worked better. Providing stylistic examples or short snippets of authentic replies helped the models capture certain nuances of user language. Even then, emotional expressiveness (especially sarcasm and empathy) remained beyond their reach.
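A minimal sketch of that simpler strategy, assuming nothing about the authors’ actual prompts: prepend a handful of authentic replies as stylistic references before asking the model to respond. The example snippets and prompt wording below are hypothetical.

```python
# Few-shot stylistic prompting: show the model real replies from the
# community before asking it to write one in the same register.
def build_style_prompt(post, example_replies):
    examples = "\n".join(f"- {r}" for r in example_replies)
    return (
        "Here are real replies from this community, as style references:\n"
        f"{examples}\n\n"
        f"Write a reply to the following post in the same tone and length:\n{post}"
    )

authentic = [
    "nah this take is wild lol",
    "ok but why does nobody mention the obvious problem here",
    "honestly? kinda agree, sue me",
]
print(build_style_prompt("The new policy changes everything.", authentic))
```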

The research also uncovered subtle linguistic fingerprints that persist even after optimization. Average word length, lexical variety, and sentiment polarity continued to separate AI text from human writing. These markers changed shape across platforms, but the emotional gap held steady. When emotion-related terms such as “affection,” “optimism,” or “anger” appeared, they followed mechanical patterns rather than the fluid shifts seen in human exchanges.
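Those fingerprints are simple enough to compute directly. The sketch below derives average word length, lexical variety (type-token ratio), and a crude sentiment polarity from a tiny hand-made lexicon; the lexicon and feature definitions are assumptions for illustration, not the ones used in the paper.

```python
# Toy "linguistic fingerprint": word length, lexical variety, and a crude
# polarity score from a small illustrative lexicon.
POSITIVE = {"love", "great", "awesome", "optimism", "affection"}
NEGATIVE = {"hate", "awful", "terrible", "anger", "annoyed"}

def fingerprint(text):
    words = [w.strip(".,!?").lower() for w in text.split()]
    n = max(len(words), 1)
    return {
        "avg_word_length": sum(len(w) for w in words) / n,
        "lexical_variety": len(set(words)) / n,
        "sentiment_polarity": (sum(w in POSITIVE for w in words)
                               - sum(w in NEGATIVE for w in words)) / n,
    }

print(fingerprint("I love this, honestly such a great thread!"))
print(fingerprint("This is a thoughtful and balanced perspective on the topic."))
```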

For ordinary readers, these findings explain why AI comments often feel too polished, cautious, or context-blind. They mirror the syntax of online talk without its volatility. That distinction makes AI-generated dialogue easy to spot, even without expert tools. For developers, the study underlines a deeper limitation: current models excel at copying the form of communication but not its intention. True human language involves affective tension, inconsistency, and risk, all qualities machines still handle poorly.

The Zurich-led team’s conclusion is both reassuring and sobering. It shows how far natural language systems have come and how far they remain from sounding truly alive. Despite billions of parameters and countless training samples, today’s chatbots cannot convincingly reproduce the emotional unpredictability of human conversation. They have mastered grammar, but feeling remains out of reach. And for now, that gap ensures the internet still sounds unmistakably human.


Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.

by Asim BN via Digital Information World
