Friday, January 16, 2026

Study Finds Prompt Repetition Improves Non-Reasoning LLM Performance Without Increasing Output Length or Latency

A study from Google Research reports that repeating an input prompt improves the performance of several large language models when reasoning is not used, without increasing the number of generated tokens or the measured latency in the reported experiments.

The findings are presented in a December 2025 arXiv preprint titled “Prompt Repetition Improves Non-Reasoning LLMs” by Yaniv Leviathan, Matan Kalman, and Yossi Matias. The paper is released as a preprint and is available under a Creative Commons Attribution 4.0 license.

The authors define prompt repetition as transforming an input from "<QUERY>" to "<QUERY><QUERY>". According to the paper, “when not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.”
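
In code, the transformation is trivial. The sketch below illustrates the idea; the helper name repeat_prompt and the single-space separator are illustrative assumptions, since the paper simply describes concatenating the prompt with itself.

```python
def repeat_prompt(query: str, separator: str = " ") -> str:
    # "<QUERY>" becomes "<QUERY><QUERY>"; the separator is an assumption
    # for readability, not something specified by the paper.
    return query + separator + query

question = "What is the capital of France?"
print(repeat_prompt(question))
# -> What is the capital of France? What is the capital of France?
```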

The paper states that large language models “are often trained as causal language models, i.e. past tokens cannot attend to future tokens.” As a result, the authors note that “the order of the tokens in a user’s query can affect prediction performance.” The study reports that repeating the prompt “enables each prompt token to attend to every other prompt token,” which the authors state addresses this limitation.
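
To see why token order matters under causal attention, here is a small illustrative sketch in Python (not taken from the paper): with a standard causal mask, the first prompt token can attend only to itself, whereas every token in an appended second copy of the prompt can attend to the entire original prompt.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Entry [i, j] is True iff token i is allowed to attend to token j.
    return np.tril(np.ones((n, n), dtype=bool))

n = 4                      # number of tokens in the original prompt
mask = causal_mask(2 * n)  # the prompt repeated once gives 2n tokens

# In the first copy, the first token sees only itself.
print(mask[0, :n])         # [ True False False False]

# In the second copy, every token can attend to all original prompt tokens.
print(mask[n:, :n].all())  # True
```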

The experiments evaluated seven models: Google’s Gemini 2.0 Flash and Gemini 2.0 Flash Lite, OpenAI’s GPT-4o and GPT-4o-mini, Anthropic’s Claude 3 Haiku and Claude 3.7 Sonnet, and DeepSeek V3. All tests were conducted through each provider’s official application programming interface (API) in February and March 2025.

The models were tested on seven benchmarks: ARC (Challenge), OpenBookQA, GSM8K, MMLU-Pro, and MATH, plus two custom benchmarks, NameIndex and MiddleMatch. For multiple-choice benchmarks, the paper reports results for both question-first and options-first prompt orders.

When reasoning was disabled, the authors report that “prompt repetition improves the accuracy of all tested LLMs and benchmarks.” Using the McNemar test with a p-value threshold of 0.1, the paper reports that “prompt repetition wins 47 out of 70 benchmark-model combinations, with 0 losses.” In other words, across 70 model-benchmark pairings, repeating the prompt produced a statistically significant improvement 47 times and a significant decline zero times.
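
The McNemar test compares the two prompting conditions on paired per-example outcomes and considers only the examples where they disagree. As a rough illustration (not the paper’s code or data, and the paper does not name its tooling), such a comparison could be run in Python with statsmodels:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-example correctness (1 = correct) on one benchmark,
# under the baseline prompt and the repeated prompt. Illustrative only.
baseline = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 0])
repeated = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 0])

# 2x2 table of paired outcomes; McNemar's test uses only the discordant
# cells (examples where the two prompting conditions disagree).
table = np.array([
    [np.sum((baseline == 1) & (repeated == 1)), np.sum((baseline == 1) & (repeated == 0))],
    [np.sum((baseline == 0) & (repeated == 1)), np.sum((baseline == 0) & (repeated == 0))],
])

result = mcnemar(table, exact=True)
print(result.pvalue)  # compare against the 0.1 threshold used in the paper
```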

The study also evaluates efficiency. The authors report that “prompt repetition and its variants do not increase the lengths of the generated outputs or the measured latencies,” with one noted exception. For Anthropic’s Claude models, the paper states that for “very long requests,” latency increased, which the authors attribute to the prefill stage taking longer.

When reasoning was enabled by asking models to think step by step, the paper reports that “prompt repetition is neutral to slightly positive,” with five wins, one loss, and 22 neutral outcomes across the evaluated cases.

The authors note several limitations. They state that prompt repetition “can affect latency for long prompts, and might be impossible for very long ones.” They also caution that measured latencies “might be affected by” factors such as “network delays or transient loads,” and that results “should be taken with a grain of salt.”

The paper concludes by stating that “repeating the prompts consistently improves model performance for a range of models and benchmarks, when not using reasoning,” while noting that further research is needed to explore variations and investigate “when repetition is helpful.”

Image: DW-Aigen

Notes: This post was drafted with the assistance of AI tools and reviewed, fact-checked and published by humans.

by Asim BN via Digital Information World
