This almost too simple prompting technique improves all LLMs (and Google has proven it)

Three researchers from Google Research tested a new prompting technique; it improved accuracy in 67% of cases, with no additional cost or latency.

What if there were a simple, almost too simple, trick to make LLMs noticeably more accurate, at no extra time or cost? This is not a dubious marketing promise but a new prompting technique tested by three researchers from Google Research and described in a research paper. The results are clear: repeating the instructions twice within a prompt improves the performance of all the LLMs tested, without slowing generation or changing the format of the response. Here's how it works.

The problem with Transformer-based LLMs

Models like GPT, Gemini, Claude or Mistral are all based on the Transformer architecture. Their principle? Read and generate text token by token, from left to right. During text generation, each new token produced can only build on previous tokens. Even during the initial processing of the prompt, the order of the tokens influences how the model constructs its internal representations. Concretely, if you structure a prompt as "context, then question", the model first processes the context and only then discovers the question. With "question, then context", it's the opposite: it reads the question before having the complete context. And this order of processing directly impacts the quality of the predictions.

Of course, there are alternatives. Diffusion models for text, for example, do not have this constraint: they generate the entire response iteratively rather than sequentially. But these architectures remain largely experimental for text generation (Gemini Diffusion, for example).

Faced with this observation, three researchers from Google Research tested a disconcertingly simple approach: repeating the instructions in the prompt twice. The idea is that by repeating the prompt, the model can make connections between all elements, regardless of their initial position. An element at the beginning of the first occurrence is also found at the end of the second. The model processes information from several angles. Simple, but effective.

Tangible results in benchmarks

To measure the effectiveness of this prompting technique, the researchers applied it to 7 different models: Gemini 2.0 Flash, Gemini 2.0 Flash Lite, GPT-4o mini, GPT-4o, Claude 3 Haiku, Claude 3.7 Sonnet and Deepseek V3, querying each provider's API over a two-month period. They evaluated each model on 7 benchmarks: ARC, OpenBookQA, GSM8K, MMLU-Pro, MATH, and two benchmarks created for the occasion, NameIndex and MiddleMatch.

Across all the tests carried out, prompt repetition outperforms the classic method in 67% of cases. More interestingly, in no reported case did repeating the instructions degrade a model's performance on a benchmark. And some results are spectacular: Gemini 2.0 Flash-Lite jumps from 21.33% to 97.33% accuracy on NameIndex, a benchmark where specific information must be extracted from the middle of a long list. Every model improves. GPT-4o mini, Claude 3.7 Sonnet, Deepseek V3… the technique works everywhere.

The only limit? Reasoning models. When chain of thought (CoT) is used, the effectiveness of the technique drops significantly, and in certain cases it even degrades the model's performance. The researchers attribute this to the fact that CoT already spontaneously repeats the prompt within its reasoning, making the technique redundant. This observation opens an interesting avenue: what if this ability to repeat the prompt partly explained why CoT delivers such superior results compared to direct answering? The researchers don't make this connection explicitly, but the data suggests it.

Repeat your prompts, except with reasoning models

Concretely, how do you apply this technique? The simplest template is to copy and paste your prompt twice, one copy immediately after the other. The researchers also tested more explicit variants, such as adding "Let me repeat that:" between the two occurrences, without any notable difference in performance. On the other hand, they observed that triple repetition gives significantly better results on certain specific tasks, in particular extracting information from long lists. For the majority of use cases, though, a single repetition is more than sufficient.
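As a minimal sketch of the template described above (the helper name and the exact separator string are illustrative choices, not from the paper), the repetition can be wrapped in a few lines of code before the prompt is sent to any model API:

```python
def repeat_prompt(prompt: str, times: int = 2,
                  separator: str = "\n\nLet me repeat that:\n\n") -> str:
    """Return the prompt repeated `times` times, joined by `separator`.

    The separator is optional: per the article, adding "Let me repeat
    that:" between the copies made no notable difference, so plain
    duplication (separator="\n\n") works just as well.
    """
    return separator.join([prompt] * times)

question = ("Which of these cities is the capital of Australia: "
            "Sydney, Canberra, or Melbourne?")

# Double repetition for general use...
doubled = repeat_prompt(question)
# ...or triple repetition for long-list extraction tasks.
tripled = repeat_prompt(question, times=3)
```

The resulting string is then sent to the model exactly as you would send the original prompt; nothing else in the request changes.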

The technique's main appeal is its simplicity. No need to rework an architecture, fine-tune a model, or modify any code. The gains are immediate and measurable. The Google researchers aren't promising anything revolutionary, just an engineering trick that works. Sometimes, even in AI, the most effective solutions are also the simplest.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.