Three researchers from Google Research tested a new prompting technique; the results show accuracy gains in 67% of cases, with no additional cost or latency.
What if there were a simple, almost too simple, trick to make LLMs significantly more accurate, at no extra cost in time or money? This is not a dubious marketing promise but a new prompting technique tested by three researchers from Google Research and described in a research paper. Their results are clear: repeating the instructions twice within a prompt improves the performance of the LLMs tested, without lengthening generation or changing the format of the response. Explanation.
The problem with Transformer-based LLMs
Models like GPT, Gemini, Claude or Mistral are all based on the Transformer architecture. Their principle? Read and generate text token by token, from left to right. During text generation, each new token produced can only build on previous tokens. Even during the initial processing of the prompt, the order of the tokens influences how the model constructs its internal representations. Concretely, if your instructions sit at the very beginning of a long prompt, those instruction tokens are encoded without any knowledge of the context that follows them.
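This left-to-right constraint comes from the causal attention mask used by Transformer decoders. A minimal sketch (the token strings are illustrative, not from the paper):

```python
# Causal (left-to-right) attention mask: token i may only attend to
# tokens 0..i, never to tokens that come after it.
def causal_mask(n_tokens: int) -> list[list[bool]]:
    return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]

tokens = ["Summarize", "the", "text", "below"]
mask = causal_mask(len(tokens))

# The first token attends only to itself: it is encoded with no
# knowledge of anything that follows.
print(mask[0])  # [True, False, False, False]
# The last token attends to the whole prompt.
print(mask[-1])  # [True, True, True, True]
```

This is why the position of an instruction inside the prompt changes how it is represented: early tokens simply never "see" later ones.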
Of course, there are alternatives. Diffusion models for text, for example, do not have this constraint: they generate the entire response iteratively rather than strictly left to right. But for text generation, these architectures (Gemini Diffusion, for example) remain largely experimental.
Faced with this observation, three researchers from Google Research tested a disconcertingly simple approach: including the instructions twice in the prompt. The idea is that with the prompt repeated, the model can make connections between all elements regardless of their initial position: every token of the second occurrence can attend to the entire first occurrence, so even material from the very beginning of the prompt is seen in full context. The model processes the information from several angles. Simple, but effective.
Tangible results in benchmarks
To measure the effectiveness of this prompting technique, the researchers applied it to 7 models: Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, GPT-4o mini, GPT-4o, Claude 3 Haiku, Claude 3.7 Sonnet and Deepseek V3, querying each through its provider's API over a two-month period. Each model was evaluated on 7 benchmarks: ARC, OpenBookQA, GSM8K, MMLU-Pro and MATH, plus two benchmarks created for the occasion, NameIndex and MiddleMatch.
Across all the tests carried out, prompt repetition outperforms the classic single-prompt method in 67% of cases. More importantly, in none of the reported cases did repeating the instructions degrade a model's benchmark performance. And some results are spectacular: Gemini 2.0 Flash-Lite jumps from 21.33% to 97.33% accuracy on NameIndex, a benchmark where a specific piece of information must be extracted from the middle of a long list. All the models improve. GPT-4o mini, Claude 3.7 Sonnet, Deepseek V3… the technique works everywhere.
The only limit? Reasoning models. With chain-of-thought (CoT) prompting, the technique's effectiveness drops significantly, and in some cases it even degrades the model's performance. The researchers explain this by the fact that a CoT already tends to restate the prompt spontaneously within its reasoning, making explicit repetition redundant. This observation opens an interesting avenue: what if this ability to restate the prompt partly explained why CoT so clearly outperforms direct answering? The researchers don't make this connection, but the data suggests it.
Repeat your prompts, except with reasoning models
Concretely, how do you apply this technique? The simplest template is to copy and paste your prompt twice, one copy immediately after the other, so the model reads the full set of instructions a second time with the first copy already in context.
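As a minimal sketch, the trick reduces to a one-line string operation before the API call (the separator string and the example prompt are my assumptions, not taken from the paper):

```python
# Repetition trick: send the prompt twice in a single message so that
# every token of the second copy can attend to the entire first copy.
def repeat_prompt(prompt: str, separator: str = "\n\n") -> str:
    """Return the prompt concatenated with a second copy of itself."""
    return prompt + separator + prompt

doubled = repeat_prompt("Extract every city name from the list below.")
# 'doubled' is then sent to the model as usual, e.g. as the user
# message of a chat-completion request.
print(doubled)
```

Because nothing else about the request changes (same model, same response format), the technique slots into an existing pipeline without touching anything downstream.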
The technique's major appeal is its simplicity. No need to rework a model's architecture, fine-tune it or modify any code: the gains are immediate and measurable. The Google researchers aren't promising a revolution, just an engineering trick that works. Even in AI, the most effective solutions are sometimes the simplest.