The Continue extension for VS Code, combined with Ollama, lets you run your own offline coding agent. The results are rather promising, even if they depend on the power of your machine.
Continue is an extension for VS Code that lets you code offline and for free. It reads your code, writes new code, and runs commands when you tell it what to do in French or English. No data from your projects leaves your computer, guaranteeing the confidentiality of your work.
Prerequisites
To use the Continue extension, two programs must first be installed: Ollama and Visual Studio Code. Ollama lets us run AI models locally, and VS Code is the editor in which we will use an AI model specialized in coding.
Go to https://ollama.com and download the version corresponding to your operating system (Windows, macOS or Linux). Once the download is complete, launch the installer and follow the instructions.
Download VS Code from https://code.visualstudio.com and install it on your machine.
Installation
In VS Code, open the “Extensions” panel and install the “Continue” extension. Once the extension is installed and opened, a configuration window should appear. Select the “Local” option. If Ollama is correctly installed and running, Continue will offer to download several AI models via the command line. You can download all the suggested models or select only the ones that best suit your needs. Once installed, the models can be used through the extension but also directly in Ollama via its chat interface.
While the recommended models may evolve over time depending on performance, they fall into three categories: chat models, optimized for conversation (like Llama); “autocomplete” models, designed for real-time completion (here of code, with Qwen Code); and embedding models, which turn text into numerical vectors so the AI can understand the context of your code in the background.
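In Continue, this mapping of models to roles lives in its configuration file. Here is a hypothetical `config.yaml` fragment; the model names and tags are illustrative examples, not necessarily the exact ones Continue suggests:

```yaml
models:
  - name: Llama 3.1
    provider: ollama
    model: llama3.1:8b
    roles:
      - chat
  - name: Qwen2.5 Coder
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
```

Each entry points Continue at an Ollama model and declares which of the three roles it should fill.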
Choose the right model
When using a model for code generation or completion, you must choose the size of the model as well as certain execution parameters according to your needs and the resources available. Models such as Qwen2.5-Coder illustrate this flexibility well: the 7B version (7 billion parameters) is ideal for quality, but requires around 5-8 GB of VRAM/RAM. If you prefer speed or have less memory, the 1.5B version is a lighter alternative.
The larger 32B model comes close to the performance of GPT-4o, but requires more hardware resources. If your machine has a compatible GPU (NVIDIA CUDA or AMD ROCm), make sure Ollama uses it for model inference, as GPU acceleration significantly improves generation speed.
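The memory figures above can be roughly sanity-checked from the parameter count. A minimal sketch, assuming 4-bit quantization (common for Ollama models) and ignoring KV-cache and runtime overhead, which is why the real requirement is higher:

```javascript
// Rough lower bound on the memory a quantized model needs for its weights.
// paramCount: number of parameters; bitsPerParam: quantization width.
function estimateModelBytes(paramCount, bitsPerParam) {
  return paramCount * (bitsPerParam / 8);
}

// 7B model at 4-bit quantization: weights alone, before overhead.
const sevenB = estimateModelBytes(7e9, 4);
console.log((sevenB / 1e9).toFixed(1) + " GB"); // prints "3.5 GB"
```

That 3.5 GB floor plus context and runtime overhead is consistent with the 5-8 GB figure quoted above.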
The different functionalities
Continue offers four main modes:
- Autocomplete provides intelligent, contextual code suggestions directly as you type. It anticipates what you want to write and proposes completions you can accept by pressing the “Tab” key. To use it effectively, give your functions clear names and add explanatory comments, so the AI has as much context as possible.
- Edit mode lets you make precise changes to specific sections of code. Simply select the code you want to change, press “Cmd/Ctrl + I”, and describe the desired change (for example, “make this code more readable”). Continue then shows you the proposed changes, which you can accept or reject.
- Chat mode is an AI assistant that can analyze your code and answer your questions. You can ask it to explain an algorithm, suggest optimizations, write tests, or even brainstorm solutions to specific problems. To open Chat mode, press “Cmd/Ctrl + L”.
- Agent mode is an autonomous coding assistant capable of reading files, making changes, running commands, and handling complex, multi-step tasks. It analyzes your existing code, creates the necessary files, writes the code, manages configurations, and walks you through each step of its process.
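The advice under Autocomplete, clear names and explanatory comments, can look like this in practice. A hypothetical snippet: the function name and the leading comment are exactly the kind of context the model uses to suggest the body on its own:

```javascript
// Return only the tasks that are not yet completed, sorted by due date.
// A descriptive name and a comment like this give an autocomplete model
// enough context to propose the whole function body.
function getPendingTasksSortedByDueDate(tasks) {
  return tasks
    .filter((task) => !task.done)
    .sort((a, b) => new Date(a.dueDate) - new Date(b.dueDate));
}

const tasks = [
  { title: "Write report", done: false, dueDate: "2025-03-02" },
  { title: "Send invoice", done: true, dueDate: "2025-03-01" },
  { title: "Book flight", done: false, dueDate: "2025-02-28" },
];
console.log(getPendingTasksSortedByDueDate(tasks).map((t) => t.title));
// → [ 'Book flight', 'Write report' ]
```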
The practical test
We are going to test the creation of a small interactive application. Here is the prompt:
Generate a productivity web app. The app must contain: a task manager (tasks can be added, checked off, and deleted; tasks must be saved locally with localStorage); a Pomodoro timer (a Start/Pause button, a Reset button, and automatic alternation between 25 minutes of work and 5 minutes of break); a modern, clean design (use professional colors (blue/gray) and sans-serif fonts, and make sure the app is centered and attractive).
In less than 4 minutes, Continue generated three working files, a fast turnaround for a test that only involved a few dozen lines of code.
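To give an idea of what such a prompt produces, here is a hypothetical sketch of the task-persistence logic, not the actual generated code. The real app would use the browser's `localStorage` directly; here the storage object is injected so the snippet also runs under Node:

```javascript
// Sketch of the task manager's persistence layer (assumption: shape of the
// generated code, not a copy of it). `storage` mimics the localStorage API.
class TaskStore {
  constructor(storage) {
    this.storage = storage;
    // Restore previously saved tasks, or start with an empty list.
    this.tasks = JSON.parse(storage.getItem("tasks") || "[]");
  }
  save() {
    this.storage.setItem("tasks", JSON.stringify(this.tasks));
  }
  add(title) {
    this.tasks.push({ title, done: false });
    this.save();
  }
  toggle(index) {
    this.tasks[index].done = !this.tasks[index].done;
    this.save();
  }
  remove(index) {
    this.tasks.splice(index, 1);
    this.save();
  }
}

// Minimal in-memory stand-in for the browser's localStorage.
const fakeStorage = {
  data: {},
  getItem(key) { return this.data[key] ?? null; },
  setItem(key, value) { this.data[key] = value; },
};

const store = new TaskStore(fakeStorage);
store.add("Write the article");
store.toggle(0);
console.log(fakeStorage.getItem("tasks"));
```

In the browser, replacing `fakeStorage` with `window.localStorage` is enough for tasks to survive a page reload.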
The Continue extension, combined with Ollama and models like Qwen Code, proves quite effective. It offers a free and confidential AI coding tool, which can also be relevant for local data analysis. The main limit is the power of your computer: the more powerful it is, the larger the models you can run, which considerably improves both the speed and the relevance of the generated code.