The new llama.cpp graphical interface lets you set up a ChatGPT-like interface locally on your computer in just a few minutes.
Will 2026 be the year of on-device AI? The end of 2025 saw its share of tools for building agents and chatbots that run locally on your own machine. Among them, the interface of the famous llama.cpp inference library has been completely rewritten to be easy to use while delivering excellent performance, regardless of the device. The llama.cpp WebUI (graphical interface) has in fact been entirely rewritten by Georgi Gerganov, a pillar of the open source community on GitHub. Beyond the simpler interface and the overall performance gains (thanks to the change of development language), the new version adds advanced file management, support for reasoning, and conversation branching. How do you install and use it? Tutorial.
1. Install llama.cpp
The first requirement for using the WebUI is installing llama.cpp itself (for those who have never installed it). Installation is straightforward on Windows, macOS and Linux.
- On Windows, use the WinGet package manager with the command winget install llama.cpp in your terminal (CMD).
- On macOS (our case) and Linux, the simplest option is the Homebrew package manager: brew install llama.cpp
Installation takes only a few minutes, depending on your internet speed.
2. Download the model and launch the server
Once llama.cpp is installed, the package already contains all of the WebUI code. The llama.cpp team accepted Georgi Gerganov's merge, so everything necessary is included and no additional installation is required.
Only one command is needed to download the model, run inference and launch the web server. Here it is: llama-server -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF --host 127.0.0.1 --port 8033
For this test we use LFM2.5-1.2B-Instruct-GGUF, but you can of course use another model depending on the amount of VRAM available on your machine. Simply replace LiquidAI/LFM2.5-1.2B-Instruct-GGUF with the Hugging Face repository name of your model (often the name of the publisher or of the user who optimized the weights) followed by the name of the chosen model. Note that you absolutely need a model whose weights are in GGUF format. To choose a model suited to your machine, see our dedicated article (LLM locally: how to choose the right hardware configuration?).
Finally, if you use a remote server (VPS, dedicated server, etc.), you can also change the web server's IP address and port, for example binding to 0.0.0.0 to expose the WebUI externally (be sure to configure a firewall and a reverse proxy, or restrict access by IP address).
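The command above can also be wrapped in a small launch script, making the model, address and port easy to swap. A minimal sketch (the variable names are ours; the flags are the standard llama-server options from the command above):

```shell
#!/bin/sh
# Sketch of a reusable launch script; variable names are illustrative.
MODEL="LiquidAI/LFM2.5-1.2B-Instruct-GGUF"  # any Hugging Face repo with GGUF weights
HOST="127.0.0.1"                            # use 0.0.0.0 to expose the server externally
PORT="8033"

CMD="llama-server -hf $MODEL --host $HOST --port $PORT"
echo "$CMD"   # prints the command; replace echo with exec to actually run it
```

Swapping models then becomes a one-line change to MODEL.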
3. Use the web interface
Once the command has run, the llama.cpp WebUI is available locally at http://127.0.0.1:8033. It's that simple.
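Before opening the browser, you can check from the terminal that the server is actually up. A small sketch, assuming curl is installed and using llama-server's /health readiness endpoint:

```shell
# Quick readiness probe for the local server (assumes curl is installed).
URL="http://127.0.0.1:8033"
if curl -sf "$URL/health" > /dev/null 2>&1; then
  STATUS="up"
else
  STATUS="down or unreachable"
fi
echo "llama-server at $URL is $STATUS"
```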
Once on the server's web page, the interface appears. It closely resembles ChatGPT, which is ideal for users accustomed to simple, functional interfaces. You can enter a prompt, but also send PDFs, text documents, images or even audio files, as long as the chosen model is multimodal.
Your model will then rely primarily on the provided document to answer. Everything runs locally, securely, and reasonably fast depending on the model's size. With LFM2.5-1.2B-Instruct-GGUF, a 1.2-billion-parameter model, on our Mac mini with 24 GB of RAM we get between 80 and 100 tokens per second for text processing, much faster than GPT-5 Instant or Gemini 3 Flash (though the quality of the output is obviously not comparable).
The interface also offers basic settings to tune the model's responses: temperature, Top-K, Top-P, etc. Finally, all conversations are saved in the left-hand panel of the interface, as on ChatGPT, Claude or Gemini, and stored locally on your computer.
A basic but functional interface
llama.cpp's WebUI checks all the boxes for users who don't want complicated installations and configurations. It's simple, functional, and above all, the data stays on the PC. You can even automate server startup on Windows: create a .bat file with the command llama-server -hf ggml-org/Ministral-3-3B-Reasoning-2512-GGUF --host 127.0.0.1 --port 8033, place it in the Windows startup folder (accessible via Win+R then shell:startup), and add http://127.0.0.1:8033 to your favorites. Result: your AI server starts automatically with your PC and stays permanently accessible in one click from the browser. Practical.
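The startup file described above can look like this (a sketch in Windows batch syntax; the filename is ours, and the model is the one cited in the paragraph):

```shell
:: start-llama.bat — save this file in the Windows startup folder (shell:startup).
:: Swap the -hf repository for the GGUF model of your choice.
llama-server -hf ggml-org/Ministral-3-3B-Reasoning-2512-GGUF --host 127.0.0.1 --port 8033
```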
The only downside is that the interface does not yet offer native support for the MCP protocol or preconfigured tools such as web search or access to external APIs, unlike other solutions such as LM Studio. For now, the WebUI focuses on the essentials: chatting with the model and managing files. But nothing prevents these features from being added later, especially given the activity of the open source community around llama.cpp. In the meantime, for local, secure use without depending on the cloud, the WebUI already does the job very well.