Tutorial: How to automate Google Chrome with Nanobrowser and Gemini


Using this Chrome extension is simple, but configuring the LLM carefully and writing a clear prompt are essential.

Attempts to turn AI into a human-like user, able to click, search for information, analyze content and see in real time what is displayed in the browser, are multiplying. Examples include OpenAI's Operator, Anthropic's Computer Use agent for Claude and Project Mariner with Gemini 2.0. Their impact on productivity and user experience should be significant.

Another promising tool is emerging: Nanobrowser. This open-source Chrome extension offers AI-based web automation. According to the official GitHub documentation, it lets you run multi-agent workflows directly in the browser, Chrome or Edge. It can work locally, which helps preserve data confidentiality, and supports a wide choice of LLMs. As evidence of its growing popularity, it currently has more than 7,000 stars on GitHub.

A quick installation

The tool is easy to install and use: simply install the extension, choose your LLM and ask the model to perform the task. The first step, therefore, is installation. Go to the Nanobrowser page on the Chrome Web Store and add the extension to Chrome. Then click the gear icon at the top right of the extension interface to open the settings.

We configure the LLM in the Models section. For our tests, we use Gemini, which is interesting notably for its ability to detect objects in an image. More technically, as the Mountain View firm notes: “starting with Gemini 2.5, models not only detect objects, but also segment them and provide their contour masks.”

To integrate it into Nanobrowser, we use an API key, available in Google AI Studio. Add it to the Chrome extension by clicking “+ Add New Provider” and then “Gemini API Key.”
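Before pasting the key into the extension, it can be worth checking outside Nanobrowser that it is valid. The sketch below is an optional check, not part of the extension's setup. It assumes the key is exported as a hypothetical `GEMINI_API_KEY` environment variable and calls the public Gemini API model-listing endpoint (`GET /v1beta/models` with the `x-goog-api-key` header). It uses only the Python standard library and reads just the first page of results.

```python
import json
import os
import urllib.request

# Public Gemini API endpoint that lists the models available to a key.
GEMINI_MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_list_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists available Gemini models."""
    if not api_key:
        raise ValueError("empty API key")
    return urllib.request.Request(
        GEMINI_MODELS_URL,
        headers={"x-goog-api-key": api_key},
        method="GET",
    )


def list_model_names(api_key: str) -> list:
    """Send the request and return model names such as 'models/gemini-2.5-flash'.

    Only the first page of results is read; pagination is ignored in this sketch.
    """
    req = build_list_models_request(api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


if __name__ == "__main__":
    # Assumption: the key copied from Google AI Studio is exported as GEMINI_API_KEY.
    key = os.environ.get("GEMINI_API_KEY", "")
    if key:
        for name in list_model_names(key):
            print(name)
```

An HTTP error (for example a 400 or 403 response) at this stage usually points to a mistyped or restricted key. The returned names can also help confirm the exact identifier of the Flash model you intend to select for each agent.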

Once the key is entered, we select the model to use for each agent (Navigator, Planner and Validator) as well as the Speech-to-Text model. The agents collaborate to carry out complex web workflows. In our tests, Gemini-2.5-Flash-05-20 was the most effective model, especially in terms of latency. It seems better suited than Gemini 2.5 Pro for this kind of task, where responsiveness and minimal resource consumption are key.

Two other parameters can be set for each model in this section: temperature and top P. As in the default settings, we keep a relatively low temperature for the three agents: 0.7 for the Planner, 0.3 for the Navigator and 0.1 for the Validator. The objective: to generate precise answers. We lower top P, which is fairly high in the original settings (0.9 for the Planner, 0.85 for the Navigator and 0.8 for the Validator), to 0.5 for all three. The goal: to favor accuracy.
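To see why these two settings push toward more predictable answers, here is a toy illustration of standard temperature scaling and top-P (nucleus) filtering, applied to made-up scores for five candidate tokens. The agent values are the ones quoted above; the logits and function names are hypothetical, and this is a generic sketch of the technique, not how Gemini implements sampling internally.

```python
import math

# Per-agent (temperature, top_p) values quoted in the article.
DEFAULTS = {"planner": (0.7, 0.9), "navigator": (0.3, 0.85), "validator": (0.1, 0.8)}
# Same temperatures, top P lowered to 0.5 for every agent.
TUNED = {agent: (temp, 0.5) for agent, (temp, _) in DEFAULTS.items()}


def softmax(logits, temperature):
    """Temperature-scaled softmax: a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


if __name__ == "__main__":
    logits = [2.0, 1.5, 1.0, 0.5, 0.0]  # hypothetical scores for five candidate tokens
    for agent in DEFAULTS:
        for label, settings in (("default", DEFAULTS), ("tuned", TUNED)):
            temp, top_p = settings[agent]
            kept = nucleus(softmax(logits, temp), top_p)
            print(f"{agent:9} {label:7} temp={temp} top_p={top_p} -> {len(kept)} candidate(s)")
```

With these toy scores, the Planner's default settings leave three candidate tokens in play, while top P = 0.5 leaves only the single most likely one. Fewer candidates means less variety and more repeatable behavior, which is the trade-off the settings above aim for.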

A direct and precise prompt

Once these settings are made, we move on to the prompt. From one or a few sentences, Nanobrowser can perform impressive tasks. Here is a relatively simple prompt: “Find the latest article from JDN.” It helps illustrate the mechanics behind the work carried out by Nanobrowser and Gemini.

The prompt entered in the Nanobrowser chat brings the various agents into play. The Planner breaks the request into tasks, the Navigator performs actions on the web pages and the Validator delivers the final result. Their actions are listed in the sidebar on the right. Note that while they work, elements of the web page are overlaid with colored boxes and numbers associated with the different sections. These help the agents identify the sections and interact with them more reliably. Keep an eye on the quality of the results, however: in this case, the search did not go far enough to find the various articles.
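The numbered boxes reflect a common idea in browser agents: give each interactive element an index so the model can answer with something like "click element 1" instead of guessing coordinates. The sketch below illustrates that general idea on a static HTML string using Python's built-in `html.parser`. The list of "interactive" tags, the helper names and the text format are assumptions; Nanobrowser works on the live page in Chrome, and this is not its code.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not Nanobrowser's real list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementIndexer(HTMLParser):
    """Give each interactive element a sequential index, like the numbered highlight boxes."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._current = None  # element whose visible text is being collected

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            el = {"index": len(self.elements), "tag": tag, "attrs": dict(attrs), "text": ""}
            self.elements.append(el)
            if tag in ("a", "button"):  # only these carry a text label in this sketch
                self._current = el

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current["tag"]:
            self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current["text"] = (self._current["text"] + " " + data.strip()).strip()


def index_page(html):
    """Return the indexed interactive elements found in an HTML string."""
    parser = ElementIndexer()
    parser.feed(html)
    parser.close()
    return parser.elements


def describe(elements):
    """Compact text view an LLM agent could read, one numbered element per line."""
    lines = []
    for el in elements:
        attrs = "".join(f' {k}="{v}"' for k, v in el["attrs"].items() if v is not None)
        lines.append(f"[{el['index']}] <{el['tag']}{attrs}> {el['text']}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ('<nav><a href="/news">Latest news</a><p>Intro</p>'
              '<button>Search</button><input name="q"></nav>')
    print(describe(index_page(sample)))
```

Plain paragraphs such as the `<p>` above get no index. This is one reason an agent can miss content that is not exposed through links or buttons, and it fits the shallow search observed in this first test.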

Another, slightly harder prompt, aimed at shopping more wisely: “Find an unstrung Aero Pure Drive racket for less than 250 euros on the Babolat site.” The difficulty of the task means the agents take about five minutes to answer. During execution, we can see the Planner and the Navigator working together: when one encounters an obstacle, the other takes over.

Note that, in our tests, breaking a task down into steps in the prompt did not necessarily speed up the process. On the other hand, using terms similar to those of the site being searched can help the agents. In our case, we can change the prompt to: “Find an unstrung Pure Drive adult tennis racket for less than 250 euros.”

Another prompt shows the power of the tool: “On X.com, with my account @bruno_poncet, like the latest post from the jdnebusiness account.” The action is carried out correctly in under two minutes. However, not everything is perfect. On the one hand, the Planner rightly states that “JDNebusiness’s latest article was liked by the @bruno_poncet account on X.com. The ultimate task is accomplished.” The Validator, for its part, claims that “the answer is not yet correct. The task is not finished”…
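Disagreements like this are easier to understand with a simplified model of the loop: the Planner only knows that its steps ran, while the Validator judges the page it observes. The toy sketch below is entirely hypothetical (the `Profile` page, the `plan`/`navigate`/`validate` helpers and the pinned-post scenario are inventions for illustration, not Nanobrowser's code). In it, an independent Validator catches a real mistake. The article's case is the reverse: the like had been done, yet the Validator rejected it. That is the flip side of an independent check, since a Validator that misreads the page can reject a task that actually succeeded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Profile:
    """Toy profile page: a pinned post (if any) is shown above the newest posts."""
    posts: List[str]  # newest first
    pinned: Optional[str] = None
    liked: Set[str] = field(default_factory=set)

    def visible_order(self) -> List[str]:
        return ([self.pinned] if self.pinned else []) + self.posts


def plan(attempt: int) -> List[str]:
    """Hypothetical planner: a naive first plan, then a revised one after feedback."""
    if attempt == 0:
        return ["like_first_visible"]
    return ["skip_pinned", "like_first_visible"]


def navigate(steps: List[str], page: Profile) -> str:
    """Hypothetical navigator: executes the steps and returns the post it liked."""
    candidates = page.visible_order()
    if "skip_pinned" in steps and page.pinned:
        candidates = candidates[1:]
    if not candidates:
        return ""
    target = candidates[0]
    page.liked.add(target)
    return target


def validate(page: Profile) -> bool:
    """Hypothetical validator: looks at the page itself; is the newest post liked?"""
    return bool(page.posts) and page.posts[0] in page.liked


def run(page: Profile, max_attempts: int = 3) -> List[str]:
    """Retry until the validator accepts; return a log of both agents' verdicts."""
    log = []
    for attempt in range(max_attempts):
        liked = navigate(plan(attempt), page)
        # The planner only knows its steps ran; the validator checks the result.
        verdict = "done" if validate(page) else "not done"
        log.append(f"attempt {attempt}: planner says done (liked {liked}), "
                   f"validator says {verdict}")
        if verdict == "done":
            break
    return log


if __name__ == "__main__":
    for line in run(Profile(posts=["post-3", "post-2"], pinned="post-1")):
        print(line)
```

In this scenario, the first attempt likes the pinned post, so the Planner reports success and the Validator objects. The revised plan then likes the newest post, and both agents agree.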

Confidentiality and security to monitor

As we can see, this kind of tool, still under development, has its shortcomings. Besides latency and occasional inaccuracy, pay particular attention to data confidentiality: be careful about the data you mention and about the security of the sites visited.

A recent report on Nanobrowser warns that, even if the actions run locally, sending potentially sensitive information to third-party providers raises questions. User data then depends on the policies of external companies such as Google or OpenAI.

According to the same report, reliability and operational continuity also carry high risks: “The project does not have a formal legal entity, official funding or a clear monetization strategy, which raises important questions about its viability, maintenance and long-term support.”

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
