Google AI Edge Gallery lets you run small language models (SLMs) locally. The data stays on your smartphone.
The promise of local AI is starting to materialize. Thanks to optimization work by model publishers, it is now possible to run generative AI locally on your own phone. With only a few gigabytes of RAM, the latest Android smartphones can infer (execute) models of one to a few billion parameters. Google, which recently introduced a family of SLMs suited to mobile devices, now offers a dedicated mobile application for querying its models locally: AI Edge Gallery.
An application based on Gemma 3n
AI Edge Gallery uses the Gemma 3n and Gemma 3 (1B) models developed by DeepMind. Thanks to software optimizations, and despite its 5 to 8 billion total parameters, Gemma 3n consumes only 2 to 3 GB of memory. The performance is all the more remarkable given that benchmarks place the model at the top of its class among models of equivalent size. On LMArena's text arena, the model ranks above Claude 3.5 Haiku and Amazon Nova. Even more interesting, Gemma 3n accepts text, images, video, and audio as input.
The application is currently available only on Android, via the Play Store or by downloading the APK file directly from the GitHub repository. It requires Android 12 or later. The app mainly lets you ask questions about visual elements, transcribe and analyze audio files, work from a source document, or simply chat with the AI in text. A minimum of 4 to 6 GB of RAM is necessary, plus 0.5 to 4.7 GB of storage depending on the model you choose.
How to use Gemma 3n in AI Edge Gallery?
Once the application is installed, AI Edge Gallery asks you on first launch to download the Gemma models; they are not bundled with the app. As of September 2025, the application supports three models: Gemma3-1b-IT (around 550 MB), for text only, and Gemma-3N-E2B-IT (3.4 GB) and Gemma-3N-E4B-IT (4.7 GB) for analyzing images or audio files. The model weights are downloaded from Hugging Face. You will need to sign in to Hugging Face (or create an account), then accept the Gemma models' terms of use, also on Hugging Face. Once the form is completed, the download starts automatically in the application.
Chat, image analysis, audio file analysis…
Once the model is downloaded, you can use it directly in chat mode or for image or audio file analysis. For this test, we used a Google Pixel 10 (with 12 GB of RAM). AI Edge Gallery lets you configure top-k, top-p, temperature, and the inference chip (CPU or GPU). For this test, we kept the default top-k and top-p values and a temperature of 0.8. Finally, in our tests on the Pixel, inference is faster on the CPU.
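For readers curious about what these parameters actually do, the sampling step can be sketched in plain Python. This is a generic illustration of top-k, top-p, and temperature sampling, not AI Edge Gallery's actual implementation, and the logits are invented:

```python
import math
import random

def sample_next_token(logits, top_k=40, top_p=0.95, temperature=0.8, rng=None):
    """Sketch of the sampling knobs AI Edge Gallery exposes.

    temperature rescales the logits, top_k keeps only the k most likely
    tokens, and top_p (nucleus sampling) then keeps the smallest set of
    those whose cumulative probability reaches p.
    """
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    # top_k: indices of the k highest-scoring candidates, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - m) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # top_p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize and draw one token from what remains.
    total = sum(p for _, p in kept)
    r = rng.random() * total
    for idx, p in kept:
        r -= p
        if r <= 0:
            return idx
    return kept[-1][0]
```

Lowering the temperature sharpens the distribution toward the most likely token, while top-k and top-p trim the tail of unlikely candidates before sampling.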
Gemma-3N-E2B-IT is more precise than Gemma3-1b-IT
The AI can be very useful, for example, for replying to an SMS or an email in complete confidentiality. As a test, we asked Gemma3-1b-IT to reply to an email.
Gemma3-1b-IT responds fairly quickly (around 144 tokens per second), but the phone tends to heat up. The reply the AI proposes to our email is reasonably correct, although we observed better results with English prompts; given its limited size, the model is undoubtedly stronger in the language of Shakespeare. For best results, it is preferable to use Gemma-3N-E2B-IT, which has more parameters: the response latency is longer, but the result is better.
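In terms of orders of magnitude, that throughput is comfortable for short replies. A quick back-of-the-envelope calculation (the 150-token reply length is an illustrative assumption; only the throughput figure comes from our test):

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Rough decoding time for a reply, ignoring prompt-processing time."""
    return num_tokens / tokens_per_second

# A short email reply of ~150 tokens at the ~144 tokens/s we measured:
print(f"{generation_time(150, 144):.1f} s")  # about 1.0 s
```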
On-device image analysis
Even more useful, AI Edge Gallery can analyze images on the fly. The visual analysis is not as precise as with a large LLM, but the results are there: OCR, description, chart analysis…
For example, we asked Gemma-3N-E2B-IT to analyze the nutrients in a drink from a photograph. The AI delivered a coherent result in under a minute. Generation is slightly slower, but not enough to noticeably degrade the experience.
Transcribing or summarizing an audio file
This is a use case that would not have been possible a few years ago. AI Edge Gallery lets you process any audio file with an SLM. There is only one limit: length. By default, the application only processes audio files up to 30 seconds long. Presumably, the model's context size is capped to avoid saturating the phone's memory.
For example, we asked the AI to quickly summarize a voice note as bullet points. The result delivers: the instruction is respected perfectly. We simply regret the imposed length limit.
The promise of on-device AI is kept
AI Edge Gallery is a particularly useful application, whether for testing AI capabilities on your smartphone or for using generative AI in complete security. The on-device approach makes sense in situations where confidentiality is vital or in environments without connectivity (planes, the metro, dead zones). The data stays on the device.
Despite some limitations (restricted audio duration, performance that varies with the language, latency), the application is very promising overall. Especially given that AI Edge Gallery was initially designed to “inspire and enable developers to build experiences powered by on-device AI models”, not for consumer use.