Devin, the code AI that (almost) completely replaces developers


Devin promises a virtual, autonomous and asynchronous software engineer. We tested it. It keeps its promises, but it comes at a price.

Devin is an AI agent that aims, on paper, to replace developers entirely. Launched in June 2024 by the American start-up Cognition (valued at $10 billion), it belongs to the school of fully autonomous AI coding agents. The agent is designed to carry out high-value software engineering tasks. Unlike Claude Code, Codex, Gemini CLI or Cursor, the tool presents itself as a virtual software engineer capable of building a software product from A to Z asynchronously. The approach is still experimental, but it is already attracting major American accounts (Microsoft, Goldman Sachs, Nvidia, Cisco) and even the American army.

An asynchronous AI agent

As if Cognition had anticipated the OpenClaw wave, Devin has been built around asynchronous operation from the start. Where Claude Code and other conventional coding agents need at least some human feedback to move forward, Devin is designed to keep that feedback to a minimum: from the outset, Cognition set out to genuinely replace software engineers. Concretely, when a task is assigned to it, Devin automatically provisions a virtual machine in the cloud, clones the source code repository, installs the dependencies, then gets to work completely independently. The agent can read technical documentation to learn an unfamiliar framework, plan an architecture, write the code, run the tests, and then submit a pull request. Humans are involved only at the two ends of the process: the initial request and the final code review. A parent agent can even orchestrate child agents and assign them subtasks.
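The parent/child arrangement described above can be pictured with a minimal sketch. This is not Devin's API: `child_agent`, `parent_agent` and `SubtaskResult` are hypothetical names, and the child simply stands in for the clone, install, code and test steps. It only illustrates the fan-out pattern, in which a parent hands out subtasks and waits for all of them asynchronously.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    subtask: str
    status: str


async def child_agent(subtask: str) -> SubtaskResult:
    """Hypothetical child agent: a stand-in for clone, install, code, test."""
    await asyncio.sleep(0)  # placeholder for real, long-running work
    return SubtaskResult(subtask=subtask, status="pull request ready")


async def parent_agent(subtasks: list[str]) -> list[SubtaskResult]:
    """Hypothetical parent: fans subtasks out and waits for all of them."""
    # gather() runs the children concurrently and keeps the input order.
    return list(await asyncio.gather(*(child_agent(s) for s in subtasks)))


if __name__ == "__main__":
    results = asyncio.run(parent_agent(["migrate auth module", "update tests"]))
    for result in results:
        print(f"{result.subtask}: {result.status}")
```

The human steps in only before `parent_agent` is called (the initial request) and after it returns (the review), which mirrors the two-ended involvement described in the article.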

This is Devin's other distinguishing feature compared to the competition: Cognition invested very early in developing its own models. The start-up has launched a family of proprietary models called SWE (for Software Engineering). The latest, SWE-1.6, released in April, was trained specifically with reinforcement learning on real development environments. It is served in partnership with Cerebras at a claimed speed of 950 tokens per second, 13 times faster than Anthropic's Sonnet 4.5 according to Cognition. Alongside it is SWE-grep, a model specialized in fast contextual search across large code repositories.

In practice, the most common use cases range from large-scale code migration (an area where Kiro excels) to the development of new features, including automatic PR review and bug fixing from tickets.

Premium pricing, aligned with promises

Cognition appears to be betting on making its tool profitable from the outset, unlike Anthropic, which is prioritizing adoption (probably before raising its prices). Devin's billing is based on a proprietary unit called the ACU (Agent Compute Unit), a metric that aggregates virtual machine time, model inference, and network bandwidth consumed by the agent. According to Cognition, one ACU corresponds to approximately 15 minutes of Devin's active work. Three plans coexist: a pay-as-you-go Core plan (minimum $20 per month, at $2.25 per ACU), a Team plan at $500 per month including 250 ACUs (i.e. $2 per unit) with API access and unlimited parallel sessions, and a custom-quoted Enterprise plan for large accounts requiring SSO, access control and deployment in their own cloud.

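| Subscriptions | Core | Team | Enterprise |
|---|---|---|---|
| Monthly price | $20 | $500 | Custom |
| ACUs included per month | 0 | 250 | Custom |
| Cost per additional ACU | $2.25 | $2.00 | Custom |
| Estimated hourly cost | ~$9.00 | ~$8.00 | Custom |
| Concurrent sessions | 10 | Unlimited | Unlimited |

The comparison also covers API access, Devin Wiki/Search, dedicated support, SSO / RBAC and private cloud deployment; the per-plan availability marks for these rows are not reproduced here.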

Converted to hours, Devin costs between $8 and $9 per hour depending on the plan chosen, a price that Cognition explicitly positions against the hourly cost of a human developer or an offshore contractor. The Team plan, with its 250 monthly ACUs, would offer the equivalent of about 62 hours of guaranteed autonomous work, according to Cognition.
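The hourly figures follow directly from the ACU definition. The short sketch below reproduces the arithmetic, assuming Cognition's approximation of 15 minutes of active work per ACU; the function names are ours, chosen for illustration.

```python
MINUTES_PER_ACU = 15  # Cognition's approximation, as quoted in the article


def hourly_cost(price_per_acu: float) -> float:
    """Cost of one hour of active agent work at a given ACU price."""
    return price_per_acu * 60 / MINUTES_PER_ACU


def included_hours(acus: float) -> float:
    """Hours of active work covered by a number of ACUs."""
    return acus * MINUTES_PER_ACU / 60


def effective_acu_price(monthly_fee: float, included_acus: float) -> float:
    """Price per ACU implied by a plan's monthly fee and included ACUs."""
    return monthly_fee / included_acus


if __name__ == "__main__":
    print(f"Core: ${hourly_cost(2.25):.2f} per hour")
    print(f"Team: ${hourly_cost(2.00):.2f} per hour")
    print(f"Team ACU price: ${effective_acu_price(500, 250):.2f}")
    print(f"Team included work: {included_hours(250):.1f} hours")
```

At four ACUs per hour, the Core rate of $2.25 gives $9 per hour and the Team rate of $2 gives $8 per hour, while 250 ACUs cover 62.5 hours, which matches the "about 62 hours" quoted above.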

The JDN test

To test the capabilities of this coding agent, we give it a simple project: developing a smart health application based on generative AI. The goal is a health assistant that is easy to use for advice and analysis. We ask Devin to use the GPT-5.4 model, one of the best generative models for health (Meta Muse Spark is not yet available). The project is stored in a GitHub repository, and we use a free but limited version of Devin (Trial).

To generate the project, we go directly through the Devin web interface, connected to our GitHub account. By default, Devin offers two modes: a classic agentic mode and a Fast mode. We use the first to limit token consumption.

We give the agent the following prompt (translated from French):

# HealthChat – Generative AI health web interface

## Description

Generate a generative AI health web interface. The user can:

- Ask an AI health questions through a chat
- Send images (X-rays, photos of the body, etc.) for visual analysis

## Design

- Clean, easy-to-use, modern interface
- Mobile-first (usable on a smartphone)
- Messaging style: user/AI bubbles, input field + a button for images
- Permanent disclaimer at the bottom: "⚠️ This tool does not replace medical advice. In an emergency, call 15."

## AI model configuration

- Default: OpenAI API key, `gpt-5.4` model (with vision support for images)
- Alternative: access to an open-source model via Ollama (configurable server URL + model name)
- Settings are accessible via a ⚙️ button

## Onboarding (first login)

On first use, ask the user to fill in a few quick profile details:

- First name
- Age
- Weight
- Medical history (ATCD/MHT)

This information is stored in localStorage and can be edited at any time.

## System prompt

The user profile (first name, age, weight, medical history) is automatically injected into the system prompt with every exchange.

The system prompt must require the AI to:

1. Ask the user clarifying questions before answering, up to a maximum of 5 questions, to gather as much context as possible
2. Never give a diagnosis, only inform and guide
3. If an image is attached, describe what is observed without interpreting it as a diagnosis

## Technical constraints

- Local storage only (localStorage)
- Deployable as a static site (Vercel, Netlify, GitHub Pages)
- Images converted to base64 before being sent to the API

Once the code is fully generated, Devin tests the mobile and desktop versions of our application in its own interface. Here again, the result is striking: the agent appears to be genuinely excellent at computer use.

In less than 10 minutes, the entire project is delivered. Apart from a few minor adjustments made by prompt (adding a README, changing a model, adding a license file, etc.), the final project is operational and production-ready in under 20 minutes in total, a real record. Devin's real strength is that it does not pester the user with questions of every kind and develops in the background, quickly and silently. Devin used 3.45 ACU for this session, for an actual cost of about $7.76 on the Core plan at $2.25 per ACU (excluding the $20 monthly subscription). The cost is certainly high, but the final application works perfectly. Installation from the repository went smoothly, as did use in real conditions.

Our application is available as open source, under the Apache 2.0 license, on GitHub for the most curious: https://github.com/BenjaminPolge/HealthChat. Some UX modifications were subsequently made with Claude Code so as not to pay for a Devin subscription.

An agent that keeps its promises

At the end of this test, Devin stands out as a credible, even formidable, alternative to the coding agents we have evaluated in recent months. The alternatives to Claude Code, Codex and Gemini each have their merits, but few of them have delivered results as convincing as the market leaders in real conditions. Devin's real differentiator comes down to one word: autonomy. Devin takes that logic further thanks to its dedicated virtual machine. The experience is smoother and less verbose, and the result shows it. The code delivered is almost production-ready, in particular because the agent spontaneously chains development, tests and visual verification, a reflex that only the most seasoned Claude Code users think to require of their agent.

All this has a price, and it is far from negligible. With an effective hourly cost of between $8 and $9, Devin is significantly more expensive than its direct competitors. There is, however, an indirect advantage to this pricing transparency: Devin's price closely reflects the real cost of generative AI when it is used intensively on complex tasks. Companies that adopt the tool today are getting used to a level of billing that will, sooner or later, apply across the entire market. When the time comes to pay fair value for AI, the transition will be much less brutal.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
