The JDN has developed a chatbot based on OpenAI’s AgentKit framework. Here is the story of the product, from design to production.
What if designing a generative AI chatbot became accessible to everyone? That was OpenAI’s promise when it launched AgentKit in October 2025. This SDK lets you create AI systems with minimal code thanks to several add-on modules. Among them, Agent Builder offers a visual canvas for designing multi-step workflows, while ChatKit provides chatbots that can easily be embedded in a web page. Won over by the idea of using this tool to guide our readers in choosing a generative AI model from our comparator, which lists (for the moment) more than 155 models, we gave it a try.
The principle of ChatKit
ChatKit is based on a three-stage architecture. First, we design an agentic workflow in Agent Builder, OpenAI’s visual interface for configuring the chatbot’s business logic, on the same principle as N8N. Next, we configure a backend server that creates ChatKit sessions and exchanges authentication tokens with the OpenAI API. Finally, we embed the chat widget in our website with a few lines of JavaScript and React. Together, this makes it possible to deploy a conversational chatbot without developing the user interface from scratch, with OpenAI handling hosting and scalability on its side.
On paper, the architecture is clear: the client-side chat widget communicates with OpenAI’s ChatKit servers, which handle inference, file storage and conversation history. In between, an intermediate server secures the exchanges by creating sessions and managing authentication.
#1 Creating the workflow in Agent Builder
To create our interactive chatbot, we start by designing the agent workflow in Agent Builder. The specifications are relatively simple on paper: build a chatbot capable of advising JDN readers on the model best suited to their needs and their specific project, using our model database as a reference. This database includes the following parameters: name, input modality, output modality, cost (rated on a symbol-based scale from cheapest to most expensive), open source or proprietary nature, possibility of fine-tuning, and scores on the benchmarks we consider decisive (MMLU, LiveCodeBench, SWE-bench Verified, AIME 2025). The chatbot then recommends the model best suited to the use case formulated by the user.
Agent Builder works with modules. We start by adding a Guardrails module to filter user requests: content moderation, jailbreak-risk reduction, NSFW-text detection and prompt-injection blocking, among others. Once past the guardrails, three outcomes are possible: the request is validated, it fails, or an error occurs. In the last two cases, we have the chatbot tell the user that it cannot process the request. When the guardrails are passed successfully, the query enters a classification module (Classify), which determines whether the user’s question is about an AI model. If so, it is routed to the main AI agent, equipped with RAG and our database, to make a suitable recommendation. Otherwise, the user is politely informed that the chatbot is dedicated to advice on AI models and invited to reformulate their question.
RAG or data in the system prompt?
For the logic of the AI agent, we weighed RAG against adding the data from the model guide table directly to the system prompt. Both methods have advantages and disadvantages. Injecting the table data into the prompt is the simplest approach. However, it quickly shows its limits as the volume of data grows: higher costs and latency, more complex maintenance when the guide is updated and, above all, an increased risk of imprecise responses when the model must reason over several hundred rows at once. Its cost could have been partially reduced by prompt caching, which pools the processing of an identical system prompt across several requests, but that would not have resolved the reliability limits.
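For illustration, here is roughly what the rejected approach would have looked like, as a hedged sketch using the OpenAI Python SDK’s Responses API (the file name and user question are made up, and the instructions are abridged):

```python
# Hypothetical sketch of the approach we rejected: inlining the whole model
# guide in the system prompt. "model_guide.txt" and the question are made up.
from openai import OpenAI

client = OpenAI()

# The entire guide travels with every request: cost and latency grow with it.
table_data = open("model_guide.txt", encoding="utf-8").read()

response = client.responses.create(
    model="gpt-5-nano",  # the model we ended up choosing (see below)
    instructions=(
        "You advise JDN readers on generative AI models. "
        "Answer only from this data:\n" + table_data
    ),
    input="Which open source model is best suited to code generation?",
)
print(response.output_text)

# OpenAI's prompt caching automatically reuses identical prompt prefixes
# across requests, which trims the cost but not the reliability issues.
```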
RAG adds a little complexity, but it is entirely manageable: we have to vectorize our table and add the corresponding module in Agent Builder. For each update, we have to re-upload the file with the refreshed data. It is worth the effort to maximize the model’s effectiveness, so we opt for RAG. To do this, we turn to OpenAI’s vector storage tool, the Vector Store. Currently, the tool cannot vectorize tabular data in CSV or XLSX (Excel) format: it only handles unstructured text files. We therefore convert our table to JSON (with a prompt and Claude) to obtain a .txt file filled with JSON. We chose JSON because it preserves an explicit, readable structure for the table fields (model, price, terms, etc.) while remaining compatible with the vectorization of unstructured text files.
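As a sketch of that preparation step (we used Claude for the conversion, but the same transformation can be scripted; SDK method names are assumed from the current OpenAI Python SDK and may differ by version):

```python
# Convert the tabular guide into a .txt file of JSON lines, then upload it
# to a Vector Store. Method names assumed from the OpenAI Python SDK;
# check your version (older releases expose them under client.beta).
import csv
import json
from openai import OpenAI

client = OpenAI()

with open("model_guide.csv", newline="", encoding="utf-8") as src, \
     open("model_guide.txt", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        # One explicit, readable JSON object per model, e.g.
        # {"model": "...", "cost": "...", "open_source": "...", ...}
        dst.write(json.dumps(row, ensure_ascii=False) + "\n")

store = client.vector_stores.create(name="jdn-model-guide")
client.vector_stores.files.upload_and_poll(
    vector_store_id=store.id,
    file=open("model_guide.txt", "rb"),
)
print(store.id)  # the ID we hand to the File Search tool in Agent Builder
```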
Back in Agent Builder, we create an AI agent module that lets the model respond to the user’s request. We give it the File Search tool, entering the ID of the previously created vector store. The model then answers using data from the model guide file.
Choice of model
For the model, we opt for GPT-5-nano rather than GPT-4o, for reasons of cost and future scalability. It is one of the cheapest models and should be supported by OpenAI for some time to come. It has a minimal reasoning capability that will allow it, if we later wish, to use web search quite effectively to answer the user’s request with more precision (citing JDN articles as sources, for example). We activate the "low" reasoning level, the minimum needed to use RAG with this model.
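To give an idea of what Agent Builder wires up for us visually, here is a hedged equivalent of that agent step as a direct Responses API call (the vector store ID is a placeholder, the instructions are abridged, and parameter shapes follow OpenAI’s documentation at the time of writing):

```python
# Rough stand-alone equivalent of the Agent Builder step: GPT-5-nano with low
# reasoning effort and File Search over our vector store ("vs_..." is a placeholder).
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5-nano",
    reasoning={"effort": "low"},  # the minimum level we found workable with RAG
    instructions="You advise JDN readers on generative AI models...",  # abridged
    input="I want an open source model I can fine-tune for code generation.",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_XXXXXXXX"],  # ID of the vector store created earlier
    }],
)
print(response.output_text)
```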
Prompt
The prompt was developed from a hand-written base, then refined by Claude for better formatting. It precisely defines the agent’s editorial identity, its advisory mission, its scope of expertise, its conversational style rules, and the methodology to follow to qualify a need before making any recommendation. Note that across our various iterations, every micro-change to the prompt had an impact on the whole set of instructions: you have to benchmark and repeat the tests with each new version.
Structure of our prompt:
(IDENTITY) - Role/expertise - Tone & posture - Conversational style
(MISSION) - Main objective - Differentiating added value
(SCOPE) - Authorizations (scope) - Refusals/limits - Confidentiality rules
(STYLE RULES) - Required format (prose/list/other) - Strict prohibitions - Token/word limit
(DATA USAGE) - Sources (RAG/web/internal) - Specific rating system (stars, scores) - Mandatory verification
(FUNDAMENTAL RULE) - Critical workflow (e.g. questions before any recommendation) - Workflow exceptions
(CONVERSATIONAL APPROACH) - Adaptation logic (vague → precise) - Topics to explore - Number of questions/interactions
(METHODOLOGY) - Decision hierarchy (ordered criteria) - Tie-breaking logic - Results presentation format
(FOLLOW-UP) - Suggested next step after the main action
(EXAMPLES) - Bad vs. good (2-3 concrete cases)
(EDGE CASES) - Missing data - Extraction attempts
For security reasons, we cannot disclose the version of the prompt currently in production.
#2 Customizing the appearance of the chatbot
Once the bot’s business logic is configured in Agent Builder, it’s time to customize its appearance. OpenAI offers a very easy-to-use interface in its ChatKit playground. You can configure the theme (dark or light mode), the color of the send button, the font and font size, the main greeting, the starter phrases (example questions), whether file upload is allowed, and many other parameters.
Once the graphic choices are validated, simply retrieve the code by clicking the code tab: the entire interface comes pre-configured in it. With the GUI and business logic in place, we turn to deployment, the most technical part for the uninitiated.
#3 Deployment on a development server
As explained earlier, OpenAI’s ChatKit SDK cannot run in the browser alone. The reason is simple: your OpenAI API key, the one linked to your billing, must never end up in client-side code. Otherwise, any curious user could open their browser’s developer tools, retrieve the key and use it at your expense. To prevent this, OpenAI imposes a security mechanism: you must have an intermediate server that holds your API key and, for each new conversation, contacts OpenAI to create a session. OpenAI then returns a temporary token, valid only for that exchange, which the server passes to the browser. The ChatKit widget uses this ephemeral token to operate, and your master API key never travels client-side.
To set up this architecture, we ordered a VPS from OVH; an entry-level plan at a few euros per month is more than enough. Once the machine was installed and secured, we deployed a minimalist Python backend with FastAPI, a lightweight web framework for exposing an API. Our server does only two things: create ChatKit sessions by calling the OpenAI API, and serve the HTML page containing the widget. In the Python script, we integrated the ID of the workflow generated in Agent Builder so that each session points to our chatbot’s business logic. On the frontend side, we pasted the code retrieved from the ChatKit playground, which contains all the configured interface parameters: theme, colors, font and catchphrases.
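As a minimal sketch of that backend (the ChatKit session endpoint, beta header and payload shape follow OpenAI’s ChatKit documentation at the time of writing and should be checked against the current docs; the route names are ours):

```python
# Minimal intermediate server: holds the API key, mints a short-lived ChatKit
# client secret per session, and serves the page embedding the widget.
import os

import httpx
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # never sent to the browser
WORKFLOW_ID = os.environ["CHATKIT_WORKFLOW_ID"]  # ID generated by Agent Builder

@app.post("/api/chatkit/session")
async def create_chatkit_session():
    # The server, not the client, calls OpenAI with the master key.
    async with httpx.AsyncClient() as http:
        resp = await http.post(
            "https://api.openai.com/v1/chatkit/sessions",
            headers={
                "Authorization": f"Bearer {OPENAI_API_KEY}",
                "OpenAI-Beta": "chatkit_beta=v1",
            },
            json={"workflow": {"id": WORKFLOW_ID}, "user": "jdn-visitor"},
        )
        resp.raise_for_status()
    # Only the ephemeral token reaches the widget; it expires with the session.
    return {"client_secret": resp.json()["client_secret"]}

@app.get("/")
async def index():
    # index.html contains the widget code copied from the ChatKit playground.
    with open("index.html", encoding="utf-8") as f:
        return HTMLResponse(f.read())
```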
To keep the application running continuously, we use PM2, a process manager, while Nginx acts as a reverse proxy, receiving HTTPS requests from the Internet and forwarding them to our application.
With the FastAPI server running and the configuration in place, our chatbot’s widget is exposed on our web server. All that remains is to create an iframe on the JDN to host it. When a user opens the page containing the widget, our reader’s browser fetches the widget code from the VPS.
For final production, we ultimately chose to Dockerize the entire application and migrate it to an internal server of the CCM Benchmark group, the JDN’s publisher. While the OVH VPS served its purpose during the development and testing phase, moving to production required aligning with the group’s infrastructure standards, both for IT consistency and for security.
Containerization with Docker lets us package the FastAPI backend and its configuration in a reproducible image, making updates and possible rollbacks easier. On the user side, nothing changes: the iframe now points to our internal infrastructure rather than the external VPS, completely transparently.
A functional chatbot in a few days
Beyond the basic configuration, we implemented several additional security mechanisms to prevent abuse and control API costs. Without going into technical detail, these protections limit usage per user while guaranteeing a smooth experience. Note that this chatbot is currently in alpha: although publicly accessible on our generative AI model comparator, it is not yet 100% finalized. We will continue to refine the prompt, monitor responses and adjust the configuration based on user feedback.
In terms of timing, the raw creation of the chatbot – configuration in Agent Builder, data vectorization, initial prompt, deployment on the VPS – took us around two days. Refinement then spreads over a few additional days depending on the expected level of precision: each modification of the prompt requires tests and benchmarks. The time saving nevertheless remains significant: AgentKit drastically simplifies the technical side (RAG management, conversational interface, model logic). Without this tool, developing such a chatbot from scratch would have taken several weeks of frontend and backend development.