Token Monster, the service that selects the best AI for each request

Launched in May 2025 by the CEO of HyperWrite AI, Token Monster promises to get the most out of generative AI by automatically matching each request to the models best suited to the use case.

What if choosing the right model for the right use case became easier? That is the ambition of Token Monster, a side project launched by Matt Shumer, the high-profile CEO of HyperWrite. The tool is inspired by command-line coding agents, which automatically pick the best model for development tasks. "We wanted a tool that would make it easier to choose the most relevant model for each use, including outside of development. Many users are willing to pay more to get the best results," explains Matt Shumer.

How does Token Monster work?

Reasoning, code generation, text generation… Every LLM on the market today excels in one or more verticals. To get the best results, you therefore have to combine several models, and that is the whole principle of Token Monster. The tool takes the form of a classic chatbot and combines seven models: GPT-4.1 for code; Claude Sonnet 4, Opus 4, and Sonnet 3.5 for text, code, and creative requests; Gemini 2.5 Pro for reasoning and code generation; and Perplexity's Sonar and Sonar Deep Research for web search and in-depth research.
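The task-to-model mapping above can be pictured as a simple routing table. This is only an illustrative sketch: the model names come from the article, but the table structure and the `candidate_models` helper are assumptions, since Token Monster does not publish how it routes requests.

```python
# Hypothetical routing table for the seven models the article lists.
# The actual selection logic used by Token Monster is not public.
ROUTING_TABLE = {
    "code": ["GPT-4.1", "Claude Sonnet 4", "Gemini 2.5 Pro"],
    "text": ["Claude Sonnet 4", "Claude Opus 4", "Claude Sonnet 3.5"],
    "creative": ["Claude Sonnet 4", "Claude Opus 4", "Claude Sonnet 3.5"],
    "reasoning": ["Gemini 2.5 Pro"],
    "web_search": ["Sonar"],
    "deep_research": ["Sonar Deep Research"],
}

def candidate_models(task: str) -> list[str]:
    """Return the models the article associates with a given task type."""
    return ROUTING_TABLE.get(task, [])
```

A real router would of course weigh cost, latency, and prompt content rather than a static lookup.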

When the user submits a request, an orchestrator model (Opus or Sonnet, depending on the chosen configuration) generates an action plan before starting on the answer. One or more models are then called depending on the task at hand. The method used to select the most appropriate model is not public, unfortunately, but the whole process is seamless for the end user. Token Monster also offers many connectors so the AI can query and act on the real world: Gmail, Slack, GitHub, Notion, much of the Google suite (Drive, Sheets, Calendar, Docs), and Zendesk. A single request can trigger a sequence of actions, such as a research phase followed by analysis, drafting, and refinement, potentially using a different LLM at each stage.
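The plan-then-dispatch flow described above can be sketched as follows. Since the real planning logic is private, the fixed plan returned here is purely illustrative: it just mirrors the research, analysis, drafting, and refinement sequence the article mentions, with plausible model assignments.

```python
from dataclasses import dataclass

@dataclass
class Step:
    task: str   # e.g. "research", "analysis", "drafting", "refinement"
    model: str  # the model the orchestrator assigns to this step

def plan_request(prompt: str, orchestrator: str = "Claude Opus 4") -> list[Step]:
    """Hypothetical orchestrator: produce an action plan for a request.

    Token Monster's actual planner is not public; this fixed sequence only
    illustrates the multi-stage, multi-model pipeline the article describes.
    """
    return [
        Step("research", "Sonar"),
        Step("analysis", "Gemini 2.5 Pro"),
        Step("drafting", "Claude Sonnet 4"),
        Step("refinement", "Claude Sonnet 4"),
    ]
```

In a real system, each step's output would be fed back to the orchestrator before the next step is dispatched.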

Token Monster lets you configure the default chat mode: "Smart" for more precise answers and "Fast" to optimize latency. The interface also lets you choose your preferred orchestrator model: Claude Opus 4 for complex prompts or ones requiring a large number of steps, and Claude Sonnet 4 for faster responses. Note that using Opus as the orchestrator will cost you more; we will come back to that.
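The mode-to-orchestrator trade-off can be expressed as a one-line mapping. This is an assumed pairing of the two settings the article describes, not documented Token Monster behavior.

```python
def pick_orchestrator(mode: str) -> str:
    """Hypothetical mapping of chat mode to orchestrator model.

    "smart" favors Claude Opus 4 (more precise, more expensive);
    any other mode falls back to Claude Sonnet 4 (lower latency).
    """
    return "Claude Opus 4" if mode == "smart" else "Claude Sonnet 4"
```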

The strong point of Token Monster

Token Monster is particularly useful for complex use cases, or ones that require several steps to reach the final result. Writing reports that rely on recent and/or sourced information, and generating code for a highly complex project, are two relevant examples of what Token Monster can do.

For example, we asked Token Monster to generate the code for a somewhat complex widget (traffic forecasting between a point A and a point B). The AI first generated a plan of the actions to undertake to reach the result. Token Monster then delegated the search for information on the APIs to use to Perplexity's Sonar; the widget's architecture was handed to Gemini 2.5 Pro, and the code for each file was generated by Claude Sonnet 4. Token Monster finally gave us the code along with short documentation explaining how to use the widget. In total, three models were used.
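The widget workflow above can be reconstructed as a three-step plan. The step names and dispatch order here are assumptions based on the article's description; only the model assignments come from the source.

```python
# Hypothetical reconstruction of the three-model widget workflow.
WIDGET_PLAN = [
    ("find_traffic_apis", "Sonar"),            # API research via Perplexity
    ("design_architecture", "Gemini 2.5 Pro"), # widget architecture
    ("generate_files", "Claude Sonnet 4"),     # code for each file
]

def models_used(plan: list[tuple[str, str]]) -> list[str]:
    """Distinct models a plan calls, in order of first use."""
    seen: list[str] = []
    for _, model in plan:
        if model not in seen:
            seen.append(model)
    return seen
```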

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.