Docugami, Jean Paoli’s start-up, launches in Europe

The XML pioneer's start-up will run its European subsidiary from Paris. The company specializes in document AI.

Docugami, the American vendor of an LLM-based SaaS solution for document processing, is opening a subsidiary in Europe. Its founder, Jean Paoli, is a co-author of the XML language. A veteran of Inria and Microsoft, he contributed at Microsoft to creation and standardization work on Internet Explorer before developing the XML and Open XML formats of the Office suite. He then founded Microsoft's open source entity, which he led until his departure in 2017. In 2018, the Franco-American founded Docugami in the United States.

Docugami specializes in the management of long documents. These files can run from a few dozen to a few hundred pages and are therefore difficult for AI applications or agents to handle as is. “We have launched a platform based on large language models to convert this type of information into semi-structured XML data,” explains Jean Paoli. “In particular, we find these documents in insurance, the pharmaceutical sector or even manufacturing. These are generally product specifications.” The solution thus makes it possible to automate a number of critical processes: contracts, reports, audits, compliance …
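To make "semi-structured XML data" concrete, here is a minimal sketch of what a converted contract fragment might look like and how a downstream program could read it. The tag names, the sample values and the `extract_fields` helper are all invented for illustration; they are assumptions, not the vendor's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured XML for part of a long contract.
# Tag names and values are invented for illustration only.
SAMPLE = """
<Contract>
  <Parties>
    <Party role="insurer">Acme Insurance</Party>
    <Party role="insured">Globex Corp</Party>
  </Parties>
  <Term>
    <StartDate>2024-01-01</StartDate>
    <EndDate>2024-12-31</EndDate>
  </Term>
  <CoverageLimit currency="USD">1500000</CoverageLimit>
</Contract>
"""


def extract_fields(xml_text: str) -> dict:
    """Pull a few named fields out of the hypothetical contract XML."""
    root = ET.fromstring(xml_text.strip())
    limit = root.find("CoverageLimit")
    return {
        # Map each party's role attribute to its name.
        "parties": {p.get("role"): p.text for p in root.iter("Party")},
        "start": root.findtext("Term/StartDate"),
        "end": root.findtext("Term/EndDate"),
        "limit": int(limit.text),
        "currency": limit.get("currency"),
    }


fields = extract_fields(SAMPLE)
print(fields)
```

Once a document is in this form, each value can be addressed by name rather than located by reading hundreds of pages of free text.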

A dozen open source LLMs

Google’s first research paper on transformers was published in 2017, and Jean Paoli did not wait for the arrival of ChatGPT in November 2022 to take up the subject. The entrepreneur decided to exploit the very first open source LLMs released at the time. “We integrate a dozen open source LLMs into a solution designed to directly target business end users. This technology is covered by patents filed in connection with the National Science Foundation and NASA,” Jean Paoli points out. Gregory Senay, AI scientist at Docugami (and also French), adds: “Some models focus on hierarchical representation in XML, others on computer vision. We also use other AI technologies focused, for example, on identifying semantic similarity.”

Under the hood, the XML engine weaves a semantic graph across all the documents submitted. The converted data opens up a range of possibilities, from simple visualization in a spreadsheet to use by an intelligent assistant through integration into a workflow. Another advantage: each extracted data point is linked to its source. This provides great transparency while avoiding hallucinations. For generative AI, the Docugami solution is particularly useful for retrieval-augmented generation (RAG) or for fine-tuning existing language models. “When you feed data to an LLM, it is scientifically proven that the process works better if the data is formatted from a knowledge graph. We ourselves have published a research article with Redis on the subject. And Docugami's role is precisely to convert information into this format,” Jean Paoli sums up.
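The idea that each extracted data point stays linked to its source can be sketched in a few lines. The `DataPoint` structure, the file names and the citation format below are hypothetical, chosen only to show how provenance might travel with a value into a retrieval-augmented prompt; they do not describe the product's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    doc_id: str   # hypothetical document identifier
    xpath: str    # location of the value in the XML representation
    label: str    # semantic label shared across documents
    value: str


def build_index(points):
    """Group data points by label so one field can be compared across documents."""
    index = {}
    for point in points:
        index.setdefault(point.label, []).append(point)
    return index


def cite(point):
    """Render a value together with its source, ready to paste into an LLM prompt."""
    return f"{point.label} = {point.value} [source: {point.doc_id}#{point.xpath}]"


points = [
    DataPoint("policy-001.pdf", "/Contract/Term/EndDate", "EndDate", "2024-12-31"),
    DataPoint("policy-002.pdf", "/Contract/Term/EndDate", "EndDate", "2025-06-30"),
    DataPoint("policy-001.pdf", "/Contract/CoverageLimit", "CoverageLimit", "1500000"),
]
index = build_index(points)
context = "\n".join(cite(p) for p in index["EndDate"])
print(context)
```

Because every line of the retrieved context carries its document and location, an answer built on it can be traced back and checked by a human reader.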

“We are tuning our offer according to the customer to be able to isolate this information and secure it”

Docugami also uses agent technology. Through a component called Agentic Quality Control, it analyzes the quality of the data. Among hundreds of extracted data points, an agent can identify non-compliant or incorrect data. “Since the reasoning is applied to an entire document base, it becomes possible to identify the documents and data points that deviate from the general pattern,” comments Gregory Senay. Better still, the solution can generate summary tables that turn out to be incomplete when information is absent from the source documents. The application will then create the missing information in full. In banking, this might be, for example, a depreciation rate calculated from revenue. “More prosaically, it may be a matter of identifying that a document's start date has been swapped with its end date,” adds Jean Paoli.
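The two checks mentioned above, swapped dates and documents that deviate from the general pattern, can be illustrated with simple rules. This is only a rule-based sketch under assumed field names (`doc_id`, `start`, `end`, `limit`) and an assumed 75% threshold; it is not the agent-based method the company describes.

```python
from datetime import date


def find_reversed_dates(records):
    """Return the ids of documents whose start date falls after the end date."""
    flagged = []
    for rec in records:
        start = date.fromisoformat(rec["start"])
        end = date.fromisoformat(rec["end"])
        if start > end:
            flagged.append(rec["doc_id"])
    return flagged


def find_missing_fields(records, threshold=0.75):
    """Flag documents lacking a field that at least `threshold` of all documents contain."""
    counts = {}
    for rec in records:
        for key in rec:
            counts[key] = counts.get(key, 0) + 1
    common = {k for k, c in counts.items() if c / len(records) >= threshold}
    return {
        rec["doc_id"]: sorted(common - rec.keys())
        for rec in records
        if common - rec.keys()
    }


# Hypothetical extracted records, one per document.
records = [
    {"doc_id": "A", "start": "2024-01-01", "end": "2024-12-31", "limit": 100},
    {"doc_id": "B", "start": "2024-12-31", "end": "2024-01-01", "limit": 200},
    {"doc_id": "C", "start": "2023-06-01", "end": "2024-05-31", "limit": 150},
    {"doc_id": "D", "start": "2024-03-01", "end": "2025-02-28"},
]
reversed_docs = find_reversed_dates(records)
missing = find_missing_fields(records)
print(reversed_docs, missing)
```

The second check only works because it looks at the whole set of documents: a field counts as "expected" when most of the other documents contain it.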

In the United States, Docugami has already made solid inroads in insurance, with several dozen sales. In parallel, the start-up is beginning to enter the pharmaceutical and manufacturing sectors, as well as banking. Starting from a horizontal solution, the company is developing a specialized layer to target each of these verticals. “We are also tuning our offer according to the customer to be able to isolate this information and secure it,” adds Jean Paoli.

France: a strategic choice

Building on this initial base, Docugami is announcing its launch in Europe via a subsidiary based in France. “As a pure product of French excellence, I am very attached to France. This partly explains our choice of this country,” says Jean Paoli. “But this choice is also strategic. France has historically been a land of talent in mathematics and computer science in particular. It is also an extremely dynamic country in terms of open source. Open source LLMs mainly come from France.” Initially, Docugami plans to hire an R&D team in France. This will involve hiring a high-level scientist who will be surrounded by PhD students.

Docugami is not starting from scratch. Its R&D, so far based in the United States, is made up mostly of French researchers. “We have a whole host of contacts within the French scientific ecosystem. Our objective is to replicate our American model by creating collaborations with laboratories, in particular with Inria,” Jean Paoli points out. Second step: the recruitment of a sales and engineering team to support future European customers, which will take place at the end of the summer.

Jean Paoli is also in discussions with investment funds to close a Series A round. It will follow a seed round of 10 million dollars completed in 2020, led by the SignalFire fund with the participation of Bob Muglia, the former CEO of Snowflake. That operation has since been supplemented by two other raises, bringing the total funds raised to $11.7 million. In terms of roadmap, Docugami intends to develop connectors to third-party data sources to facilitate their ingestion. “The same goes for workflow tools. For the time being we are integrated with Zapier and Power Automate. But again, the objective is to broaden the spectrum,” says Jean Paoli. “Thanks to our offer, the workflow trigger can be based on any data present in a document. For example, in the case of a contract exceeding $1 million, the document may be routed to the manager in charge of this type of content.” Another possibility: routing a document to the people concerned (engineering, legal, HR …). It remains to be seen who the start-up's first French reference customers will be.
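The routing rule Jean Paoli describes, a contract above $1 million going to a dedicated manager and other documents going to the department concerned, can be written as a small function. The queue names, the document fields and the `route` helper are hypothetical; in practice such a rule would be configured in a workflow tool rather than hand-coded.

```python
# Hypothetical destination queues for the departments named in the article.
DEPARTMENT_QUEUES = {
    "engineering": "engineering-queue",
    "legal": "legal-queue",
    "hr": "hr-queue",
}


def route(document, threshold=1_000_000):
    """Pick a destination from data extracted out of the document itself."""
    is_contract = document.get("type") == "contract"
    # Contracts strictly above the threshold go to the dedicated manager.
    if is_contract and document.get("amount_usd", 0) > threshold:
        return "contracts-manager"
    # Everything else goes to its department, or to a default inbox.
    return DEPARTMENT_QUEUES.get(document.get("department"), "general-inbox")


big = route({"type": "contract", "amount_usd": 1_250_000, "department": "legal"})
small = route({"type": "contract", "amount_usd": 1_000_000, "department": "legal"})
other = route({"type": "memo"})
print(big, small, other)
```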

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.