Jeff Boudier (Hugging Face) “Robotics is now an integral part of Hugging Face’s strategy”

Jeff Boudier, chief product and growth officer at Hugging Face, explains to JDN, during re:Invent, how the French unicorn is extending its influence in AI while now accelerating into robotics.

JDN. Hugging Face is the champion of open source, but you’re a $4.5 billion company. Today, what is the real driver of your income?

Jeff Boudier. Today we have more than 12 million AI builders: researchers, data scientists, machine learning engineers and, increasingly, software developers. They use the platform daily to access, operate and share their own models, datasets and applications. This access to the Hugging Face Hub is essentially free. However, for individuals and businesses with more specific needs, we offer paid services. If you’re one of our 12 million users, you can sign up for our Pro plan for around $9 per month. This gives you access to benefits like Hugging Face Inference Providers to consume models, or to Spaces, our AI application platform, with premium computing.

But the newest and fastest-growing element is our offering for organizations. More than 300,000 organizations have been created organically on the platform to collaborate and build privately. We often talk about the 2 million open, public and freely available models, but we host an equivalent volume of private models, datasets and applications. In total, around 6 million repositories are currently hosted on Hugging Face.

You launched LeRobot, your foray into open source robotics. Why this bet? Do you think the next wave of AI will take place in the physical world?

It’s probably been a little over a year since we identified an incredible opportunity in robotics, a sector still largely dominated by rule-based automation code. We found that machine learning models applied to robot control were not only starting to produce excellent results, but also had characteristics similar to those that made Transformers successful in NLP.

“LeRobot was created to replicate what we did for text seven years ago: simplify the transition from research to practice”

The idea is to use a pre-trained model that can be adapted efficiently. Whether it's an arm with six degrees of freedom or a mobile base, the task might be, for example, grabbing a phone and turning it over. The environment, the robot or the task may vary, but starting from the same model, it is possible to achieve a good success rate after only around fifty attempts.

LeRobot was created to replicate what we did for text seven years ago: simplify the transition from research to practice. Rather than having to re-implement a scientific publication from scratch, LeRobot allows these models to be applied directly to robotics. You start from an existing base, you refine it via teleoperation, and you quickly obtain a solution adapted to your environment.

How many employees are currently working on LeRobot?

Around 20% of our teams work on projects related to robotics. This covers both the software part, around LeRobot, and the hardware. We started by providing open source hardware to the community. By this, we mean reference plans and blueprints for printing the parts at home, a bill of materials (BOM) for purchasing the components you cannot make yourself, as well as assembly instructions.

The SO-101 arm is, in my opinion, the most successful open source robotics platform to date. Whether in number of units manufactured or in quantity of data generated, this success was only possible thanks to a completely open source design, adopted and improved by the community. For those who do not want to print the parts, assemble the robot or manage the software side, we have started selling ready-to-use robots. The best example of this is the Reachy Mini.

This year, Hugging Face acquired Pollen Robotics, the manufacturer of the Reachy 2. Starting from the Reachy 2, which costs around $70,000, we designed a general-public version: the Reachy Mini, a robot available for around $300. Simply plug it into a computer to run models from the Hugging Face Hub. It has microphones, a speaker, a camera, a head with six degrees of freedom and expressive antennas. It's the perfect companion or educational tool for experimenting with AI without having to build an entire robot. The Reachy Mini has been surprisingly popular: since its launch, we have recorded more than 5,000 orders. The Pollen team is working hard to ship everything before Christmas. This is a really exciting development.

Has robotics become a new line of income for you?

Yes, it already has. Robotics is clearly integrated into our current strategy.

You are increasing partnerships with model publishers (Meta, Mistral, Google, Nvidia, etc.). Concretely, what do these alliances bring to Hugging Face?

We distinguish several types of collaborations. As Hugging Face is the reference platform for publishing models, we strive to simplify this process for all model providers, whether it is Alibaba with Qwen, Meta with LLaMA and SAM, or Amazon with Chronos models.

“A new repository is created every ten seconds on Hugging Face”

We also have major partnerships, such as our five-year collaboration with AWS, to help their customers use our tools easily. Concretely, we provide ready-to-use versions of our technologies, we integrate Hugging Face into their main services, and we work with their teams so that our models run as well as possible on their hardware. All of these activities aim to increase adoption of our tools among AWS customers. Today, this represents tens of petabytes of models, datasets and applications served, and billions of queries per month. It's huge.

You now host thousands of open source models, some of which are very sensitive. How do you ensure the security of all the models on the platform?

A new repository is created every ten seconds on Hugging Face. The basic contract is as follows: Hugging Face secures public repositories, while each company is responsible for securing its own private use. For our part, we run several checks on every published model: a malware scanner, personal-data detection and Pickle file inspection. Pickle is a serialization format for model weights, widely used with PyTorch, but it carries a risk of arbitrary code execution. That is why we developed Safetensors as a secure alternative, which we are working to establish as the standard.
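To illustrate the risk Boudier describes, here is a minimal, deliberately harmless sketch of why Pickle files are dangerous: Python's `pickle` protocol lets an object's `__reduce__` method return any importable callable, which is invoked at load time. The `MaliciousWeights` class is an invented name for illustration; a real attack would smuggle in something like `os.system` instead of the harmless builtin used here.

```python
import pickle

class MaliciousWeights:
    """Stand-in for a booby-trapped model file (hypothetical name)."""
    def __reduce__(self):
        # Unpickling will call str.upper("...") instead of restoring data.
        # An attacker would put os.system or similar here.
        return (str.upper, ("arbitrary code ran at load time",))

# Serializing stores the (callable, args) pair, not the object's data.
payload = pickle.dumps(MaliciousWeights())

# Merely *loading* the "weights" executes the smuggled call.
loaded = pickle.loads(payload)
print(loaded)  # ARBITRARY CODE RAN AT LOAD TIME
```

Safetensors avoids this entire class of attack by design: a `.safetensors` file is just a JSON header describing tensor shapes and dtypes followed by raw tensor bytes, so loading it is pure data parsing with no code execution path.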

We also collaborate with AI security specialists like Protect AI, JFrog or VirusTotal, whose analysis results are visible directly in the ‘Files’ tab of the models.

We see, in both open source and proprietary AI, a trend towards the release of SLM (small language models). Is this the end of the race to gigantism? Do businesses really need 100 billion parameter models for their daily tasks?

I wouldn't talk about a new paradigm that would replace everything else. However, many use cases that today rely on massive, expensive models would work just as well, or even better, with smaller, specialized models. That is the trend for the next two or three years.

Hugging Face’s vision is not of a single model solving all of humanity’s problems, but rather of millions of models designed to perform a specific task, more reliably and economically. We want every company to be able to build their own tools. It is not necessarily a question of creating a new frontier model, but of starting from an existing model adapted to the problem and simply customizing it.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.