AI takes action: vision, memory, autonomous agents… everything is accelerating.
This week, several announcements reinforced a clear trend: AI agents are becoming more autonomous, able to interact with their environment, manage contextual memory and plug into existing interfaces.
Microsoft: Copilot interacts with graphical interfaces
Microsoft has extended Copilot's capabilities with two new features. The first, integrated into Copilot Studio, allows the AI to interact directly with graphical interfaces on desktop and the web. Thanks to the “computer use” function, it can perform actions such as clicking, filling in forms or navigating menus, without requiring an API.
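To picture what API-free GUI control involves, here is a minimal analog in Python using the generic pyautogui automation library; this is an illustration of the technique, not Microsoft's implementation.

    import pyautogui  # generic GUI automation library, used here as an analog

    # Click a button at known screen coordinates, then fill a form field,
    # all without going through any application API.
    pyautogui.click(x=640, y=360)                       # click the target control
    pyautogui.write("Jane Doe", interval=0.05)          # type into the focused field
    pyautogui.press("tab")                              # move to the next field
    pyautogui.write("jane@example.com", interval=0.05)
    pyautogui.press("enter")                            # submit the form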
The second, Copilot Vision, integrated into the Edge browser, allows the assistant to analyze the visual content of the screen in real time. The AI can answer contextual questions or suggest relevant actions, without interacting directly with the elements displayed. These advances target productivity and contextual assistance use cases, without depending on specific integrations.
Canva: unified platform for content, code and data
Canva launched Visual Suite 2.0, an update to its AI-assisted content creation platform. The Canva AI assistant generates text, images, presentations and videos from simple prompts. Canva Code makes it possible to create widgets or websites without coding, and Canva Sheets turns spreadsheets into interactive dashboards with built-in analysis functions. Everything is brought together in One Design, a unified collaborative creation interface.
xAI: Grok Studio and user memory
xAI presented Grok Studio, a split-screen collaborative interface for coding, writing documents or designing games with Grok. The tool supports several languages and offers a real-time preview.
The agent now has a customizable memory. Users can view, edit or delete the stored information, which allows the AI to adapt its responses to the history of past exchanges.
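As a rough sketch of what such a user-controllable memory layer can look like (the class and method names below are hypothetical, not xAI's actual design):

    from dataclasses import dataclass, field

    @dataclass
    class MemoryStore:
        """Toy user-controllable memory: entries can be listed, edited or deleted."""
        entries: dict[str, str] = field(default_factory=dict)

        def remember(self, key: str, fact: str) -> None:
            self.entries[key] = fact              # store a new fact

        def list_entries(self) -> dict[str, str]:
            return dict(self.entries)             # let the user inspect memory

        def edit(self, key: str, fact: str) -> None:
            self.entries[key] = fact              # let the user correct a fact

        def forget(self, key: str) -> None:
            self.entries.pop(key, None)           # let the user delete a fact

        def as_context(self) -> str:
            """Serialize memory so it can be prepended to the model prompt."""
            return "\n".join(f"- {k}: {v}" for k, v in self.entries.items())

    memory = MemoryStore()
    memory.remember("language", "prefers answers in French")
    print(memory.as_context())  # -> "- language: prefers answers in French"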
OpenAI: new GPT-4.1 model and multimodal tools
GPT-4.1 is now available via the API. The model improves reasoning, instruction following and code generation, while supporting longer contexts.
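For illustration, a minimal call through the official openai Python SDK could look like the following; the prompt is a placeholder, and only the model identifier comes from the announcement.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "You are a concise coding assistant."},
            {"role": "user", "content": "Write a Python function that reverses a string."},
        ],
    )
    print(response.choices[0].message.content)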
Two other models, o3 and o4-mini, are offered in the ChatGPT interface. The first favors depth of reasoning, the second speed of execution. Both are multimodal, capable of interpreting text and images, and can trigger actions through tools such as the code editor, web browsing or file management.
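These models also expose standard function calling through the API; below is a minimal sketch, where the get_weather tool schema is purely illustrative and not an OpenAI built-in.

    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
        tools=tools,
    )
    # Instead of free text, the model returns a structured call to the tool.
    print(response.choices[0].message.tool_calls)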
A new agent, Codex CLI, was also presented. Accessible from the command line, it can read, modify and execute code locally. It runs on o4-mini by default and accepts visual inputs such as screenshots.
ByteDance: optimized video generation with Seaweed-7B
ByteDance unveiled Seaweed-7B, a video model with 7 billion parameters. It generates 720p video at 24 fps in real time with only 40 GB of VRAM. It supports various tasks, including generation controlled by a camera trajectory.
Its architecture relies on a VAE with 64× compression and a hybrid-stream transformer, reducing compute requirements by 20%. Progressive training combining supervision and reinforcement improves the quality and consistency of the videos.
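As a back-of-the-envelope reading of the announced figures (assuming the 64× ratio refers to the overall reduction in data volume per frame, which the announcement does not detail):

    # Rough arithmetic on the announced 64x compression ratio; the exact
    # latent layout is an assumption, not a published specification.
    width, height, channels = 1280, 720, 3     # one 720p RGB frame
    raw_values = width * height * channels     # 2,764,800 values
    latent_values = raw_values // 64           # about 43,200 values after the VAE
    print(f"{raw_values:,} raw values -> {latent_values:,} latent values per frame")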
Anthropic: Claude becomes an assisted-research tool
Anthropic added two features to Claude. The first, Research, allows the AI to conduct multi-step research on the web and return structured answers with sources.
The second adds integration with Google Workspace. Claude can access Google Docs, Gmail and Google Calendar to extract, summarize or cross-reference information, under user control.
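Conceptually, a multi-step research loop of this kind can be sketched as follows; the search and read functions are stand-ins, and nothing here reflects Anthropic's actual implementation.

    def search(query):
        # Stand-in for a web search call; returns (url, snippet) pairs.
        return [("https://example.com/a", f"snippet about {query}")]

    def read(url):
        # Stand-in for fetching and extracting a page.
        return f"notes extracted from {url}"

    def research(question, max_steps=2):
        """Toy multi-step research loop: search, read, then answer with sources."""
        notes, sources = [], []
        query = question
        for _ in range(max_steps):
            for url, snippet in search(query):
                notes.append(read(url))
                sources.append(url)
            query = f"{question} (refined)"   # a real agent would refine the query
        answer = " ".join(notes)              # a real agent would synthesize with the model
        return answer, sorted(set(sources))

    answer, sources = research("Why is the sky blue?")
    print(answer)
    print("Sources:", sources)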