Google may launch “Project Jarvis,” its Rabbit-inspired AI agent, in December, enabling web task automation in Chrome.
The company is set to preview the computer-using agent alongside its primary Gemini large language model (LLM) launch, The Information reports.
Google is reportedly making an AI agent called Project Jarvis that can use Chrome browser to do basic tasks like shopping and booking flights, with its first test planned for December 2024 with the new Gemini model
– Project Jarvis takes pictures of the screen to understand what… pic.twitter.com/kOXOuvYd06
— Tibor Blaho (@btibor91) October 26, 2024
“Project Jarvis,” named in reference to J.A.R.V.I.S. from Iron Man, would operate exclusively within a web browser, specifically Chrome. Sources say the tool could automate everyday web tasks such as booking flights, conducting research, and shopping online by taking screenshots of the screen, analyzing them, and then clicking buttons or entering text on the user’s behalf. The article does not specify whether this will be for mobile or desktop.
According to the report, Jarvis takes “a few seconds” to perform each action, suggesting it runs in the cloud rather than on-device.
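The behavior described above — capture a screenshot, send it to a cloud model, execute the action it returns — can be sketched as a simple loop. Everything here (`FakeBrowser`, `fake_model`, the `Action` type) is a hypothetical stand-in; the report does not describe Google’s actual implementation.

```python
"""Minimal sketch of a screenshot-driven agent loop, assuming the
capture -> interpret -> act cycle described in the report. All names
are illustrative stand-ins, not Google's real interfaces."""

from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # element to click, or text to type


class FakeBrowser:
    """Stand-in for a browser the agent controls via screenshots."""

    def __init__(self):
        self.log = []

    def screenshot(self) -> str:
        # A real agent would capture pixels; here we return a text stub.
        return f"page after {len(self.log)} actions"

    def perform(self, action: Action):
        self.log.append((action.kind, action.target))


def fake_model(screenshot: str) -> Action:
    # Stand-in for the cloud LLM call that interprets the screenshot;
    # this is where the reported "few seconds" of latency would occur.
    if "after 0" in screenshot:
        return Action("click", "search-box")
    if "after 1" in screenshot:
        return Action("type", "flights to Paris")
    return Action("done")


def run_agent(browser: FakeBrowser, max_steps: int = 10):
    for _ in range(max_steps):
        action = fake_model(browser.screenshot())
        if action.kind == "done":
            break
        browser.perform(action)
    return browser.log


print(run_agent(FakeBrowser()))
# → [('click', 'search-box'), ('type', 'flights to Paris')]
```

In a real system, the screenshot call and each action would go through an actual browser-automation layer, and the model call would be a network round trip, which is consistent with the per-action latency the report mentions.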
Google is reportedly considering a limited release to testers to uncover and fix bugs. The Information warns that the company’s plan to showcase Jarvis in December may change.
AI Companies Push Boundaries with LAMs
A large action model (LAM) is an AI system that translates human intentions into actions, enabling tasks like booking rooms and making complex decisions. LAMs learn from extensive datasets of user actions to support strategic planning and real-time responses.
Leading AI companies are building LAMs similar to the one described in the report about Google. Anthropic, for instance, recently unveiled AI agents that autonomously perform complex tasks on computers through its chatbot, Claude, which processes on-screen data and acts on users’ behalf with their consent. OpenAI is also reportedly developing a comparable agent.
Microsoft’s Copilot Vision will let users converse with the assistant about the web pages they are viewing. Apple Intelligence is likewise expected to understand on-screen content and perform tasks across multiple apps next year.