Vision Changes Everything
Claude Computer Use - The Future of App Testing (among a few million other things)
Tesla’s FSD program took a sharp turn in 2022 when Elon Musk eliminated every sensor except the cameras. His rationale was simple:
The world’s roads were designed solely for vision, so autonomous vehicles should be too.
Anthropic’s newly minted Computer Use AI is the digital equivalent: autonomous agents that must work in a digital world designed for human vision.
Vision changes EVERYTHING.
While this new release and Anthropic’s approach to agentic solutions carry profound capabilities that touch millions of personal and business use cases, ONE USE CASE is undeniable: this changes everything about app testing.
As Anthropic describes it in the release announcement: “To make these general skills possible, we've built an API that allows Claude to perceive and interact with computer interfaces. Developers can integrate this API to enable Claude to translate instructions (e.g., ‘use data from my computer and online to fill out this form’) into computer commands (e.g., check a spreadsheet; move the cursor to open a web browser; navigate to the relevant web pages; fill out a form with the data from those pages; and so on).”
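To ground that description, here is a minimal sketch in Python using the official `anthropic` SDK against the October 2024 computer-use beta. The model ID and beta flag match that release; the screen dimensions and prompt are illustrative placeholders, not prescriptions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # built-in computer-use tool
            "name": "computer",
            "display_width_px": 1280,      # placeholder screen geometry
            "display_height_px": 800,
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Use data from my computer and online to fill out this form.",
        }
    ],
    betas=["computer-use-2024-10-22"],
)

# Claude replies with tool_use blocks describing concrete actions
# (take a screenshot, move the cursor, click, type, ...) that your
# own harness must execute and report back.
for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # e.g. {"action": "screenshot"}
```

Note that the API only decides the next action; actually moving the mouse or pressing keys is left to whatever environment you wire it into.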
Using this new API, you can unleash an agent on a web app to carry out fully synthetic, fully automated web tasks, whether scripted or random. And because the agent works from screenshots, the entire session, with all of its output and the full experience, can be captured as a video and used to generate future testing and synthetic-data assets.
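As a sketch of that testing loop: the snippet below assumes a hypothetical `execute_action` helper that performs Claude’s requested action (click, type, screenshot, and so on) inside your own browser or VM harness and returns a PNG screenshot. The saved frames stand in for the video described above and could later be stitched together with a tool such as ffmpeg.

```python
import base64

import anthropic

client = anthropic.Anthropic()
TOOLS = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1280,   # placeholder geometry for the test VM
    "display_height_px": 800,
}]
messages = [{
    "role": "user",
    "content": "Open the signup form of our web app and submit test data.",
}]


def execute_action(action: dict) -> bytes:
    """Hypothetical: perform `action` in your browser/VM, return a PNG screenshot."""
    raise NotImplementedError


frame = 0
for _ in range(20):  # cap the agent at 20 turns
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=TOOLS,
        messages=messages,
        betas=["computer-use-2024-10-22"],
    )
    tool_uses = [b for b in response.content if b.type == "tool_use"]
    if not tool_uses:
        break  # no more actions requested: Claude considers the task done

    messages.append({"role": "assistant", "content": response.content})
    results = []
    for tu in tool_uses:
        png = execute_action(tu.input)        # run the action in your harness
        with open(f"frame_{frame:04d}.png", "wb") as f:
            f.write(png)                      # persist every frame for replay/video
        frame += 1
        results.append({
            "type": "tool_result",
            "tool_use_id": tu.id,
            "content": [{"type": "image", "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(png).decode(),
            }}],
        })
    messages.append({"role": "user", "content": results})
```

The hard cap on turns and the screenshot-per-step recording are design choices for a test harness, not requirements of the API; a production runner would add its own pass/fail assertions on top of this loop.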
The upgraded Claude 3.5 Sonnet is now available for all users. Today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku will be released later this month.
Let's just say Vision AND Voice combined change everything. In your previous column, you highlighted the importance of real-time voice agents. After owning an electric car for two years and being forced to fiddle dangerously with screen displays while driving just to accomplish the seemingly simple tasks needed to control basic car functions, I'm convinced the only logical and safe way forward is to reduce "cockpit" screen space and replace it with highly intelligent voice controls. In fact, I'm amazed this transition hasn't happened sooner. Car manufacturers need to put their heads together and implement a basic, standard set of voice-activated controls for everyday driving scenarios. I want to navigate my climate controls, lights, washers, radio, and even seat settings with my voice, and I want intelligent, confirmed feedback (combined with the display) after each command. Is that too much to ask? Which auto manufacturer will get there first?