Discussion about this post

User's avatar
Bill french's avatar

I’ll run another stress test by removing all specificity of the instructions and simply state the goal. I believe it will work. This is the promise of agents that can reason when the goal is simply “Update the temp in this app.”.

It shouldn’t matter where the controls are or what the details of the process is. A reasonably competent agent should be able to figure it out.

Neural Foundry's avatar

Impressive breakdown on vision vs API rigidity. The Tesla analogy actually clarifies something I've struggled to explain to clients who keep asking why their automations break every time a vendor updates their UI. The stress test where you renamed elements and the agent still figured it out is the killer demo. I've been building similar workflows lately but hadn't considred the "hot swap" session reuse angle, that probabyl saves a ton of context initialization overhead.

No posts

Ready for more?