88th Edition Download

Google’s Gemini model learns to interact with browsers; Sam Altman predicts agentic work weeks and prompt-run startups; and xAI debuts Imagine v0.9—video generation with audio plus voice commands

The Download
October 8th, 2025

This Week in AI:

No jargon, no filler—just the biggest AI developments worth knowing right now. Perfect for quick industry insights, so you can skip the buzzwords and get straight to the good stuff. Let’s dive into this week’s AI shake-ups, just as promised:

Google DeepMind just dropped a model that actually “uses” a computer via browser actions, opening possibilities for more seamless AI agents across software. Meanwhile, Sam Altman opened up about what’s next at OpenAI: agentic work, novel discovery, and even zero-person startups built by prompts. And xAI isn’t staying quiet either—its new Imagine v0.9 tool can generate videos (audio included) and is going voice-first, letting you tell it what to create hands-free.

Let’s get into it.

In This Issue:

Gemini Learns to Use Computers → DeepMind’s “computer use” model navigates browsers, filling forms, dragging elements, and more.
Altman’s Vision from Dev Day 2025 → From agentic work weeks to “zero-person” startups, here’s what OpenAI’s CEO sees ahead.
xAI Reveals Imagine v0.9 → Grok’s video tool now produces sound + motion, plus a voice-first “Open in Voice Mode” interface.

🟣 Gemini Learns to Use Computers

TL;DR:

Google’s Gemini 2.5 Computer Use model is designed to interact with UI elements in browsers: typing, dragging, submitting forms, and similar actions. It supports about 13 actions and is built to help with interfaces that lack APIs. It is available in preview through Google AI Studio and Vertex AI, and is showcased in demos like “play 2048” or “browse Hacker News.”

Our Take:

Until now, agentic models mostly acted via APIs or backend hooks. Gemini’s computer-use mode brings AI directly into the user interface layer. That opens paths for automation on legacy systems, UI testing, hybrid tools, or agents acting where APIs don’t reach. For product teams, it’s a signal: build with UI affordances in mind; expect more agents to “control” frontends soon.
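
For readers who want to picture what “using a computer” looks like in code, here is a minimal Python sketch of the observe-and-act loop an agent like this might run. Everything in it is an assumption for illustration: the action names (`type_text`, `click`), the `FakePage` stand-in for a browser, and the `scripted_model` function are hypothetical, and none of it reflects Gemini’s actual API or action set.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: text fields plus a submit flag."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False


def apply_action(page, action):
    """Dispatch one UI action onto the page (action names are illustrative)."""
    kind = action["type"]
    if kind == "type_text":
        page.fields[action["target"]] = action["text"]
    elif kind == "click":
        if action["target"] == "submit":
            page.submitted = True
    else:
        raise ValueError(f"unsupported action: {kind}")


def scripted_model(page):
    """Hypothetical stand-in for the model: choose the next action from page state."""
    if "email" not in page.fields:
        return {"type": "type_text", "target": "email", "text": "user@example.com"}
    if not page.submitted:
        return {"type": "click", "target": "submit"}
    return None  # nothing left to do


def run_agent(page, model, max_steps=10):
    """Loop: observe the page, ask the model for an action, apply it, repeat."""
    history = []
    for _ in range(max_steps):
        action = model(page)
        if action is None:
            break
        apply_action(page, action)
        history.append(action["type"])
    return history


if __name__ == "__main__":
    page = FakePage()
    print(run_agent(page, scripted_model), page)
```

The `max_steps` cap is a deliberate design choice in this sketch: an agent that controls a frontend needs a hard stop so a confused model cannot click forever.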

🟣 Altman’s Vision from Dev Day 2025

TL;DR:

In his Dev Day interview with Rowan Cheung, Sam Altman said AI is entering an era of “novel discovery” (scientists are already using it for breakthroughs). He predicted work might “look less like work” as agentic systems take over time-based tasks. He’s optimistic about zero-person startups—fully prompt-driven ventures. And he thinks Codex isn’t far from autonomously delivering weeks of work.

Our Take:

Two things stand out: (1) Altman’s doubling down on agentic time, not just tool assistance, means the next frontier is AI that does, not just suggests. (2) The idea of zero-person startups is bold and scary: it assumes models can bootstrap companies autonomously. For builders, the key is aligning models to economic incentives. If AI can genuinely generate value, then who “owns” that value (you, prompt engineers, the model provider) becomes the next battleground.

Introducing Imagine v0.9, our new video generation model with massive upgrades from v0.1 in visual quality, motion, audio generation, and more.
Now available for free on all our products: grok.com/imagine
— xAI (@xai)
5:03 PM • Oct 7, 2025

🟣 xAI Reveals Imagine v0.9

TL;DR:

xAI’s new Imagine v0.9 (built on Grok’s Aurora engine) can create videos with audio, and animate images with motion, in about 15–20 seconds. A new voice-first interface mode also lets users “Open App in Voice Mode” to generate content hands-free.

Our Take:

Video + sound + motion is the next step after still image generation—but doing it fast and well is hard. xAI’s push into voice-first control is an interesting UX experiment: imagine telling your phone “make me a 10-second clip of this idea” and getting it instantly. The challenge will be managing coherence, style consistency, and user expectations. If voice becomes a dominant content interface, tools built for typed prompts may feel clunky fast.

🚀 Thank you for reading The Download

Your trusted source for the latest AI developments to keep you in the loop, but never overwhelmed. 🙂

*Want to get in front of 600k+ readers? Email [email protected]
