Google Built the Agents. The Web Is Building the Protocol. // Herbert Cuba Garcia

900 million Gemini users. 50 billion images generated. A personal agent — Gemini Spark — that keeps working even when your phone is off. Google I/O 2026 wasn't a product announcement. It was a demonstration of infrastructure at planetary scale.

The numbers are almost impossible to contextualize. Gemini 3.5 Flash scoring 83.6% on the MCP Atlas benchmark. Managed Agents that provision a Linux sandbox with web browsing from a single API call. Information Agents running 24/7 in the background, monitoring the web on your behalf. Antigravity 2.0 letting developers orchestrate subagents without writing the scaffolding themselves. Universal Commerce Protocol enabling AI agents to complete purchases end to end. Google didn't just ship features — they shipped an entire agentic operating model.

The dominant read on all this is straightforward: Google won the agent race. Everyone else is catching up. Agent-first is the new mobile-first, and Google just proved they have the distribution, the data, and the infrastructure to own that transition.

I think that read is right, and it misses the more important story entirely.

The agents themselves are one layer. The more consequential layer is the interface — the set of rules, APIs, and contracts that govern how agents talk to the rest of the web. Google has a very clear answer to that question, and it's built into the architecture of everything they announced.

When Gemini Spark monitors the web for you, it does so through Google's infrastructure. When an agent completes a purchase via Universal Commerce Protocol, it's Google's protocol. When developers build on Antigravity 2.0, they're building on Google's platform. The web, in this model, doesn't disappear. But it becomes a resource that Google's agents consume on everyone else's behalf. The interface between AI and the web is Google.

That's not a criticism of the engineering. The engineering is genuinely impressive. It's an observation about structural incentives. When you build an agentic layer on top of the web, and you also own the largest advertising business in the world, you are not neutral infrastructure. You are a routing layer with strong opinions about which signals flow through you and why.

The same week as I/O, something quieter was gaining traction. A W3C proposal called WebMCP — filed under the Web Machine Learning group — had crossed 2,400 GitHub stars, 149 forks, and 86 open issues. The proposal is authored by engineers from Microsoft and Google together: Brandon Walderman, Leo Lee, and Andrew Nolan from Microsoft; David Bokan, Khushal Sagar, and Hannah Van Opstal from Google. A Chrome flag already exists. The spec has been public since August 2025.

WebMCP proposes a browser-native API — navigator.modelContext — that lets any website declare structured tools that AI agents can call directly. No scraping. No guessing from the DOM. The site says what it can do, the agent calls it, the browser mediates the permission. Two implementation paths: imperative via JavaScript, or declarative using HTML form attributes. Either way, the web speaks to agents in structured terms, not in raw HTML.

This is a technical shift with structural consequences. Today's agents — including Google's — interact with web pages largely by scraping DOM elements. To book an appointment on a typical website, an agent might traverse 40 or more DOM nodes, inferring form fields, guessing input formats, hoping the markup doesn't change between calls. The brittleness is real. So is the security exposure: malicious page content can inject instructions into an agent's context just by being present on the page.

With WebMCP, the site registers a function: bookSlot(date, time, name, email). Typed parameters. An explicit contract. Permission granted by the browser, not inferred by the model. The agent doesn't need to understand the visual layout of a page. It just calls the tool.

There's a scene in Ready Player One where we learn what IOI — Innovative Online Industries — actually wants. Not to win the OASIS for the game. To own the infrastructure and monetize access to it. In Nolan Sorrento's vision, 80% of your visual field in the OASIS would be advertising. The OASIS exists to route value through IOI.

James Halliday built the OASIS as open infrastructure. His egg — the Easter egg hidden in the world he created — was a bet that the right people would want to keep it that way. The conflict in the film isn't really about a video game. It's about who controls the layer between people and the world they inhabit.

Google isn't IOI. The comparison isn't about intent. But structurally, Google's agentic vision looks a lot like a platform that wants to be the mediating layer between humans and the internet — not because they're malicious, but because that's what their business model rewards. When the agent is Google's, and the protocol is Google's, and the checkout flow is Google's, the web becomes an input to Google's stack rather than independent infrastructure.

WebMCP is closer to Halliday's egg. It's a bet that the interface between agents and the web should be an open standard, not a product feature owned by whoever has the most users today.

The historical analogy worth holding here is HTTP. HTTP didn't win because one company backed it. It won because it was vendor-neutral. Any server could speak it. Any client could read it. The protocol didn't care who built the product on top of it, which meant the products could compete on their own merits without fighting for control of the pipe.

Anthropic's Model Context Protocol works at the backend — server-side tool definitions that agents can call in any environment. WebMCP works at the client side — browser-native, visible to any agent operating in a browser context. They're not competing with each other. They're complementary layers of the same emerging stack. MCP for the server layer, WebMCP for the browser layer. Together, they start to look like what HTTP did for documents — except for agent interaction.

The alternative is fragmentation. Every major platform builds its own proprietary agent-to-web bridge. Google's agents talk to Google's tools. Microsoft's agents talk to Microsoft's tools. OpenAI's agents navigate a web optimized for OpenAI's scraping logic. The walled garden era returns, except the walls aren't visible anymore. They're encoded in which tools your agent can discover and call.

As a Tech Director within AI, my take is straightforward: the Agent Race is going to compress fast. Google, Anthropic, OpenAI — they will all have capable, general-purpose agents within the next 18 months. The capability gap will close. The model benchmarks will converge. Gemini Spark is impressive today; the field catches up quickly.

The Protocol Layer won't converge that way. Protocols don't get superseded by the next model release. They calcify. They become infrastructure. Once enough of the web is built around a particular interface standard — whether open or proprietary — switching costs become enormous. This is not a slow-moving question. The decisions being made right now about how agents discover and invoke web services will shape the structure of the internet for the next decade.

I think WebMCP should succeed. Not because it's technically perfect — the 86 open issues suggest there's still significant work ahead — but because vendor-neutral infrastructure beats proprietary infrastructure in the long run, almost every time. The web's openness is what made it worth building on. Agent access to the web should be subject to the same principle.

What I'm genuinely uncertain about is timing. Open standards move at W3C speed, and Google's shipping velocity right now is extraordinary. By the time WebMCP reaches widespread adoption, how much of the web's agent interaction layer will already be locked into proprietary patterns?

The engineers writing the WebMCP spec include people from inside Google. That's interesting. It suggests the question of open vs. proprietary isn't cleanly divided along company lines. There are people inside every major tech organization who understand that their long-term interests are better served by open infrastructure than by fighting over protocol ownership.

The question is whether those voices shape the outcome, or whether product timelines do.

Google Built the Agents. The Web Is Building the Protocol._

//Read next