AI automation · June 10, 2026 · 8 min read

Driving a Browser Fleet with MCP Agents

AI agents are getting good at operating software, and the browser is where that matters most, because it is where the work actually lives: dashboards, listings, campaign managers, storefronts. The missing piece has been giving an agent a browser worth operating: one with real sessions, a real identity, and the ability to scale past a single window.

The Model Context Protocol (MCP) is how agents and tools now connect, and it changes what is practical. Instead of having an agent generate automation scripts that you then babysit, the agent calls browser tools directly: create a profile, launch it, navigate, read the page, act, and fan the same operation out across a fleet.

This post covers what MCP is, why agents do better work in real browsers than in script-flavored automation environments, what an agent can do over Oculr's MCP surface, and the workflow pattern that makes agent browsing economical: figure it out once with the agent, then replay without it.

What is MCP?

The Model Context Protocol is an open standard, introduced by Anthropic in late 2024, for connecting AI models to tools and data. The shape is simple: an MCP server exposes a set of typed tools, each with a name, a description, and an input schema, and an MCP client (Claude Code, the Claude desktop app, or any compatible agent runtime) discovers those tools and lets the model call them mid-conversation. The protocol handles the plumbing; the server decides what capabilities exist.
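To make the tool shape concrete, here is a minimal Python sketch of the two messages involved, written as plain dictionaries. The JSON-RPC 2.0 framing, the `tools/list` result listing `name`, `description`, and `inputSchema`, and the `tools/call` request follow the MCP specification; the `navigate` tool and its `url` argument are hypothetical examples, not Oculr's actual tool definitions.

```python
import json

# A tool descriptor as an MCP server might advertise it in a tools/list result.
# The "navigate" tool and its "url" argument are hypothetical illustrations.
NAVIGATE_TOOL = {
    "name": "navigate",
    "description": "Open a URL in the active tab of a browser profile.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute URL to open"},
        },
        "required": ["url"],
    },
}


def tools_list_response(request_id: int) -> dict:
    """JSON-RPC response a server returns when a client calls tools/list."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": [NAVIGATE_TOOL]}}


def tools_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """JSON-RPC request a client sends when the model decides to use a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(tools_list_response(1), indent=2))
    print(json.dumps(tools_call_request(2, "navigate", {"url": "https://example.com"}), indent=2))
```

The model never sees the server's code, only the descriptor; the description and schema are what it reasons over when choosing a tool and filling in arguments.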

For browser work, this means the integration problem disappears. A browser that ships an MCP server does not need per-agent SDKs or custom glue code: any MCP-compatible agent can discover tools like navigate, click, and type, and start using them immediately. The agent reasons about goals, the server executes browser actions, and the same surface works whether a person is chatting with the agent or a pipeline is driving it.
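On the server side, "the server decides what capabilities exist" amounts to a registry that maps tool names to handlers. The sketch below is illustrative and rests on assumptions: `ToolRegistry` and the stand-in handler are hypothetical, and a real browser server would drive a page where this handler just returns a string. It reports an unknown tool as a JSON-RPC error, and returns missing arguments or handler failures as a result with `isError` set so the model can read the problem and retry.

```python
from typing import Callable, Dict, List, Tuple

# A handler receives validated arguments and returns text for the model.
ToolHandler = Callable[[dict], str]


class ToolRegistry:
    """Hypothetical server-side registry: descriptors are public, handlers are not."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[dict, ToolHandler]] = {}

    def register(self, name: str, description: str, schema: dict, handler: ToolHandler) -> None:
        descriptor = {"name": name, "description": description, "inputSchema": schema}
        self._tools[name] = (descriptor, handler)

    def list_tools(self) -> List[dict]:
        # What a client receives from tools/list: names, descriptions, schemas.
        return [descriptor for descriptor, _ in self._tools.values()]

    def handle_call(self, request_id: int, name: str, arguments: dict) -> dict:
        if name not in self._tools:
            # An unknown tool is reported as a JSON-RPC "invalid params" error.
            return _rpc_error(request_id, f"Unknown tool: {name}")
        descriptor, handler = self._tools[name]
        required = descriptor["inputSchema"].get("required", [])
        missing = [key for key in required if key not in arguments]
        if missing:
            text, is_error = f"Missing arguments: {', '.join(missing)}", True
        else:
            try:
                text, is_error = handler(arguments), False
            except Exception as exc:
                # Failures inside the tool go back to the model as an error result.
                text, is_error = str(exc), True
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": is_error},
        }


def _rpc_error(request_id: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": -32602, "message": message}}
```

Because the client only ever sees `list_tools()` output, swapping the stand-in handler for one that drives a real browser changes nothing on the agent's side.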

Why do agents need real browsers instead of automation environments?

The default way to give an agent a browser is a headless context with an automation framework injected into it. That environment is recognizably not a normal browser: injected frameworks add detectable globals, headless stacks report unusual properties, and many sites respond by serving altered pages, extra friction, or endless verification challenges. For an agent doing research, QA, or verification, that is a correctness problem before it is anything else: the agent ends up reasoning about a version of the page that real users never see.

Agents also need persistence. Useful browser work is mostly logged in: a session, cookies, a stable identity that the site recognizes on the next visit. A fresh headless context per run throws all of that away and forces re-authentication every time. Real browser profiles invert this: each profile keeps its own cookies, storage, and fingerprint, behaves like a normal browser because it is a normal browser, and is still there, intact, when the agent comes back tomorrow.
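The difference between a throwaway context and a persistent profile is easiest to see as data. This sketch assumes a hypothetical layout in which each profile owns a directory holding its cookies and fingerprint settings as JSON; it is not Oculr's storage format, only an illustration of state that is written when a run ends and read back on the next launch.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class Profile:
    """Hypothetical on-disk profile whose session state survives between runs."""

    name: str
    root: Path
    cookies: Dict[str, str] = field(default_factory=dict)
    fingerprint: Dict[str, str] = field(default_factory=dict)

    @property
    def state_file(self) -> Path:
        # Each profile gets its own directory, so identities never share state.
        return self.root / self.name / "state.json"

    def save(self) -> None:
        self.state_file.parent.mkdir(parents=True, exist_ok=True)
        state = {"cookies": self.cookies, "fingerprint": self.fingerprint}
        self.state_file.write_text(json.dumps(state))

    @classmethod
    def load(cls, name: str, root: Path) -> "Profile":
        profile = cls(name, root)
        if profile.state_file.exists():
            state = json.loads(profile.state_file.read_text())
            profile.cookies = state["cookies"]
            profile.fingerprint = state["fingerprint"]
        return profile  # a never-saved profile starts empty, like a fresh context


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first_run = Profile("shop-us", Path(tmp), fingerprint={"locale": "en-US"})
        first_run.cookies["session"] = "abc123"
        first_run.save()
        next_day = Profile.load("shop-us", Path(tmp))
        print(next_day.cookies)  # the session is still there after a restart
```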

What can an agent do over Oculr's MCP surface?

Oculr ships an MCP server with 40+ tools in four families, built directly on the Chrome DevTools Protocol (CDP) rather than on an injected automation framework. One design detail does a lot of work for agent economics: page snapshots are compressed 5 to 10x before they reach the model, so the agent reads a distilled view of page structure and spends its tokens on decisions instead of raw DOM dumps.
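Oculr's snapshot format is not described here, so the following is only a rough illustration of the idea, not its algorithm: keep the elements an agent can act on or orient by (links, buttons, inputs, headings), label each with a role and its visible text, and drop everything else. The `ROLE_BY_TAG` table is a deliberate simplification of real accessibility roles.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Simplified mapping from HTML tags to the roles an agent reasons about.
# Real accessibility trees are far richer; this table is illustrative only.
ROLE_BY_TAG = {
    "a": "link", "button": "button", "input": "textbox", "select": "combobox",
    "h1": "heading", "h2": "heading", "h3": "heading",
}


class SnapshotDistiller(HTMLParser):
    """Collects one 'role "text"' line per actionable or landmark element."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []
        # Elements currently open: (tag, role, collected text fragments).
        self._open: List[Tuple[str, str, List[str]]] = []

    def handle_starttag(self, tag, attrs):
        role = ROLE_BY_TAG.get(tag)
        if role is None:
            return
        if tag == "input":
            # <input> has no closing tag, so label it immediately from its attributes.
            a = dict(attrs)
            label = a.get("aria-label") or a.get("placeholder") or a.get("name") or ""
            self.lines.append(f'{role} "{label}"')
        else:
            self._open.append((tag, role, []))

    def handle_data(self, data):
        # Text counts toward every kept element it sits inside (e.g. a span in a button).
        for _, _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, role, parts = self._open.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            self.lines.append(f'{role} "{text}"')


def distill(html: str) -> str:
    parser = SnapshotDistiller()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Comparing `len(page)` with `len(distill(page))` on a real page gives a feel for why a distilled view is cheaper to send to a model; the actual ratio depends entirely on the page and on what the distiller keeps.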

Browser control: navigate, click, type, select, evaluate, screenshots, tab management, dialogs, and a network request log.
Profile lifecycle: create, launch, attach to, inspect, update, stop, and delete profiles, so the agent can provision its own environments.
Fleet control: create, launch, navigate, click, type, evaluate, and screenshot across many profiles at once, with status and stop-all.
Workflow recording: start and stop recording, declare variables, and save the result as a named, replayable workflow.
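Provisioning its own environments means the agent has to call lifecycle verbs in a valid order. The state machine below is a hypothetical model of that order (a profile must be running before it can be attached to, and stopped before it is updated or deleted); Oculr's actual rules may differ.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"  # created (provisioned) but no browser process yet
    RUNNING = "running"
    DELETED = "deleted"


# Hypothetical rules: verb -> (states it is allowed from, resulting state or None if unchanged).
TRANSITIONS = {
    "launch": ({State.STOPPED}, State.RUNNING),
    "attach": ({State.RUNNING}, None),
    "inspect": ({State.STOPPED, State.RUNNING}, None),
    "update": ({State.STOPPED}, None),
    "stop": ({State.RUNNING}, State.STOPPED),
    "delete": ({State.STOPPED}, State.DELETED),
}


class ProfileLifecycle:
    def __init__(self, name: str) -> None:
        # "create" yields a profile that exists but is not running yet.
        self.name = name
        self.state = State.STOPPED

    def apply(self, verb: str) -> State:
        allowed, target = TRANSITIONS[verb]
        if self.state not in allowed:
            raise RuntimeError(f"cannot {verb} {self.name!r} while {self.state.value}")
        if target is not None:
            self.state = target
        return self.state
```

Rejecting an out-of-order verb with a readable message gives the agent something concrete to correct, which is cheaper than letting it discover the problem from a failed browser action.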

How do fleet commands work?

Fleet tools take the single-browser verbs and fan them out. One call launches a set of profiles, one call navigates all of them, and one call screenshots each and returns the results, while every profile keeps its own proxy, fingerprint, and session state. The agent is not juggling fifty tabs; it issues one instruction and reads back a per-profile result, with status and stop-all commands to keep the run under control.
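The fan-out itself is a familiar concurrency pattern. This asyncio sketch assumes a hypothetical `navigate_one` coroutine standing in for a single-profile call; it bounds concurrency with a semaphore and turns per-profile exceptions into per-profile results, so one bad proxy does not sink the whole run.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class ProfileResult:
    profile: str
    ok: bool
    detail: str


async def navigate_one(profile: str, url: str) -> str:
    """Hypothetical stand-in for a single-profile navigate call."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    if profile.endswith("-blocked"):
        raise ConnectionError("proxy refused connection")
    return f"{profile} loaded {url}"


async def fleet_navigate(profiles: List[str], url: str, max_concurrency: int = 10) -> List[ProfileResult]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(profile: str) -> ProfileResult:
        async with semaphore:  # cap how many browsers act at once
            try:
                return ProfileResult(profile, True, await navigate_one(profile, url))
            except Exception as exc:
                # A failing profile becomes a failed result, not a failed fleet.
                return ProfileResult(profile, False, f"{type(exc).__name__}: {exc}")

    # gather() keeps input order, so results line up with `profiles`.
    return list(await asyncio.gather(*(run(p) for p in profiles)))


if __name__ == "__main__":
    for result in asyncio.run(fleet_navigate(["us-1", "de-1", "fr-blocked"], "https://example.com")):
        print(result)
```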

The pattern fits naturally with verification and testing work. Checking a localized landing page across ten country-specific profiles becomes one navigate and one screenshot call instead of ten manual passes. Running the same QA flow across a matrix of OS and locale configurations, or collecting region-by-region views of public pages for research, scales the same way: drive 5, 10, or 50 profiles with fleet commands and let each one represent its own coherent identity.
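A configuration matrix like this is a Cartesian product. The sketch below, with hypothetical profile naming and a hypothetical locale-to-timezone table, builds one profile spec per OS and locale pair and keeps each spec internally consistent by deriving its timezone from its locale.

```python
from itertools import product
from typing import List, NamedTuple

# Hypothetical lookup so a profile's timezone agrees with its locale.
TIMEZONE_BY_LOCALE = {
    "en-US": "America/New_York",
    "de-DE": "Europe/Berlin",
    "ja-JP": "Asia/Tokyo",
}


class ProfileSpec(NamedTuple):
    name: str
    os: str
    locale: str
    timezone: str


def build_matrix(oses: List[str], locales: List[str]) -> List[ProfileSpec]:
    """One spec per (OS, locale) pair, e.g. 2 OSes x 3 locales gives 6 profiles."""
    return [
        ProfileSpec(f"qa-{os_name}-{locale}".lower(), os_name, locale, TIMEZONE_BY_LOCALE[locale])
        for os_name, locale in product(oses, locales)
    ]
```

Each row maps to one profile in the fleet, so a single navigate or screenshot fan-out then covers the whole matrix.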

What is the run-once-then-replay pattern?

Driving a model through every step of a routine task is the expensive way to automate. Each action costs a model round trip, which is fine while the task is being figured out and wasteful once it is understood. The economical pattern is to let the agent perform the task interactively once while Oculr's recorder captures it as a workflow. Steps are recorded against what the page means rather than its exact markup: elements are re-found by role and text on every run, and inputs like credentials become variables rather than baked-in values.
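A recorded workflow can be modeled as a list of steps that name their target by role and text instead of by a CSS selector or element ID. In this sketch the `Step` and `Workflow` types, the `parameterize` helper, and the snapshot tuples are hypothetical, not Oculr's workflow format; the point is that a literal typed during recording becomes a `${variable}` placeholder, and the target element is looked up afresh in each run's snapshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A snapshot entry as this sketch assumes it: (role, visible text, per-run element id).
SnapshotEntry = Tuple[str, str, str]


@dataclass(frozen=True)
class Step:
    action: str      # "click" or "type"
    role: str        # what kind of element, e.g. "button"
    text: str        # its visible text or label, e.g. "Sign in"
    value: str = ""  # typed input; may hold ${variable} placeholders


@dataclass
class Workflow:
    name: str
    variables: List[str] = field(default_factory=list)
    steps: List[Step] = field(default_factory=list)


def parameterize(steps: List[Step], literal: str, variable: str) -> List[Step]:
    """Replace a value typed during recording with a ${variable} placeholder."""
    placeholder = "${" + variable + "}"
    return [Step(s.action, s.role, s.text, s.value.replace(literal, placeholder)) for s in steps]


def find(snapshot: List[SnapshotEntry], role: str, text: str) -> Optional[str]:
    """Re-find an element by meaning: ids differ per run, role and text should not."""
    for el_role, el_text, el_id in snapshot:
        if el_role == role and el_text.strip().lower() == text.lower():
            return el_id
    return None
```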

After that, replay does not need the agent at all. The saved workflow runs on a profile with variable values supplied at run time, or across the whole fleet with per-profile variables, with no model in the loop and per-step run status to watch. When the site eventually changes and a replay stops matching, you bring the agent back to re-record that section. Intelligence where the task is being defined, cheap deterministic execution everywhere else.
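Replay is then a plain loop with no model call. The sketch below uses a hypothetical `Step` shape, fills `${variables}` with `string.Template`, and records one status per step; the first step whose element cannot be found is marked `no-match` and the rest are skipped, which is the signal to bring the agent back. For brevity it matches against a single static snapshot, whereas a real replay would take a fresh snapshot after each action, and `perform` is a stand-in for the browser.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Dict, List, Optional, Tuple

SnapshotEntry = Tuple[str, str, str]  # (role, visible text, per-run element id)


@dataclass(frozen=True)
class Step:
    action: str
    role: str
    text: str
    value: str = ""  # may contain ${variable} placeholders


def find(snapshot: List[SnapshotEntry], role: str, text: str) -> Optional[str]:
    for el_role, el_text, el_id in snapshot:
        if el_role == role and el_text.strip().lower() == text.lower():
            return el_id
    return None


def replay(
    steps: List[Step],
    snapshot: List[SnapshotEntry],
    variables: Dict[str, str],
    perform: Callable[[str, str, str], None],
) -> List[str]:
    """Run a workflow with no model in the loop; return one status per step."""
    statuses: List[str] = []
    for step in steps:
        if statuses and statuses[-1] != "ok":
            statuses.append("skipped")  # stop acting after the first mismatch
            continue
        element = find(snapshot, step.role, step.text)
        if element is None:
            statuses.append("no-match")  # page changed: re-record from here with the agent
            continue
        # substitute() raises KeyError if a required variable was not supplied.
        perform(step.action, element, Template(step.value).substitute(variables))
        statuses.append("ok")
    return statuses
```

Fleet replay is the same loop called once per profile, each with that profile's own variable values.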

How do you connect your agent?

Oculr's model is bring your own agent: Claude Code, the Claude desktop app via a connector, or any MCP-compatible client connects to the same server. The local stdio transport works with zero setup; beyond that, an HTTP transport is available, with mandatory bearer-token authentication and binding to localhost only. Existing automation does not get stranded either: launching a profile returns a WebDriver-compatible driver path and the CDP WebSocket endpoint, so Selenium, Puppeteer, and Playwright scripts can attach to Oculr profiles through those endpoints without rewriting their logic, while you adopt the agent surface at your own pace.
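The client side of the HTTP transport can be sketched with Python's standard library. The port, the `/mcp` path, and how the token is obtained are assumptions, not Oculr's documented values; the `initialize` message and the `Accept` header follow the MCP Streamable HTTP transport. The hostname check mirrors the server's localhost-only binding on the client side, and the request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlparse

LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def build_initialize_request(endpoint: str, token: str) -> urllib.request.Request:
    """Build (not send) the first MCP request over HTTP with a bearer token."""
    host = urlparse(endpoint).hostname
    if host not in LOCAL_HOSTS:
        # The server only listens on localhost, so anything else is a misconfiguration.
        raise ValueError(f"refusing non-local MCP endpoint: {host}")
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    # Hypothetical endpoint and token; pass the request to urllib.request.urlopen to send it.
    req = build_initialize_request("http://127.0.0.1:8765/mcp", "replace-with-your-token")
    print(req.get_method(), req.full_url)
```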

Go deeper

Oculr's MCP automation surface
The browser built for AI agents
Glossary: Model Context Protocol