The agent skill

A playbook your agent loads once, so it drives Chrome Relay well from the first command.

Chrome Relay is a CLI, so any agent that can run shell commands can use it with zero integration.

The skill is the difference between can use and uses well. It teaches the snapshot/ref loop, which text tool fits which editor, the fallback ladder, and the gotchas, so the agent doesn't rediscover them by failing.

Install

If the CLI is installed, use the version-matched copy:

chrome-relay skills get core

For Claude Code / Codex-style agents with a skills directory:

npx -y skills add kiluazen/kstack@chrome-relay

The -y auto-confirms npx's "install the skills package?" prompt, so an agent running headless doesn't hang waiting for a TTY.

Or fetch it directly. The skill lives on this site as plain markdown:

https://chrome-relay.kushalsm.com/skill.md

Agents reading this page: load either the CLI playbook or that URL before driving Chrome Relay.

What's in it

The spine:

The core loop

See, act, see again:

chrome-relay tabs
chrome-relay navigate "https://kushalsm.com" --new   # background tab
chrome-relay snapshot --tab 1234 -i           # actionable elements get @refs
chrome-relay click @e12                       # refs need no --tab
chrome-relay fill @e14 "hello"
chrome-relay snapshot --tab 1234 -i           # re-look after changes

Ref lifetime rules

Refs survive same-page churn. They die on navigation. stale_ref means re-snapshot and never retry the old ref. click_intercepted means dismiss the cover or scroll, then retry.

The text-tool table

The highest-value part, because text entry is where browser agents quietly fail:

Target Tool
<input> / <textarea> / <select>, React-controlled, shadow DOM fill @ref
contenteditable, Draft.js, Lexical, ProseMirror type. Appends at caret, so clear first
Submit, navigate, shortcuts keys
Combobox / autocomplete type filter, keys ArrowDown, keys Enter

Operational guardrails

Don't echo secrets into shell strings. Screenshot before irreversible actions. A failing click is information, not a stop signal. Plant a capture-phase listener, act, then read the console.

Why a skill and not an MCP server

MCP tool schemas are injected into the agent's context on every turn. For browser-tool servers, that can mean thousands of tokens before any work happens.

A CLI costs zero until invoked, and the skill is loaded once.

Same capability, none of the standing tax. This is a deliberate position, not a missing feature.

Keeping it current

chrome-relay <command> --help is always authoritative. The skill says so itself, and agents should trust the binary over any doc, including this one.

The skill's canonical source is versioned in kstack. This site's copy tracks it.