April 18, 2026 · Jeff Rogers

Why Runbook Policies Aren't Claude Skills

Skills and policies both let you write intent in plain English. But one is a prompt an LLM executes, and the other is a rule an engine executes. The difference decides which one you want for your business.

A sharp question has been coming up lately: "Isn't a Runbook policy just a Claude Skill?"

It's a good question, and the similarity is real. Both let a human write intent in plain English. Both are reusable, versionable, and meant to make AI behavior predictable. If you squint at a Claude Skill file and a Runbook policy, you might convince yourself they are the same idea with different branding.

They aren't. And the difference isn't cosmetic — it's the thing that decides whether the software is agent-shaped or substrate-shaped, and whether it survives the kill switch test.

I want to walk through this honestly, because the question deserves better than "ours is different because we say so."

The genuine similarity

Let's concede where the comparison lands. Both skills and policies:

  • Are authored by humans, not trained into a model
  • Encode intent in natural language
  • Persist across invocations — they aren't disposable prompts
  • Can be edited, versioned, shared, reviewed
  • Make AI behavior more predictable than pure conversational prompting

If you are evaluating whether to use Claude Skills for your personal workflow, this post is not telling you to stop. For single-user task automation — code review, issue drafting, report formatting, research scaffolds — skills are a genuinely good tool. The question I want to answer is narrower: can skills do the job policies do? And the answer is no, because they are built for a different problem.

Execution model

A Claude Skill is a markdown instruction file. When invoked, Claude reads the file and reasons about what to do. Every invocation is an LLM call. The skill is the input to a model; the model is the thing that acts.

A Runbook policy is a structured rule stored as a row in a database. When a trigger fires — a message arrives, a timer expires, a procedure step completes — the engine matches the event against the policy and executes the action. No LLM is involved in routing, matching, SLA checks, capacity lookups, or escalation decisions. The policy is not the input to a model. The policy is the decision.

This is the architectural distinction that everything else follows from. Skills are prompts that get run by an LLM. Policies are rules that get run by an engine. One is agent-shaped. One is substrate-shaped.
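
To make "the policy is the decision" concrete, here is a minimal sketch of deterministic matching. It is an illustration, not Runbook's actual schema or engine: the `Event` and `Policy` shapes, the field names, and the action strings are all assumptions made up for this example. It also assumes the event already carries a structured `category`; turning a raw message into that category is a separate perception step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                   # e.g. "message_received", "timer_expired"
    attributes: dict = field(default_factory=dict)


@dataclass
class Policy:
    # Hypothetical shape: one policy is one structured row.
    name: str
    trigger: str                                # event kind this policy listens for
    action: str                                 # what the engine does on a match
    conditions: dict = field(default_factory=dict)  # attribute -> required value


def matches(policy: Policy, event: Event) -> bool:
    """Plain equality checks; no model is consulted."""
    if policy.trigger != event.kind:
        return False
    return all(event.attributes.get(k) == v for k, v in policy.conditions.items())


def evaluate(policies: list[Policy], event: Event) -> list[str]:
    """Return the actions of every policy the event matches, in authoring order."""
    return [p.action for p in policies if matches(p, event)]


POLICIES = [
    Policy("urgent-leak", "message_received", "page_on_call_operator", {"category": "leak"}),
    Policy("sla-breach", "timer_expired", "escalate_to_admin"),
]

actions = evaluate(POLICIES, Event("message_received", {"category": "leak"}))
```

The same event against the same rows gives the same actions every time, which is the property a prompt interpreted by a model cannot promise.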

Who invokes it

Skills require an asker. You open Claude, you type something, Claude matches the skill to your intent, and it executes. Required: a user, a session, a prompt. No user, no invocation.

Policies fire when something happens in the world. A homeowner sends a message. An operator completes a procedure step. An SLA threshold passes. A scheduled time arrives. There is no user asking. There is no session. The engine evaluates continuously because the world keeps changing whether or not anyone is logged in.

This is the difference between "help me with my work" and "the business runs while I sleep." Both are useful. They are not the same thing.

Multi-actor

Skills are single-user by design. You invoke Claude, Claude does something for you, you get the output. There is no operator/admin/customer model. There is no routing between actors based on qualification, availability, or asset proximity. There is no "if Alex is busy with a water heater job, escalate to Jordan; if Jordan isn't qualified on HVAC, escalate to the admin; if the admin doesn't resolve in 20 minutes, escalate to the owner."

Policies have that built in, because they operate over a structured actor graph with roles, qualifications, capacity, and fallback chains. The actor graph is part of the substrate. Skills don't have one, and you can't retrofit it cleanly: doing so would mean turning the skill into an agent that reasons about all of that on every invocation, which brings back the cost and consistency problems agents have.
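
The Alex, Jordan, admin, owner example above can be sketched as a walk down a fallback chain. This is a simplified sketch under my own assumptions, not the real actor graph: the `Actor` fields are hypothetical, admins and owners are treated as catch-alls who need no trade qualification, and the 20-minute timer is represented only by a function the engine would call once that timer expires.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Actor:
    name: str
    role: str                                   # "operator", "admin", or "owner"
    qualifications: set = field(default_factory=set)
    busy: bool = False


def route(chain: list[Actor], skill: str) -> Optional[Actor]:
    """Return the first actor in the chain who is free and able to take the job."""
    for actor in chain:
        if actor.busy:
            continue
        # Assumption: only operators need the specific qualification.
        if actor.role == "operator" and skill not in actor.qualifications:
            continue
        return actor
    return None


def escalate_after_timeout(chain: list[Actor], current: Actor, skill: str) -> Optional[Actor]:
    """Called when the current assignee has not resolved the job in time."""
    remaining = chain[chain.index(current) + 1:]
    return route(remaining, skill)


alex = Actor("Alex", "operator", {"plumbing", "hvac"}, busy=True)
jordan = Actor("Jordan", "operator", {"plumbing"})
admin = Actor("Admin", "admin")
owner = Actor("Owner", "owner")
chain = [alex, jordan, admin, owner]

first_assignee = route(chain, "hvac")
after_timeout = escalate_after_timeout(chain, admin, "hvac")
```

With Alex busy and Jordan not qualified on HVAC, the job lands on the admin, and a timeout moves it to the owner. Nothing in that walk needs a model to reason about who is who.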

State and memory

Skills are effectively stateless. You invoke, Claude runs with whatever context you provide, a result comes back, and then nothing. There is no record of what fired, when, against what inputs, with what result. There is no way to query "show me every time the new-hire-onboarding skill ran in the last 30 days." The session transcript is all you have, and it disappears when the session closes.

Policies write to a ledger on every evaluation. Every trigger match, every action fired, every exception surfaced — immutable, timestamped, queryable, reportable. The ledger is what lets you say "Policy X hasn't fired in two weeks — is it dead?" or "Policy Y fires thirty times a day — should we tune the threshold?" It is also what lets you show an insurance carrier, an auditor, or a regulator that the thing that was supposed to happen actually happened.

The memory is the moat. Skills don't have one.
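
A rough sketch of what a queryable ledger buys you, assuming an in-memory, append-only list. The `LedgerEntry` fields and the `Ledger` methods are hypothetical names for illustration; real immutability would have to be enforced by the storage layer, which this sketch does not attempt.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)                         # entries cannot be modified once written
class LedgerEntry:
    policy: str
    event_id: str
    action: str
    at: datetime


class Ledger:
    """Append-only: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        self._entries.append(entry)

    def fired_since(self, policy: str, since: datetime) -> list[LedgerEntry]:
        return [e for e in self._entries if e.policy == policy and e.at >= since]

    def is_quiet(self, policy: str, now: datetime,
                 window: timedelta = timedelta(days=14)) -> bool:
        """True if the policy has not fired within the window ending at `now`."""
        return not self.fired_since(policy, now - window)


now = datetime(2026, 4, 18, 12, 0)
ledger = Ledger()
ledger.append(LedgerEntry("policy-x", "evt-1", "notify_admin", now - timedelta(days=20)))
ledger.append(LedgerEntry("policy-y", "evt-2", "escalate", now - timedelta(hours=3)))

policy_x_quiet = ledger.is_quiet("policy-x", now)
policy_y_quiet = ledger.is_quiet("policy-y", now)
```

"Policy X hasn't fired in two weeks" becomes a one-line query instead of a search through transcripts.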

The kill switch test

This is the cleanest way to see the gap.

Turn off the LLM. What can your skills do?

Nothing. They are markdown files waiting to be interpreted. The whole system stops when the model stops.

Turn off the LLM. What can your policies do?

Fire. Match triggers. Route work. Enforce SLAs. Write ledger entries. You lose perception — no more classifying photos, transcribing voice notes, or parsing unstructured messages — but the operation itself keeps running because the operation lives in the authored rules, not in the model.

That is what "structure first, AI second" means in practice. Skills fail the kill switch test. Policies pass it.
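
Here is one way to picture the kill switch test in code: perception is an optional dependency and routing is not. The rule table, the action names, and the choice to queue unclassified input for a human are my assumptions for this sketch; the point is only that structured events still route when `classify` is absent.

```python
from typing import Callable, Optional

# Stands in for an LLM call: free text in, category out.
Classifier = Callable[[str], str]

# Hypothetical authored rules: category -> action.
RULES = {
    "leak": "dispatch_plumber",
    "no_heat": "dispatch_hvac_tech",
}

FALLBACK = "queue_for_human_triage"


def handle(event: dict, classify: Optional[Classifier] = None) -> str:
    """Route an event; use the classifier only when the event is unstructured."""
    category = event.get("category")
    if category is None and classify is not None:
        category = classify(event.get("text", ""))
    if category is None:
        # Model is off and the input is unstructured: perception is lost,
        # so the work is handed to a person instead of being dropped.
        return FALLBACK
    return RULES.get(category, FALLBACK)


structured_without_llm = handle({"category": "leak"})
freetext_without_llm = handle({"text": "water everywhere"})
freetext_with_llm = handle({"text": "water everywhere"}, classify=lambda text: "leak")
```

With the model off, the structured event still dispatches and the free-text message degrades to a human queue. The model adds reach at the edges; it is never the thing that decides.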

The clean comparison

| | Claude Skill | Runbook Policy |
|---|---|---|
| What lives in it | Instructions for an LLM | A structured rule |
| What executes it | The LLM | A deterministic engine |
| When does it run | When a user invokes | When an event fires |
| Who does it serve | One user at a time | A multi-actor business |
| Cost per run | Full LLM call | ~0 (edges only) |
| Audit trail | Session transcript | Immutable ledger |
| Multi-actor routing | No | Yes |
| Kill switch test | Fails | Passes |

Where each is right

Skills are right for: personal task automation, developer workflows, report formatting, code review scaffolds, research outlines, invoking specialized tool patterns. Anywhere the question is "make Claude do this well when I ask it."

Policies are right for: business operations that must run consistently without anyone asking. When the owner is not there. When the employee is new. When the customer messages at 2 AM. When SLA clocks keep ticking. When a regulator or an insurance carrier is going to ask "how did you make sure this happened every single time." Anywhere the question is "what should be true in my business regardless of who is running it today."

The two are not in conflict. A Claude Skill for "help me onboard this new employee" is useful — it produces onboarding docs, drafts welcome messages, runs a research query on their background. A Runbook policy for "every new employee must complete safety training within seven days or their first shift is blocked" is a different animal. It doesn't need anyone to invoke it. It fires on its own at the seven-day mark. It blocks the shift without an LLM in the loop. It writes a ledger entry that an auditor can see a year later.
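
The safety-training policy is small enough to sketch end to end. Treat this as an illustration under stated assumptions: the `Employee` fields, the policy name, and the ledger entry shape are invented for the example, "the seven-day mark" is read as `hired_on + 7 days`, and the check is assumed to run on a schedule rather than on anyone's request.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

TRAINING_WINDOW = timedelta(days=7)


@dataclass
class Employee:
    name: str
    hired_on: date
    training_completed_on: Optional[date] = None


def evaluate_training_policy(employee: Employee, today: date, ledger: list) -> bool:
    """Return True if the first shift is blocked; record the block in the ledger."""
    deadline = employee.hired_on + TRAINING_WINDOW
    blocked = employee.training_completed_on is None and today >= deadline
    if blocked:
        ledger.append({
            "policy": "safety-training-7d",     # hypothetical policy name
            "employee": employee.name,
            "action": "block_first_shift",
            "on": today.isoformat(),
        })
    return blocked


ledger: list = []
sam = Employee("Sam", hired_on=date(2026, 4, 1))
riley = Employee("Riley", hired_on=date(2026, 4, 1), training_completed_on=date(2026, 4, 3))

sam_day_4 = evaluate_training_policy(sam, date(2026, 4, 5), ledger)
sam_day_7 = evaluate_training_policy(sam, date(2026, 4, 8), ledger)
riley_day_7 = evaluate_training_policy(riley, date(2026, 4, 8), ledger)
```

Sam is fine on day four, blocked at the seven-day mark, and the block leaves a dated entry behind. Riley, who finished training on day two, is never touched.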

Both can coexist. A sophisticated company will have dozens of skills for how individuals do their work and dozens of policies for how the business runs. They sit at different layers of the stack.

The layer argument

In our three-layer stack, skills live in the Agent Layer. Policies live in the Primitives Layer underneath. The Agent Layer is bespoke per user or per task. The Primitives Layer is where the business itself lives — its rules, its roles, its assets, its memory.

If you are trying to run a business with skills alone, you are building the Agent Layer on top of nothing. Every skill has to re-establish what the business is, what the rules are, who the actors are, every time it runs. It is the same problem the "agents will do everything" pitch has. It does not survive the moment when the LLM gets something wrong and there is no structure underneath to catch it.

If you are trying to do personal automation with policies alone, you are over-engineering a problem that didn't need the substrate. A skill is lighter, cheaper, and right for the job.

The mistake is thinking one of them is a substitute for the other. They solve different problems at different layers.

Close

Skills encode intent an LLM executes. Policies encode intent an engine executes. Same word, different architectures. Same plain-English authoring surface, different runtime, different cost curve, different failure mode, different place in the stack.

When someone asks "isn't this just a Claude Skill?" — the honest answer is "it looks like one, and that's why the question is good. But watch what happens when you turn the model off."

Skills are agent-shaped. Policies are substrate-shaped.


This post continues the positioning thread from "Why We Don't Call Runbook an Agent" and "Architect Mode Needs a Substrate." For the economics of the same argument, see "The Economics of Authored vs Inferred."
