Category: Agent Workflows that Actually Run

    Human Approval Gates for High-Risk Agent Actions

    Connected Patterns: Understanding Agents Through Human Control That Still Scales
    “Speed is not the opposite of oversight. The right gate makes speed safe.”

    The moment an agent can take action in the world, you have two competing pressures:

    You want the agent to move quickly so it is worth using.
    You need the agent to be constrained so it is safe to use.

    Most teams resolve this conflict in the worst possible way:

    They remove the agent’s ability to act, which turns it into a chat assistant that produces suggestions but never closes loops.
    Or they let it act freely, and a single bad run breaks trust for months.

    Human approval gates are the middle path, but only if they are designed with care. A gate should not be a ceremonial checkbox. It should be an engineered boundary that:

    Stops the agent when risk is high.
    Explains what it intends to do.
    Shows evidence, not confidence.
    Makes approval fast when it is clearly safe.
    Makes denial easy and informative when it is not.

    A good gate keeps humans in charge without turning every run into a slow committee meeting.

    Why “approve everything” fails

    Teams often begin with a simple rule: the agent must ask for approval before it does anything. That sounds safe, but it usually collapses.

    People stop approving because it becomes noise.
    Approvals become rubber stamps.
    Review fatigue grows, and risk rises again.

    A gate that interrupts too often is a gate that will be ignored.

    The core design question is not “Should there be a gate?” The question is “When should the gate trigger, and what should it require?”

    The approval gate inside the story of production

    Approval is not a single switch. It is a risk-aware workflow that turns uncertainty into controlled action.

    Problem | What goes wrong | What a gate provides
    High-impact side effects | One run causes irreversible changes | A stop point before commitment
    Ambiguous intent | The agent interprets the goal incorrectly | A human confirmation of scope
    Weak evidence | The agent acts on shaky sources | A requirement to show proof
    Hidden cost | The agent burns budget on retries | A budget and escalation policy
    Accountability gaps | Nobody knows who authorized what | A signed decision trail

    A gate is a contract: the agent must earn the right to act.

    Risk tiers that actually work

    Instead of a binary “approve or not,” classify actions by risk. A practical set of tiers looks like this:

    Low risk

    • Read-only actions and harmless computations.
    • Draft outputs that do not get sent or published.
    • Tool calls with no side effects.

    Medium risk

    • Actions that are reversible or contained.
    • Actions that affect internal drafts, staging systems, or temporary data.
    • Actions that can be previewed before applying.

    High risk

    • External communication to users or customers.
    • Publishing, deploying, deleting, or altering production state.
    • Financial operations or security-sensitive changes.
    • Actions that create legal or reputational exposure.

    The gate triggers differently at each tier.
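    The tiers above can be encoded as a small classification policy. Here is a minimal sketch in Python; the action categories and tier mapping are illustrative, and a real system would derive them from tool metadata rather than a hardcoded table.

```python
from enum import Enum

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Illustrative mapping from action categories to tiers.
ACTION_TIERS = {
    "read_only": RiskTier.LOW,
    "draft": RiskTier.LOW,
    "staging_write": RiskTier.MEDIUM,
    "reversible_write": RiskTier.MEDIUM,
    "external_message": RiskTier.HIGH,
    "production_write": RiskTier.HIGH,
    "delete": RiskTier.HIGH,
}

def classify(action_category: str) -> RiskTier:
    # Unknown categories default to HIGH: the policy fails closed, not open.
    return ACTION_TIERS.get(action_category, RiskTier.HIGH)
```

    The important design choice is the default: an action the policy has never seen should land in the highest tier, not the lowest.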

    The two-phase commit pattern for agents

    One of the most reliable patterns is borrowed from systems design: separate intent from commit.

    Phase one: propose
    The agent assembles a plan with explicit steps, evidence, and expected side effects.

    Phase two: commit
    After approval, the agent executes the plan exactly as approved, with minimal freedom to reinterpret.

    This pattern prevents a common failure where an agent asks for approval in vague terms and then does something different during execution.

    A strong gate enforces:

    The approved plan is frozen.
    Any deviation triggers a new approval request.
    Every side effect is tied to an approved step.
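    Freezing the approved plan can be as simple as fingerprinting it at approval time and refusing to execute anything that no longer matches. A minimal sketch, with illustrative names:

```python
import hashlib
import json

def plan_fingerprint(plan: list[dict]) -> str:
    """Hash a canonical serialization of the plan so any later change is detectable."""
    canonical = json.dumps(plan, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

class PlanDeviation(Exception):
    pass

def execute(plan: list[dict], approved_fingerprint: str, run_step) -> list:
    # Phase two: commit. Refuse to run anything that is not the exact
    # plan approved in phase one; deviation requires a new approval.
    if plan_fingerprint(plan) != approved_fingerprint:
        raise PlanDeviation("plan changed after approval; request a new gate")
    return [run_step(step) for step in plan]
```

    The fingerprint is what makes "the approved plan is frozen" a mechanical guarantee rather than a convention.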

    What a reviewer needs to see

    If a gate is going to be fast, the review packet must be compact and complete. Reviewers should never have to hunt.

    A useful approval packet includes:

    Intent

    • What the agent is trying to accomplish in one sentence.
    • The boundaries it will not cross.

    Plan

    • A step list with tool calls and expected outputs.
    • A clear stop condition for success.

    Evidence

    • The sources used to make decisions.
    • The exact excerpts or data points that justify the action.

    Impact

    • What changes will occur if approved.
    • Whether those changes are reversible.
    • Who will be affected.

    Risk checks

    • What could go wrong and how it will be detected.
    • What rollback plan exists if rollback is possible.

    Budgets

    • Maximum tool calls, token budget, time cap.
    • Retry limits and escalation behavior.

    When a gate request is missing any of these, denial should be automatic. The burden is on the agent to present a reviewable case.
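    The "deny if incomplete" rule is easy to mechanize. This is a sketch, assuming the packet arrives as a dict keyed by the six sections above; the section names are illustrative.

```python
REQUIRED_SECTIONS = ("intent", "plan", "evidence", "impact", "risk_checks", "budgets")

def review_packet_complete(packet: dict) -> tuple[bool, list[str]]:
    """Deny automatically when any required section is missing or empty."""
    missing = [s for s in REQUIRED_SECTIONS if not packet.get(s)]
    return (len(missing) == 0, missing)
```

    Returning the list of missing sections matters: denial should be informative, so the agent knows exactly what to add before resubmitting.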

    Designing gates that scale

    A gate that requires senior reviewers for every action will become a bottleneck. The goal is to match the gate to the organization.

    Useful scaling tactics include:

    Role-based approvals

    • Low- and medium-risk actions can be approved by operators.
    • High-risk actions require owners or on-call leads.
    • The policy is explicit and logged.

    Time-boxed approvals

    • If approval is not granted within a window, the agent pauses safely.
    • The agent does not keep retrying or escalating without limit.

    Batch approvals

    • The agent groups low-risk actions into a single approval packet.
    • The reviewer approves a bundle, not a stream of pings.

    Auto-approval with verification

    • For certain low-risk actions, the agent can proceed automatically if verification checks pass.
    • Failures trigger human review instead of being hidden.

    The point is not to remove humans. The point is to use human attention where it is most valuable.
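    Those scaling tactics combine into a simple routing function. The following is a sketch under the assumption that tiers and approval paths use the illustrative names shown; real deployments would wire these to actual reviewer queues.

```python
def route_approval(tier: str, verification_passed: bool) -> str:
    """Decide the approval path for an action, mirroring the tiers above."""
    if tier == "low" and verification_passed:
        return "auto_approved"
    if tier == "low":
        return "human_review"      # failed checks surface to a human, not hidden
    if tier == "medium":
        return "operator_approval"
    return "owner_approval"        # high risk always goes to an owner or lead
```
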

    The gate should be a teaching moment

    Every approval or denial is feedback. If you treat it as feedback, the gate improves the agent over time.

    Capture:

    Why it was denied.
    What evidence was missing.
    Which risk tier was misclassified.
    Which part of the plan was unclear.

    Then feed that back into the agent’s policy:

    Update risk classification rules.
    Update the evidence requirements.
    Update tool routing and validation.
    Update the plan format.

    A mature agent system uses approvals to become safer and faster.

    Gate UI patterns that keep humans in control

    The preview-first pattern

    For actions like edits, posts, messages, or configuration changes:

    The agent generates a preview artifact.
    The human reviews the preview.
    Approval means “apply exactly this preview.”

    This avoids vague approvals and prevents last-moment reinterpretation.

    The diff-and-rollback pattern

    For file edits, configuration changes, or data updates:

    The agent shows a diff.
    The agent explains the diff in plain language.
    Approval triggers apply.
    Rollback is a one-click reversal when possible.

    Even when rollback is not perfect, this pattern makes impact visible.

    The escalation-first pattern

    For uncertain or high-stakes tasks:

    The agent escalates early with a narrow question.
    It asks for scope confirmation before doing work.
    It reduces ambiguity before committing to a big plan.

    This prevents large wrong turns that waste time and budget.

    Guardrails that keep gates from becoming theater

    Approvals become theater when they do not actually constrain behavior.

    A real gate enforces:

    No side effects without an approval token.
    Approval tokens are bound to a specific plan and expire quickly.
    Execution logs prove that approved steps were followed.
    Any new tool call outside the plan triggers a pause.

    This is the difference between “the agent asked” and “the agent was controlled.”
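    An approval token can be made concrete with a signed, expiring payload bound to one specific plan. A minimal sketch using Python's standard library; the secret handling is illustrative, and a real system would load the key from a secret store.

```python
import hashlib
import hmac
import json
import time

SECRET = b"gate-signing-key"  # illustrative; load from a secret store in practice

def issue_token(plan: dict, ttl_seconds: int = 300) -> dict:
    """Sign a specific plan with a short expiry. Approval means this plan only."""
    expires = int(time.time()) + ttl_seconds
    payload = json.dumps({"plan": plan, "exp": expires}, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def token_valid_for(token: dict, plan: dict) -> bool:
    """A token authorizes exactly one plan, and only until it expires."""
    expected = hmac.new(SECRET, token["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False
    data = json.loads(token["payload"])
    return data["plan"] == plan and time.time() < data["exp"]
```

    Binding the signature to the plan is what prevents the "ask vaguely, act broadly" failure: a token approved for staging cannot authorize a production write.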

    A practical policy table

    Action type | Default risk tier | Gate requirement
    Web search and summarization | Low | No gate, but log sources and excerpts
    Read-only database query | Low | No gate, require query preview and limits
    Writing a draft document | Low | No gate, require clear labeling as draft
    Editing a staging configuration | Medium | Gate with diff preview and rollback plan
    Sending an internal message | Medium | Gate with preview and recipient list
    Publishing content publicly | High | Gate with final preview, owner approval, and audit trail
    Deploying to production | High | Gate with runbook alignment and on-call approval
    Deleting data | High | Gate with double confirmation and backup check

    This is not a universal policy, but it shows the shape of a policy that teams can operationalize.

    The human-in-the-loop mindset

    The best agents do not treat humans as obstacles. They treat humans as the authority that makes action legitimate.

    When an agent requests approval well, it sounds like this:

    Here is what I will do.
    Here is why it is justified.
    Here is what could go wrong.
    Here is how I will stay within bounds.

    That tone does not slow work. It prevents catastrophic rework.

    Keep Exploring Reliable Agent Systems

    • Guardrails for Tool-Using Agents
    https://orderandmeaning.com/guardrails-for-tool-using-agents/

    • Production Agent Harness Design
    https://orderandmeaning.com/production-agent-harness-design/

    • Agent Run Reports People Trust
    https://orderandmeaning.com/agent-run-reports-people-trust/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/

    • Monitoring Agents: Quality, Safety, Cost, Drift
    https://orderandmeaning.com/monitoring-agents-quality-safety-cost-drift/

    • Agents for Operations Work: Runbooks as Guardrails
    https://orderandmeaning.com/agents-for-operations-work-runbooks-as-guardrails/

    Guardrails for Tool-Using Agents

    Connected Patterns: Making Powerful Systems Safe by Default
    “A capable agent without guardrails is a fast way to create expensive surprises.”

    Tool-using agents feel like a leap forward because they can act. They can search the web, read internal docs, run code, query databases, file tickets, and sometimes change real systems.

    That power is exactly why guardrails matter.

    Most harmful outcomes do not come from a malicious model. They come from a well-intentioned agent that:

    Misunderstood the goal.
    Trusted a poisoned source.
    Used the wrong tool.
    Repeated an action during retries.
    Acted without realizing the side effects.

    Guardrails are the system rules that prevent those outcomes. They are not decoration. They are what makes autonomy acceptable.

    The Guardrail Problem You Are Solving

    A tool-using agent sits at the intersection of three risks:

    • Safety risk: unintended side effects, destructive actions, data leaks
    • Quality risk: confident wrong outputs, unverified claims, drift over long runs
    • Cost risk: runaway loops, excessive tool calls, hidden spend

    Guardrails address all three by constraining what the agent can do, when it can do it, and how it must prove what it did.

    The best guardrails feel almost boring. That is the point. Boring systems are trustworthy systems.

    The Pattern Inside the Story of Safe Automation

    Safety in automation usually comes from two principles:

    • Least privilege: tools and permissions are as limited as possible
    • Proof before impact: risky actions require evidence and approval

    Agents need a third principle:

    • Separation of worlds: sandbox by default, production by exception

    When you combine these, you get a guardrail stack that looks like this.

    Guardrail layer | What it constrains | What it prevents
    Tool allowlist | Which tools can be used at all | Shadow capabilities and surprise actions
    Permission scopes | What each tool can access | Data leaks and overreach
    Side-effect classification | Which calls can change state | Accidental destructive actions
    Approval gates | Who must sign off, and when | High-risk automation mistakes
    Budget caps | How long and how expensive a run can be | Runaway cost and infinite loops
    Verification gates | What must be checked before commit | Confident wrong actions
    Logging and audit | What must be recorded | Untraceable incidents
    Sandbox isolation | Where actions are executed | An uncontained blast radius

    A guardrail system is not a single rule. It is a layered design where each layer assumes the others will sometimes fail.

    Guardrails That Actually Work

    Guardrails fail when they are vague or purely prompt-based. They work when they are enforceable by the harness.

    Tool allowlists and explicit defaults

    The agent should not have access to every tool “just in case.” Each workflow should have an explicit tool allowlist.

    Default posture:

    • No tool access until granted
    • Read-only tools preferred
    • Write tools require a higher trust level and a narrower scope

    This prevents accidental escalation of capability.
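    A deny-by-default allowlist is a few lines of harness code. This sketch uses illustrative workflow and tool names; the key property is that a tool is unusable until explicitly granted for a specific workflow.

```python
class ToolRegistry:
    """Deny-by-default allowlist: a tool must be granted per workflow."""

    def __init__(self):
        self._granted: dict[str, set[str]] = {}

    def grant(self, workflow: str, tool: str) -> None:
        self._granted.setdefault(workflow, set()).add(tool)

    def can_use(self, workflow: str, tool: str) -> bool:
        # Absent workflows and absent tools both resolve to "no".
        return tool in self._granted.get(workflow, set())
```
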

    Permission scopes that match the task

    Permissions should be granular:

    • A database tool might have separate read and write credentials.
    • A file tool might be limited to a specific directory.
    • A knowledge base tool might expose only a subset of collections.

    Scope is how you reduce harm even when the agent makes a mistake.

    Side-effect classification and commit rules

    Every tool call should be tagged as:

    • Read-only
    • Write but reversible
    • Write and irreversible

    Your harness can then enforce rules such as:

    • Read-only calls may be retried within caps.
    • Reversible writes require a rollback plan.
    • Irreversible writes require explicit approval.

    This turns “safety policy” into “safety mechanics.”
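    Here is what those mechanics can look like as an enforcement check the harness runs before every tool call. A sketch with illustrative names; raising an exception, rather than logging and continuing, is the point.

```python
from enum import Enum

class Effect(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"

def check_call(effect: Effect, has_rollback_plan: bool, approved: bool) -> None:
    """Enforce the commit rules above; refuses instead of silently proceeding."""
    if effect is Effect.REVERSIBLE and not has_rollback_plan:
        raise PermissionError("reversible write requires a rollback plan")
    if effect is Effect.IRREVERSIBLE and not approved:
        raise PermissionError("irreversible write requires explicit approval")
    # Read-only calls pass; retry caps are enforced elsewhere in the harness.
```
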

    Approval gates that respect human time

    Approvals work when they are concise and decision-shaped.

    A strong approval prompt includes:

    • Action proposed
    • Evidence summary
    • Expected impact
    • Risk summary
    • Rollback plan
    • What happens if the action is declined

    This lets a human approve safely without reading the whole transcript.

    Verification gates that make lying expensive

    A model can sound certain even when it is wrong. Verification gates force it to be checkable.

    Verification patterns include:

    • Cross-source checks for web retrieval
    • Schema validation for structured outputs
    • Unit checks and sanity checks for numbers
    • Spot-check prompts that require quoting evidence
    • Contradiction detection between steps

    If verification fails, the harness should block the commit and route to repair or escalation.
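    The block-and-route behavior can be sketched as a small gate function. The check, commit, and repair callables here are placeholders for whatever verification, execution, and escalation logic a real harness wires in.

```python
def gate_commit(checks: list, commit, repair):
    """Run verification checks before committing.

    Each check is a callable returning (ok, reason). If any check fails,
    the commit is blocked and the failures are routed to repair/escalation.
    """
    failures = [reason for ok, reason in (c() for c in checks) if not ok]
    if failures:
        return repair(failures)
    return commit()
```
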

    Sandbox by default

    Many teams skip sandboxing because it feels like extra work. Then they learn, painfully, that a single bad run can create real damage.

    Sandboxing means:

    • Tools run in isolated environments first
    • Side effects are simulated or staged
    • Writes go to test systems unless explicitly approved for production
    • Outputs are reviewed before promotion

    The harness should make sandbox the default world. Production should feel like a deliberate escalation.

    Guardrails for Retrieval: The Prompt Injection Problem

    Tool-using agents often retrieve text from the web or internal documents. That text can contain instructions designed to hijack the agent.

    A guardrail system must assume retrieval can be adversarial.

    Practical retrieval guardrails:

    • Treat retrieved text as data, not as instructions.
    • Strip or ignore imperative language coming from sources.
    • Require citations for claims and prefer primary sources.
    • Use safe browsing policies: block unknown domains for high-stakes tasks.
    • Detect and flag content that tries to override system rules.

    If you do not build these in, your agent can be tricked into violating constraints while believing it is obeying them.
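    Detecting instruction-like content can start crude and still be useful. The pattern list below is a deliberately simple heuristic sketch; a production system would use a classifier rather than regexes, and would flag for review rather than trusting the filter to be complete.

```python
import re

# Crude heuristic patterns for instruction-like retrieved content.
# Illustrative only: real systems should treat this as one signal among many.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"disregard.*rules",
]

def flag_suspicious(retrieved_text: str) -> bool:
    """Flag retrieved text that appears to address the agent directly."""
    lowered = retrieved_text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

    Flagged text should still be treated as data, never executed as instructions; the flag only decides whether a human or a stricter policy gets involved.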

    Guardrails for Private Knowledge Bases

    When agents can access internal data, guardrails need an additional focus: data minimization.

    Patterns that help:

    • Default to summaries and snippets, not bulk exports.
    • Restrict the agent to the smallest set of documents needed.
    • Prevent the agent from reprinting sensitive text unless explicitly required.
    • Log retrieval queries and results for audit.

    The goal is not paranoia. The goal is to keep internal knowledge useful without turning it into a leak vector.

    Testing Guardrails Before They Matter

    Guardrails that only exist on paper will not hold under pressure. They need to be tested like any other safety-critical component.

    Practical tests you can run:

    • Permission boundary tests: attempt retrieval outside allowed scopes and confirm the harness blocks it.
    • Side-effect tests: simulate write actions and confirm approvals are required.
    • Prompt injection tests: feed retrieved text that tries to override rules and confirm it is treated as data.
    • Budget tests: force long loops and confirm caps halt the run with a clear report.
    • Logging tests: replay a trace and confirm a second operator can understand what happened.

    You can also define “guardrail triggers” and make the harness respond predictably.

    Trigger | Harness response | What the user sees
    Missing evidence for a critical claim | Block commit, request verification | A clear request for sources or a safe stop
    Tool returns unexpected format | Normalize or escalate | A note that the tool output was invalid
    Action classified as irreversible | Require approval gate | A concise approval prompt with impact and rollback
    Budget nearing cap | Switch to summary mode or stop | A partial deliverable plus next steps
    Retrieval includes instruction-like content | Strip, flag, and ignore directives | Output grounded in verified sources, not page commands

    When teams adopt these tests, guardrails become something you can trust, not something you hope works.

    The Guardrail Mindset in Daily Operations

    Guardrails change how teams feel about deploying agents.

    Without guardrails, deployment feels like gambling. People delay adoption because the downside is unclear and the blast radius is scary.

    With guardrails, deployment feels like engineering. You know what the agent can do, what it cannot do, and what it must prove before it acts.

    That predictability unlocks iteration:

    • You can loosen constraints gradually as trust grows.
    • You can monitor where guardrails trigger and improve tools.
    • You can add capabilities without raising risk everywhere.

    In a mature system, guardrails are not a cage. They are the structure that makes freedom safe.

    Safety Is a Feature, Not a Tax

    The most successful agent systems treat guardrails as part of product quality.

    A safe agent is not less capable. It is more useful, because people can rely on it.

    The fastest path to adoption is not maximal autonomy on day one. It is a steady ramp where you start with read-only assistance, prove reliability with logs and run reports, then expand capabilities as your guardrails demonstrate they can contain mistakes. That is how trust becomes measurable.

    The aim is not to prevent every mistake. The aim is to prevent the mistakes that matter: the ones that create harm, destroy trust, or create irreversible side effects.

    When you build guardrails as enforceable mechanics, tool-using agents stop feeling like unpredictable magic and start feeling like reliable infrastructure.

    Keep Exploring Safety and Accountability

    • Human Approval Gates for High-Risk Agent Actions
    https://orderandmeaning.com/human-approval-gates-for-high-risk-agent-actions/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/

    • Safe Web Retrieval for Agents
    https://orderandmeaning.com/safe-web-retrieval-for-agents/

    • Agents on Private Knowledge Bases
    https://orderandmeaning.com/agents-on-private-knowledge-bases/

    • Monitoring Agents: Quality, Safety, Cost, Drift
    https://orderandmeaning.com/monitoring-agents-quality-safety-cost-drift/

    • Human Responsibility in AI Discovery
    https://orderandmeaning.com/human-responsibility-in-ai-discovery/

    From Prototype to Production Agent

    Connected Systems: Understanding Infrastructure Through Infrastructure
    “A prototype proves possibility. Production proves responsibility.”

    A prototype agent is often breathtaking. It answers correctly in a handful of test cases. It calls a tool once or twice. It feels like you just discovered a secret lever.

    Then you ship it into the real world and everything changes.

    • Inputs are messy, ambiguous, and emotionally charged.
    • Tools fail in ways you never simulated.
    • Costs matter because usage is constant, not occasional.
    • Safety becomes real because side effects touch real customers.
    • People judge the system not by the best day but by the worst day.

    Moving from prototype to production is not a single improvement. It is a shift in values. You stop optimizing for impressive. You start optimizing for operable.

    The Gap Between Demo Assumptions and Production Reality

    Prototypes are allowed to assume:

    • The user will ask the right question.
    • The context is clean and complete.
    • Tools respond quickly with correct outputs.
    • Failures are rare.
    • A human is watching closely.

    Production teaches different lessons:

    • Users ask for outcomes, not tasks.
    • Context arrives incomplete and often contradictory.
    • Tools return errors, partial results, and surprising formats.
    • Failures cluster, especially during peak load.
    • Humans are busy and will not read everything.

    A production agent must be designed so that mistakes degrade safely. It must be able to say, “I cannot prove this,” without collapsing.

    A table that keeps the transition honest

    Prototype assumption | Production requirement
    A good prompt is enough | A harness with budgets, stop rules, and tool contracts
    The agent can figure it out | A routing policy that forces verification and escalation
    One success case proves value | Evaluation and monitoring across diverse real cases
    Failures are edge cases | Failure taxonomy and retries designed as first-class features
    Logs are optional | Reproducible traces and run reports are part of the product
    Tools are just functions | Tools are controlled interfaces with risk and blast radius

    If you can name the assumption, you can design for reality.

    Harness First: Turn a Model Into a System

    Production agents do not live as a single prompt. They live inside a harness.

    A harness is the container that enforces:

    • Step limits so loops cannot run forever
    • Cost and latency budgets that match your business constraints
    • Checkpoints so long work can resume safely
    • Idempotency so retries do not double side effects
    • Tool contracts so outputs are predictable and validatable

    The harness is where you protect the organization from the agent and protect the agent from chaos.
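    The two most basic harness protections, step limits and time budgets, fit in a short loop. A minimal sketch with illustrative defaults; real harnesses add cost budgets, checkpoints, and idempotency keys on top of this skeleton.

```python
import time

class BudgetExceeded(Exception):
    pass

def run_with_budgets(steps, max_steps: int = 20, max_seconds: float = 60.0):
    """Minimal harness loop: enforce step and time limits so a misbehaving
    agent halts with a clear error instead of running forever."""
    started = time.monotonic()
    results = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            raise BudgetExceeded(f"step limit {max_steps} reached")
        if time.monotonic() - started > max_seconds:
            raise BudgetExceeded("time budget exhausted")
        results.append(step())
    return results
```

    In a real system the exception would trigger a run report that summarizes progress, rather than surfacing as a raw error.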

    Tool Contracts, Not Tool Hope

    In prototypes, teams call tools and hope the model will interpret results correctly. Production does not allow hope.

    A production agent requires tool contracts:

    • Inputs are typed and constrained.
    • Outputs are validated against schemas.
    • Errors are explicit and machine-readable.
    • Tools support preview, commit, and rollback when side effects exist.

    When tool contracts are clear, verification becomes possible. When tool contracts are fuzzy, every failure becomes a debate.
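    Schema validation against a contract can be very lightweight. This sketch checks only presence and type, with the schema expressed as a field-to-type dict; real contracts would also validate ranges and enumerations.

```python
def validate_output(output: dict, schema: dict) -> list[str]:
    """Check a tool result against its contract: required fields must be
    present with the declared type. Returns a list of violations."""
    errors = []
    for field_name, expected_type in schema.items():
        if field_name not in output:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(output[field_name], expected_type):
            errors.append(f"wrong type for {field_name}")
    return errors
```

    An empty list means the output is safe to hand to the agent; anything else routes to retry, repair, or escalation instead of becoming a debate.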

    Evidence and Verification: Show Your Work as a Policy

    A prototype can be persuasive and still be useful. A production agent must be verifiable.

    Verification gates make this real:

    • Critical claims require cited evidence.
    • Calculations must be reproducible.
    • Tool outputs must be cross-checked for contradictions.
    • If evidence is missing, the agent must switch modes: ask, escalate, or stop.

    This is the point where many teams feel tension, because verification can expose uncertainty. But uncertainty is already present. Verification simply prevents the agent from hiding it.

    Safety and Blast Radius: Make Doing Smaller Than Saying

    If the agent can take action, production changes everything.

    A production transition requires:

    • Sandboxing and environment boundaries
    • Read-only defaults and explicit approvals for writes
    • Reversibility for changes when possible
    • Human approval gates for high-risk actions
    • Clear escalation paths when the agent is uncertain

    A safe production agent is one that can be trusted to refuse.

    Degradation Modes: Decide How the Agent Fails Before It Fails

    The most important production question is not, “What happens when the agent is right?” It is, “What happens when the agent is wrong or confused?”

    Good degradation modes are explicit:

    • If tool calls fail repeatedly, the agent stops and produces a run report with what it tried.
    • If evidence is missing, the agent switches to question mode and asks for the missing input.
    • If sources conflict, the agent surfaces the conflict and routes to a reviewer.
    • If the task is high risk and approvals are unavailable, the agent produces a draft plan and waits.
    • If cost budgets are exceeded, the agent summarizes progress and exits gracefully.

    Degradation is not weakness. It is a promise that the system will not thrash.
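    Making degradation explicit means the mapping from failure condition to behavior lives in one place. A sketch with illustrative condition and mode names, mirroring the list above:

```python
def degrade(condition: str) -> str:
    """Map failure conditions to explicit degradation modes."""
    modes = {
        "tool_failures_exhausted": "stop_and_report",
        "missing_evidence": "ask_user",
        "conflicting_sources": "escalate_to_reviewer",
        "approval_unavailable": "draft_and_wait",
        "budget_exceeded": "summarize_and_exit",
    }
    # Unknown conditions degrade to a safe stop rather than thrashing.
    return modes.get(condition, "stop_and_report")
```
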

    Observability and Run Reports: Make the Agent Auditable

    When something goes wrong, you need more than a transcript. You need a record of what happened.

    A production agent should produce artifacts that people trust:

    • Structured logs with tool-call inputs and outputs
    • Traces that show the sequence of actions
    • Checkpoint state for long runs
    • A run report that summarizes actions, evidence, approvals, and remaining risks

    Run reports are not documentation for its own sake. They are the bridge between automation and accountability.

    Monitoring and Evaluation: Reliability Is a Living Property

    The moment an agent is in production, it begins changing, even if you do nothing:

    • The model may be updated.
    • Tools change output formats.
    • Knowledge bases evolve.
    • User behavior shifts.

    Production means you monitor:

    • Quality
    • Safety
    • Cost
    • Drift

    And you evaluate changes before they become incidents:

    • Golden sets for replay
    • Canary windows for rollout
    • Thresholds that trigger rollback

    This is what makes the difference between shipping an agent and operating an agent.

    Incident Readiness: Treat the Agent Like a Real Service

    If the agent matters, it will have incidents. Prepare for that with the same seriousness you bring to other services.

    Incident readiness includes:

    • Clear ownership and on-call expectations
    • A way to disable high-risk tools quickly
    • A rollback path for policy and prompt changes
    • A playbook for common failure categories
    • A method for collecting and reviewing incident runs

    You do not need to fear incidents. You need to be ready to learn from them without chaos.

    Change Control: Make Improvements Without Surprises

    Teams often iterate on agents quickly because iteration is easy. That is good, but only if you can tell what changed and why.

    Change control practices that keep teams sane:

    • Version your policies and prompts like code
    • Record tool contract versions and schema changes
    • Tag deployments so monitoring can correlate regressions to changes
    • Run replay evaluations on a stable golden set before rollout
    • Use canary windows so you can roll back safely

    This turns iteration into progress instead of volatility.

    Adoption and UX: Reliability Must Be Felt

    Production readiness is not only technical. It is experiential.

    People decide whether to trust an agent by asking:

    • Does it admit what it does not know?
    • Does it show evidence when it makes claims?
    • Does it keep me safe when the task is risky?
    • Does it recover gracefully when something fails?

    A production agent earns adoption by being predictable. It is consistent about its boundaries, consistent about its evidence, and consistent about when it escalates. That consistency is what turns novelty into habit.

    Trust is not a marketing claim. Trust is an operational property you can observe: fewer surprise failures, fewer hidden side effects, fewer panicked escalations, and more confident approvals. When those things improve, adoption follows naturally.

    Team Workflow: Put Humans Where They Add the Most Value

    The mistake many teams make is either placing humans everywhere or removing humans entirely.

    Production maturity is the middle:

    • Agents do low-risk work quickly.
    • Humans review high-impact decisions.
    • Operators control side effects.
    • Requesters define success criteria up front.

    This is why role-based workflows matter. Production is not only code. It is people making decisions under constraints.

    The Verse Inside the Story of Systems

    A prototype is a proof of possibility. Production is a proof of character.

    Theme in the transition | What changes
    You stop performing | You start operating
    You stop optimizing for best case | You start designing for worst case
    You stop trusting tone | You start trusting evidence
    You stop relying on attention | You start relying on systems
    You stop shipping demos | You start shipping responsibility

    If you want agents that last, build them like you build anything you depend on: with constraints, evidence, and humility.

    Keep Exploring Systems on This Theme

    • Production Agent Harness Design
    https://orderandmeaning.com/production-agent-harness-design/

    • Tool Routing for Agents: When to Search, When to Compute, When to Ask
    https://orderandmeaning.com/tool-routing-for-agents-when-to-search-when-to-compute-when-to-ask/

    • Reliable Retries and Fallbacks in Agent Systems
    https://orderandmeaning.com/reliable-retries-and-fallbacks-in-agent-systems/

    • Agent Logging That Makes Failures Reproducible
    https://orderandmeaning.com/agent-logging-that-makes-failures-reproducible/

    • Monitoring Agents: Quality, Safety, Cost, Drift
    https://orderandmeaning.com/monitoring-agents-quality-safety-cost-drift/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/

    • Team Workflows with Agents: Requester, Reviewer, Operator
    https://orderandmeaning.com/team-workflows-with-agents-requester-reviewer-operator/

    • Verification Gates for Tool Outputs
    https://orderandmeaning.com/verification-gates-for-tool-outputs/

    Designing Tool Contracts for Agents

    Connected Patterns: Turning Tool Calls Into Reliable Systems
    “A tool contract is the difference between an agent that guesses and an agent that can be trusted.”

    Agents do not fail only because they “reason badly.” They fail because the world around them is not shaped for their use.

    A human can look at a tool response that is half broken, oddly formatted, or missing a field and still recover. A human can notice that a date looks wrong, that a table column shifted, that a value is in the wrong units, and still make a good decision.

    An agent often cannot. If you hand an agent a fragile tool, you will get fragile behavior.

    This is why tool contracts matter. A contract is a promise the tool makes to the agent about what the tool does, what it will return, how errors will be expressed, and what side effects are allowed. Once you have contracts, you can validate, recover, and route safely. Without contracts, you are relying on luck, and your “agent” is just a prompt that sometimes gets away with it.

    The Problem You Are Actually Solving

    A tool contract is not documentation for humans. It is an operational boundary for an automated system.

    The contract exists to make these things true:

    • The agent can predict the shape of outputs even when the tool fails.
    • The agent can validate results quickly and detect partial or unsafe outcomes.
    • Side effects are explicit and can be blocked behind approvals.
    • Retries are safe because the tool supports idempotency.
    • The system can evolve without breaking older runs because versions are controlled.

    When those properties hold, you can build routing and verification policies that are calm and boring. Calm and boring is the point.

    What a Tool Contract Contains

    A good contract fits on one page, even if the tool is complex. It is specific enough for an agent to follow, and strict enough for engineers to test.

    A practical contract includes:

    • Purpose and scope, stated in one sentence.
    • Inputs with types, required fields, and allowed ranges.
    • Defaults and assumptions the tool will apply when inputs are missing.
    • Side effects, including what can be modified and what will never be modified.
    • Output schema, including required fields and “may be missing” fields.
    • Error schema that is always returned on failure in the same predictable shape.
    • Idempotency behavior, including the key or token that makes repeats safe.
    • Time, cost, and rate limits expressed as explicit budgets.
    • Security boundaries, including what the tool is forbidden to access or reveal.
    • Examples of both success and failure responses.

    If you build these into the tool output itself, the agent can keep itself honest without extra prompt tricks.

    The Single Most Important Rule

    Tools must return a structured envelope even when they fail.

    If a tool returns a blank string, a random stack trace, or an HTML error page, your agent has no reliable way to recover. It will either hallucinate a result or retry until the run dies.

    A contract envelope looks like this conceptually:

    • status: success or error
    • data: the normal output payload if successful
    • error: a typed error object if failure
    • warnings: non-fatal issues that the agent should surface
    • metadata: latency, cost estimate, version, and idempotency information

    When every tool response fits this shape, you can write simple validation and routing rules that work across the whole system.
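    To make the envelope concrete, here is a minimal sketch in Python. The field names follow the conceptual list above; the dataclass and the validator are illustrative, not a standard.

    ```python
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class ToolEnvelope:
        """One predictable shape for every tool response, success or failure.
        Field names mirror the envelope described above; they are illustrative."""
        status: str                        # "success" or "error"
        data: Optional[dict] = None        # normal payload when status == "success"
        error: Optional[dict] = None       # typed error object when status == "error"
        warnings: list = field(default_factory=list)   # non-fatal issues to surface
        metadata: dict = field(default_factory=dict)   # latency, cost, version, idempotency info

    def validate_envelope(env: ToolEnvelope) -> list:
        """Return a list of contract violations; an empty list means the shape holds."""
        problems = []
        if env.status not in ("success", "error"):
            problems.append(f"unknown status: {env.status}")
        if env.status == "success" and env.data is None:
            problems.append("success without data payload")
        if env.status == "error" and env.error is None:
            problems.append("error without typed error object")
        return problems
    ```

    Because every tool returns this one shape, the same validator runs after every call, which is what makes system-wide routing rules possible.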

    Contracts Turn “Edge Cases” Into Normal Cases

    If you do not use contracts, your system has infinite edge cases. Every tool failure becomes a new prompt band-aid.

    If you do use contracts, most edge cases collapse into a small set of typed outcomes:

    • Validation failed because inputs were malformed.
    • Permission denied because the action is restricted.
    • Transient failure because the network or provider is down.
    • Partial result because the tool hit a budget.
    • Conflict because the requested change would overwrite something.

    Each of those outcomes can map to a calm policy. Calm policies are what keep agent systems from becoming incidents.

    Failure you will see | Contract clause that prevents chaos | What the agent can do safely
    Output missing fields | Required fields with validation errors | Ask for missing inputs or re-run with corrected params
    Tool returns unstructured errors | Always-return error envelope | Classify, backoff, and report without guessing
    Retry causes duplicate side effects | Idempotency key requirement | Retry safely without creating duplicates
    Tool does too much | Explicit side-effect list | Block, request approval, or switch to read-only mode
    Tool runs forever | Hard time budgets + partial flag | Stop, summarize partial results, choose a fallback

    Contract-First Tool Design

    A common mistake is building the tool, then writing the contract later. By then, behavior is inconsistent and hard to constrain.

    Contract-first design flips the order:

    • Write the envelope schema.
    • Write the success payload schema.
    • Write the error taxonomy for that tool.
    • Decide which fields are mandatory versus optional.
    • Decide how partial results must be marked.
    • Decide what the tool is forbidden to do.

    Then implement the tool to satisfy the contract. This produces a tool that is easier to test, safer to expose, and friendlier to agents.

    Make Side Effects Loud

    Agents should not infer side effects. They should see them.

    If a tool can change state, the contract should say exactly what it can change and under what conditions. It should also provide a “dry run” mode that returns a preview rather than performing the change.

    Dry runs are one of the best tools for agent safety because they force the system into a verify-before-act rhythm:

    • Preview the change.
    • Validate that the preview matches intent.
    • Ask for human approval if required.
    • Execute the change using the same idempotency key.

    This pattern eliminates most high-cost mistakes.
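    The verify-before-act rhythm can be sketched as a small function. The `dry_run_executor` and `executor` callables, and the preview fields, are stand-ins for whatever your real tool exposes; the one non-negotiable detail is that the preview and the execution share an idempotency key.

    ```python
    import uuid

    def apply_change(change, executor, dry_run_executor, approved):
        """Preview the change, gate it, then execute with the SAME idempotency
        key so a retry cannot apply it twice. Callables and preview fields
        are illustrative stand-ins for a real tool contract."""
        key = str(uuid.uuid4())   # one key shared by preview and execution
        preview = dry_run_executor(change, idempotency_key=key)
        if not preview["matches_intent"]:
            return {"status": "aborted", "reason": "preview diverged from intent"}
        if preview["requires_approval"] and not approved:
            return {"status": "paused", "reason": "approval required", "preview": preview}
        return executor(change, idempotency_key=key)
    ```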

    Validation Is Part of the Contract

    Validation should not live only in the agent prompt. It should be built into the system.

    A contract should include explicit invariants that can be checked automatically. Examples:

    • Returned totals must match the sum of line items.
    • Dates must be ISO-8601 and include timezone when relevant.
    • IDs must match a defined regex.
    • Currency must be explicit and never implied.
    • “Updated_count” must equal the number of objects in “updated_items”.

    When the tool enforces these, your agent can trust the tool more, and your monitoring can alert earlier.
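    Invariants like these are cheap to check mechanically. A sketch, assuming a payload with illustrative field names (`total`, `line_items`, `account_id`, `updated_count`, `updated_items`) and an example ID pattern:

    ```python
    import re

    ID_PATTERN = re.compile(r"^acct_[0-9a-f]{8}$")   # example scheme, not a real standard

    def check_invariants(payload: dict) -> list:
        """Mechanical contract checks on a tool's structured output.
        Field names are illustrative; the invariants mirror the list above."""
        failures = []
        if payload["total"] != sum(item["amount"] for item in payload["line_items"]):
            failures.append("total does not equal sum of line items")
        if not ID_PATTERN.match(payload["account_id"]):
            failures.append("account_id does not match required pattern")
        if payload["updated_count"] != len(payload["updated_items"]):
            failures.append("updated_count disagrees with updated_items")
        return failures
    ```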

    Human-Readable Fields Are Still Valuable

    Structured outputs keep machines reliable. Human-readable fields keep operators sane.

    A good contract can include a short summary field that mirrors the structured output:

    • summary: one paragraph describing what happened
    • evidence: a short list of references or identifiers
    • next_actions: suggested follow-ups when partial results occur

    These are not excuses for unstructured blobs. They are operator aids that make run reports and debugging easier without compromising reliability.

    Versioning Without Pain

    If your tool contract changes, older agents can break.

    The simplest way to prevent this is explicit versioning:

    • The agent sends a requested contract version.
    • The tool responds with the version it used.
    • When a breaking change is needed, publish a new major version and keep the old one available.

    Versioning feels like overhead until the day you need it. That day will arrive.

    The Payoff: Better Routing and Better Safety

    Once you have tool contracts, routing becomes straightforward.

    The agent does not need to “feel” its way through the run. It can follow crisp rules:

    • If validation fails, do not retry. Ask for corrected inputs.
    • If permission is denied, escalate for approval rather than trying alternatives.
    • If transient failure, retry with backoff and a max attempt cap.
    • If partial result, summarize, then decide whether to expand budget or accept partial.
    • If conflict, produce a diff and ask a human to choose.

    Those rules turn chaotic behavior into a controlled system.
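    The routing rules above can live as one small function in the harness. The outcome names follow the typed outcomes listed earlier; the branch names are illustrative.

    ```python
    def route(error_type: str, attempt: int, max_attempts: int = 3) -> str:
        """Map a typed tool outcome to a calm next action.
        Outcome and branch names are illustrative, mirroring the rules above."""
        policy = {
            "validation_failed": "ask_for_corrected_inputs",  # never retry bad inputs
            "permission_denied": "escalate_for_approval",
            "conflict": "produce_diff_and_ask_human",
            "partial_result": "summarize_then_decide_budget",
        }
        if error_type == "transient":
            return "retry_with_backoff" if attempt < max_attempts else "fail_run"
        return policy.get(error_type, "fail_run")   # unknown errors fail loudly
    ```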

    Testing a Contract Like You Mean It

    If a contract cannot be tested, it will drift.

    The quickest way to make contracts real is to build a small suite of contract tests that run in CI:

    • Golden responses for a few standard inputs, including edge values.
    • Property checks that required fields always exist and optional fields are marked correctly.
    • Error tests that force each typed error and confirm the envelope shape.
    • Idempotency tests that repeat the same call and verify no duplicate side effects.
    • Budget tests that confirm partial results are labeled and summaries are consistent.

    When contract tests exist, an agent system becomes easier to evolve because every change proves that the boundary still holds.
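    An idempotency test is the easiest of these to write. A sketch using an in-memory fake tool (the class and test names are illustrative):

    ```python
    class FakeTool:
        """Minimal in-memory tool that honors idempotency keys,
        used here only to demonstrate the shape of a contract test."""
        def __init__(self):
            self.effects = []   # one entry per real side effect
            self.seen = {}      # idempotency key -> stored result

        def call(self, params, idempotency_key):
            if idempotency_key in self.seen:
                return self.seen[idempotency_key]   # replay, no new effect
            self.effects.append(params)
            result = {"status": "success", "data": params}
            self.seen[idempotency_key] = result
            return result

    def test_idempotent_repeat():
        """Contract test: repeating a call with the same key must not
        create a duplicate side effect."""
        tool = FakeTool()
        first = tool.call({"action": "create"}, idempotency_key="k1")
        second = tool.call({"action": "create"}, idempotency_key="k1")
        assert first == second
        assert len(tool.effects) == 1   # exactly one effect, not two
    ```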

    Keep Exploring Reliable Agent Workflows

    • Tool Routing for Agents: When to Search, When to Compute, When to Ask
    https://orderandmeaning.com/tool-routing-for-agents-when-to-search-when-to-compute-when-to-ask/

    • Guardrails for Tool-Using Agents
    https://orderandmeaning.com/guardrails-for-tool-using-agents/

    • Verification Gates for Tool Outputs
    https://orderandmeaning.com/verification-gates-for-tool-outputs/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/

    • From Prototype to Production Agent
    https://orderandmeaning.com/from-prototype-to-production-agent/

  • Context Compaction for Long-Running Agents

    Context Compaction for Long-Running Agents

    Connected Patterns: Preserving Truth When Context Is Finite
    “Long-running work fails when yesterday’s decisions disappear.”

    Every agent that runs longer than a few minutes meets the same wall: context is finite, but work is not.

    At first, everything feels smooth. The agent can see the request, the constraints, the previous tool outputs, and the plan. Then the run grows. A few documents are read. A few tool calls return results. The user makes a correction. The agent tries a different branch. After enough turns, the early constraints slide out of view.

    That is when the agent starts to drift.

    It repeats work it already did.
    It re-litigates decisions it already settled.
    It forgets what was disallowed and proposes risky actions again.
    It invents new assumptions because it cannot see the old ones.

    Context compaction is the discipline of turning a growing conversation into a stable, inspectable state snapshot that preserves the decisions that matter.

    It is not “summarize the chat.” It is “preserve the working truth.”

    Why Compaction Is Harder Than Summarization

    A normal summary tries to be short and readable. A production compaction tries to be short and correct.

    Correctness is harder because long-running agent work has multiple kinds of information mixed together:

    • Requirements that must not be lost
    • Decisions that must not be reversed accidentally
    • Evidence that must be tied to its source
    • Open questions that must remain open
    • Tentative ideas that must not masquerade as facts
    • Tool outputs that must be preserved without distortion

    If a compaction blurs these categories, the agent becomes confident in the wrong things. The system feels “smart” right up until it makes a costly mistake.

    A good compaction has to do what a good lab notebook does: separate observation from interpretation, record what happened, and make it possible to pick up the work later without re-inventing the story.

    The Pattern Inside the Story of Reliable Work

    Every mature production process learns to separate “the narrative” from “the state.”

    The narrative is how you tell the story to a human. The state is what you need to keep the work correct.

    Agents need the same separation.

    A practical compaction produces two artifacts:

    • A state snapshot that the harness and agent use to continue work
    • A run narrative that a human can read to understand what happened

    The state snapshot is where you store constraints, decisions, and verified facts. The narrative is where you store context, explanation, and helpful detail.

    If you only store narrative, the agent will misread it later. If you only store state, humans will not trust it. You need both, but you must not confuse them.

    What Must Survive Compaction

    Think of compaction as a filter. The goal is not to keep everything. The goal is to keep the right things, in the right form.

    Here is a practical way to structure the compacted state:

    State bucket | What belongs here | Common failure if missing
    Goal and success criteria | The exact outcome the run must deliver | The agent “finishes” with the wrong deliverable
    Constraints and policies | Allowed tools, disallowed actions, required approvals | Safety rules get forgotten and violated
    Decisions and rationales | What was decided and why | The agent reopens settled debates endlessly
    Verified facts | Statements supported by evidence and tool outputs | Opinions become “facts” and drift multiplies
    Evidence index | Links to sources, tool outputs, file hashes, citations | You cannot audit or reproduce the work
    Open questions | Unresolved issues and what is needed to resolve them | The agent pretends uncertainty is resolved
    Pending actions | Next steps with dependencies and stop rules | The agent improvises and gets lost
    Budget and risk signals | Spend counters, confidence flags, contradictions | Runaway loops and false certainty

    Notice what is not listed: every conversational flourish, every brainstorm, every half-formed idea. Those can live in narrative logs. The state should be sharp.
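    The state buckets above map naturally onto a small schema. A sketch with one field per bucket; the names and nested shapes are illustrative, not a standard.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class CompactedState:
        """One field per state bucket in the table above; names are illustrative."""
        goal: str
        constraints: list = field(default_factory=list)       # policies and disallowed actions
        decisions: list = field(default_factory=list)         # {"what": ..., "why": ...}
        verified_facts: list = field(default_factory=list)    # {"claim": ..., "evidence_ref": ...}
        evidence_index: dict = field(default_factory=dict)    # ref id -> source pointer
        open_questions: list = field(default_factory=list)
        pending_actions: list = field(default_factory=list)   # next steps with stop rules
        budgets: dict = field(default_factory=dict)           # spend counters, risk flags
    ```

    Because the shape is fixed, the harness can validate a snapshot before resuming a run, rather than trusting a free-form summary.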

    The Compaction Method That Works in Practice

    A reliable compaction approach is less like writing and more like bookkeeping.

    Compact at commit points

    Compaction should happen at predictable moments, not randomly. The best moment is after a commit: after the agent produces an artifact, executes a safe action, or reaches a verified milestone.

    This gives you a natural boundary:

    • Before the commit: tentative work and drafts
    • After the commit: verified outcome and updated state

    When compaction is tied to commits, you can replay the run like a chain of checkpoints.

    A strict fact-policy boundary

    Your compaction must not mix policy with interpretation.

    Policy includes: “Do not call tool X,” “Do not modify production,” “All external claims require citations,” “Budget cap is Y.”

    Facts include: tool outputs, observed results, confirmed constraints.

    Interpretations include: the agent’s explanations, guesses, and plans.

    Keep these separate. If you do not, the agent will treat interpretations as policies or treat policies as optional suggestions.

    Preserve contradictions explicitly

    Long-running work often encounters conflicting signals: two sources disagree, two tool calls return different numbers, a dataset changes between runs.

    A compaction that resolves contradictions by picking a winner is dangerous. The right move is to record the contradiction and record the verification plan.

    Example contradiction entry:

    • Conflict: Source A says X, Source B says Y
    • Impact: affects decision Z
    • Next verification: run check Q, request human review, or fetch authoritative data

    This allows the agent to continue without pretending certainty.

    Use structured formats, not paragraphs

    Free-form prose is the enemy of long-running reliability. It is too easy to misread later.

    Use a structured representation that the harness can validate. JSON with a schema works. A stable markdown template can work if it is strictly formatted. The key is predictability.

    The compaction should be machine-friendly first, human-friendly second.

    Keep raw evidence out of the compaction

    It is tempting to paste tool outputs into the compacted state. That grows fast and creates new context pressure.

    Instead, store an evidence index:

    • Tool call ID
    • Timestamp
    • Input parameters
    • Output hash or file path
    • Short, verified extraction (only what you need)

    This keeps the state small while preserving auditability.
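    An evidence index entry can be built with a few lines. A sketch, assuming the raw output is stored elsewhere and the state keeps only a pointer and a hash (function and key names are illustrative):

    ```python
    import hashlib
    import time

    def evidence_entry(call_id: str, params: dict, raw_output: str, extraction: str) -> dict:
        """Index a tool output without copying the raw blob into state.
        The raw output lives in its own store; the state keeps a hash and
        a short, verified extraction. Key names are illustrative."""
        return {
            "tool_call_id": call_id,
            "timestamp": time.time(),
            "input_params": params,
            "output_sha256": hashlib.sha256(raw_output.encode()).hexdigest(),
            "extraction": extraction,   # only what the run actually needs
        }
    ```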

    The Compaction in the Life of the Agent

    Context compaction changes how an agent behaves over hours and days.

    Without compaction, the agent’s “memory” becomes a fog. It must guess what matters. It becomes susceptible to whatever was said most recently.

    With compaction, the agent gets a stable foundation. It can act like an operator following a clear runbook:

    • The goals remain visible.
    • Constraints remain enforceable.
    • Decisions remain anchored.
    • Evidence remains traceable.
    • Uncertainty remains honest.

    This is also where you can make drift expensive. If the agent proposes an action that violates the compacted constraints, the harness can block it automatically. If it claims a “fact” not listed as verified, the harness can require evidence before allowing a commit.

    In other words, compaction is not just storage. It is enforcement.

    Common Compaction Mistakes That Create Drift

    Even careful teams tend to stumble in a few predictable ways.

    • Compaction that sounds confident when it is not: phrases like “the data shows” without preserving what data, what query, and what version.
    • Compaction that hides the reason: a decision is recorded, but the rationale is lost, so the agent reopens the debate later.
    • Compaction that collapses options into one path: alternatives vanish, so the agent cannot recover when the chosen path fails.
    • Compaction that treats tool output as gospel: raw outputs are copied into state without validation, and downstream steps inherit the error.
    • Compaction that grows without pruning: state becomes a second transcript, and the same context pressure returns.

    A good harness treats compaction as a budgeted operation. It has a target size, a validation step, and a rule that old, superseded items are marked as superseded rather than quietly overwritten. That is how you preserve history without carrying dead weight.

    A Simple Compaction Checklist

    If you want one practical standard, use this:

    • Anything that changes the future must be written into state.
    • Anything that is uncertain must be labeled uncertain.
    • Anything that is risky must require a gate.
    • Anything that must be audited must have an evidence pointer.
    • Anything that is obsolete must be marked obsolete, not deleted quietly.

    The goal is a state that can be handed to a different model, a different machine, or a different engineer, and still remain true.

    Preserving Truth Over Time

    Long-running agents do not fail because they forget a sentence. They fail because they forget what was binding.

    Context compaction is how you keep binding things binding: constraints, decisions, and verified facts.

    When you treat compaction as part of the harness, long tasks stop feeling like fragile conversations and start feeling like steady operations. The agent can still be creative and flexible, but it is anchored. It does not have to reinvent itself every thousand tokens.

    That is what makes “long-running” possible.

    Keep Exploring Reliable Long-Running Work

    • Agent Memory: What to Store and What to Recompute
    https://orderandmeaning.com/agent-memory-what-to-store-and-what-to-recompute/

    • Preventing Task Drift in Agents
    https://orderandmeaning.com/preventing-task-drift-in-agents/

    • Agent Checkpoints and Resumability
    https://orderandmeaning.com/agent-checkpoints-and-resumability/

    • Multi-Step Planning Without Infinite Loops
    https://orderandmeaning.com/multi-step-planning-without-infinite-loops/

    • Agent Logging That Makes Failures Reproducible
    https://orderandmeaning.com/agent-logging-that-makes-failures-reproducible/

    • The Lab Notebook of the Future
    https://orderandmeaning.com/the-lab-notebook-of-the-future/

  • Build Your First Agent Harness in One Afternoon

    Build Your First Agent Harness in One Afternoon

    Connected Patterns: Understanding Agents Through Minimal, Strong Constraints
    “The fastest way to build an agent is to build the boundaries first.”

    A lot of people try to build an agent by starting with the model prompt.

    That feels natural. The model is the shiny part.

    But prompts do not create reliability. Harnesses do.

    A harness is the controlled loop around the model: the budgets, the tool contracts, the checkpointing, the verification gates, and the stop rules that turn “smart text” into work you can trust.

    The good news is that your first harness does not need to be big. You can build a minimal one in an afternoon if you focus on the parts that prevent the classic failures.

    This guide gives you a small build that can do real work safely. It will feel boring compared to prompt tuning. That boredom is the point. Reliability is often boring.

    What You Are Building

    You are building a loop that can:

    • Accept a task input
    • Choose a small sequence of actions
    • Call tools through contracts
    • Verify tool outputs
    • Save checkpoints
    • Stop with a final status and a run report

    You are not trying to build a general intelligence. You are building an operable worker with guardrails.

    Pick One Narrow Task

    Choose a task that is useful but low-risk.

    Examples:

    • Summarize a document into a structured memo
    • Draft a customer reply that a human must approve
    • Produce a change log from a list of commits
    • Turn meeting notes into action items without inventing owners

    Avoid tasks with irreversible side effects at first. You can add those later behind approvals.

    Define the Run Contract Before You Write a Prompt

    Write down the contract as plain text. The harness will enforce it.

    A simple contract includes:

    • Required artifacts: what outputs must exist
    • Allowed tools: which tools can be used
    • Budgets: max tool calls, tokens, retries, and time
    • Stop ladder: completed, paused, failed, aborted
    • Approvals: what requires human review
    • Evidence rules: what claims must be supported by tool outputs

    This contract is the part you will reuse for every future agent.

    Create Tool Contracts That Are Easy to Verify

    A tool contract is not just “call an API.” It is a promise about inputs and outputs.

    Good contracts include:

    • A schema for the response
    • Error shapes you expect
    • An idempotency approach for side effects
    • A latency expectation
    • A validation function the harness can run

    When a tool output is untrusted, the agent becomes untrusted.

    So treat tool contracts like you treat database schemas: explicit, validated, boring.

    A simple example of a contract mindset

    If your tool returns “account summary,” define what that means:

    • required fields: account_id, status, plan, last_invoice_date
    • optional fields: notes, tags
    • error fields: error_code, retryable
    • freshness expectation: updated within a known window

    A tool contract is the difference between evidence and vibes.

    Add a Verification Gate After Every Tool Call

    A verification gate is a set of checks that run after each tool call.

    Checks might include:

    • Required fields exist
    • Values are within expected ranges
    • The response is not empty
    • The tool did not return a partial failure signal
    • The output matches invariants you can assert mechanically

    If a gate fails, the harness chooses a safe branch:

    • Retry with backoff if the error is transient
    • Pause if a dependency is unhealthy
    • Escalate to a human if the task cannot proceed safely

    This is how you prevent confident nonsense from becoming confident action.
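    A gate like this is just a function from a tool response to a safe branch. A sketch, assuming an envelope shape and branch names that are illustrative:

    ```python
    def gate(envelope: dict) -> str:
        """Run mechanical checks on a tool response and pick a safe branch.
        Envelope keys and branch names are illustrative."""
        if not envelope:
            return "escalate"                  # empty response: never proceed
        if envelope.get("status") == "error":
            err = envelope.get("error", {})
            return "retry" if err.get("retryable") else "escalate"
        data = envelope.get("data", {})
        required = ("account_id", "status")    # example required fields
        if any(f not in data for f in required):
            return "escalate"                  # malformed success is not success
        if envelope.get("partial"):
            return "pause"                     # partial results need a decision
        return "proceed"
    ```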

    Enforce Budgets in the Harness

    Budgets are not a suggestion to the model. They are a hard wall.

    Budgets to enforce:

    • Max tool calls per run
    • Max tokens per run
    • Max wall-clock time per run
    • Max retries per tool call

    When a budget is hit, the harness stops the run and produces a report that shows why. That report is more valuable than another attempt.

    Why budgets make agents better

    Budgets force prioritization.

    Instead of endlessly exploring, the agent must choose the highest-value next action. That makes your system more predictable and your outputs more consistent.
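    Enforcing a budget as a hard wall can be as simple as a counter that raises when exhausted. A sketch with illustrative default limits:

    ```python
    class BudgetExceeded(Exception):
        """Raised by the harness when a hard budget wall is hit."""

    class Budget:
        """Hard-wall budgets enforced by the harness, not suggested to
        the model. Default limits are illustrative."""
        def __init__(self, max_tool_calls=20, max_retries=3):
            self.max_tool_calls = max_tool_calls
            self.max_retries = max_retries
            self.tool_calls = 0

        def charge_tool_call(self):
            """Call this before every tool call; raises past the limit."""
            self.tool_calls += 1
            if self.tool_calls > self.max_tool_calls:
                raise BudgetExceeded(
                    f"tool call budget ({self.max_tool_calls}) exhausted")
    ```

    The harness catches `BudgetExceeded`, stops the run, and writes the report; the model never gets a vote.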

    Add Checkpoints After Every Meaningful Stage

    An agent without checkpoints cannot be trusted to run unattended.

    Checkpoints should store:

    • Current stage
    • Inputs and references
    • Tool outputs that matter
    • Decisions made so far
    • Next intended actions

    A checkpoint makes the system resumable. It also makes debugging possible.

    If you cannot resume, you will rebuild work. If you cannot debug, you will eventually stop trusting the system.

    What a good checkpoint contains

    Field | Why it matters
    stage | the workflow position the agent is in
    evidence_refs | IDs for documents, logs, tool outputs used
    decisions | the rationale for branching choices
    pending_inputs | approvals or missing data that block progress
    next_actions | a small plan that can be resumed safely
    budgets_remaining | prevents runaway work after resume

    Checkpoints are your safety net and your debugging map.
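    Checkpoint persistence itself is small. A sketch using JSON files with an atomic write so a crash never leaves a half-written checkpoint (file layout is illustrative):

    ```python
    import json
    from pathlib import Path
    from typing import Optional

    def save_checkpoint(path: Path, state: dict) -> None:
        """Atomically persist run state: write to a temp file, then rename,
        so a crash mid-write cannot corrupt the last good checkpoint."""
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state, indent=2))
        tmp.replace(path)

    def load_checkpoint(path: Path) -> Optional[dict]:
        """Return the last saved state, or None for a fresh run."""
        return json.loads(path.read_text()) if path.exists() else None
    ```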

    Use a Stop Ladder Instead of a Single “Success” State

    A stop ladder gives the loop clear landing places.

    A minimal ladder:

    • Completed: artifacts produced and validated
    • Paused: human input required
    • Failed: validation failure or missing requirements
    • Aborted: budget exceeded or stop signal received

    The key is to treat “paused” as a correct outcome, not a failure. Many real tasks cannot be completed without a person.

    Add a Human Approval Gate the Simple Way

    Even in a small harness, you can support approvals.

    Approvals are a stage, not an emotion.

    A simple pattern:

    • The agent produces a draft artifact
    • The harness marks the run as paused with reason “approval required”
    • A reviewer provides an approval token or a rejection note
    • The harness resumes from the checkpoint

    This prevents the most dangerous behavior: the agent acting while uncertain.

    A Minimal Flow That Is Easy to Implement

    Here is a practical structure for your first harness, expressed as a stable loop:

    • Initialize run state with budgets and policy snapshot
    • Load task input
    • Choose the next action from a small action set
    • If the action is a tool call, execute through the tool contract
    • Verify output through the gate
    • Save checkpoint
    • Repeat until the stop ladder outcome is reached
    • Emit run report and final artifacts

    You can implement this with a simple state machine. The model does not need to invent the architecture. It operates inside it.
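    The loop above fits in a few lines once the pieces exist. A sketch where `choose_action`, `execute`, `verify`, and `save_checkpoint` are stand-ins for your own implementations, and the step cap plays the role of the budget wall:

    ```python
    def run_harness(state, choose_action, execute, verify, save_checkpoint,
                    max_steps=10):
        """Minimal bounded loop: choose, execute through a contract, verify,
        checkpoint, repeat until a stop-ladder outcome. All callables are
        illustrative stand-ins."""
        for _ in range(max_steps):                 # budget is a hard wall
            action = choose_action(state)
            if action["type"] == "finish":
                state["status"] = "completed"      # stop ladder: completed
                save_checkpoint(state)
                return state
            result = execute(action)               # tool call via its contract
            if not verify(result):
                state["status"] = "failed"         # stop ladder: failed
                save_checkpoint(state)
                return state
            state["history"].append(result)
            save_checkpoint(state)                 # checkpoint after each stage
        state["status"] = "aborted"                # stop ladder: budget exceeded
        save_checkpoint(state)
        return state
    ```

    Note that every exit path lands on a rung of the stop ladder and writes a checkpoint; there is no way to leave the loop in an ambiguous state.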

    The “One Afternoon” Build Plan

    If you have limited time, build in this order:

    • Harness skeleton: state machine, budgets, stop ladder
    • One tool: with a clear contract and validation
    • One artifact: a structured output format
    • Checkpointing: save and resume
    • Run report: human-readable summary

    You can add richer planning later. What matters first is that the loop is bounded.

    Test With Canned Fixtures Before You Trust Live Data

    The fastest way to gain confidence is to run the harness against fixed inputs.

    • Save a few representative tasks as fixtures
    • Save tool outputs that simulate success and failure
    • Confirm the harness stops correctly under each condition
    • Confirm the run report is readable when things go wrong

    Fixtures turn reliability into something you can test, not something you hope for.

    What to Measure From Day One

    If you do not measure, you will not notice drift until it hurts.

    Track:

    • Tool calls per task
    • Token usage per task
    • Validation failure rate
    • Retry counts
    • Pause rate and reasons
    • Completion rate
    • Human approval turnaround time

    These metrics are not about surveillance. They are about knowing when the system is leaving the safe zone.

    A Minimal Run Report That People Actually Read

    Even for your first harness, produce a short report.

    Include:

    • Final status from the stop ladder
    • Budgets used and remaining
    • Tools called and any failures
    • Whether validation gates passed
    • Whether the run paused for approval
    • Links or IDs for produced artifacts

    If you can read the report and understand the run in one minute, you are on the right track.

    A Quick “Am I Safe Yet” Table

    If this is true | Your harness is missing
    The agent can call tools forever | Budgets and stop ladder enforcement
    The agent can repeat side effects | Idempotency keys and verification checks
    The agent cannot resume after restart | Checkpoints with resumable state
    Tool output can be malformed and still used | Validation gates
    Approvals do not pause the run | A real paused stage in the workflow
    Failures are hard to debug | Structured logs and a run report

    You do not need a perfect system to start. You need a bounded system.

    The Payoff: A Small Agent You Can Actually Trust

    A prompt can produce a good answer once.

    A harness produces consistent behavior over time.

    Once you have your first harness, every future agent becomes easier. You are not starting from nothing. You are reusing the constraints that make work reliable.

    That is the real speed advantage.

    Keep Exploring How to Build Agent Systems

    If you want to go deeper on the ideas connected to this topic, these posts will help you build the full mental model.

    • Production Agent Harness Design
    https://orderandmeaning.com/production-agent-harness-design/

    • Designing Tool Contracts for Agents
    https://orderandmeaning.com/designing-tool-contracts-for-agents/

    • Reliable Retries and Fallbacks in Agent Systems
    https://orderandmeaning.com/reliable-retries-and-fallbacks-in-agent-systems/

    • Verification Gates for Tool Outputs
    https://orderandmeaning.com/verification-gates-for-tool-outputs/

    • Agent Checkpoints and Resumability
    https://orderandmeaning.com/agent-checkpoints-and-resumability/

    • From Prototype to Production Agent
    https://orderandmeaning.com/from-prototype-to-production-agent/

  • AI Evaluation Harnesses: Measuring Model Outputs Without Fooling Yourself

    AI Evaluation Harnesses: Measuring Model Outputs Without Fooling Yourself

    AI RNG: Practical Systems That Ship

    A model can sound brilliant and still be unreliable. It can answer one demo perfectly and then fail on the same question tomorrow because a dependency changed, a prompt drifted, or retrieval pulled a different source. If you are building AI features that must hold up under real traffic, you need more than “it looks good.” You need a way to measure quality that stays honest as the system changes.

    An evaluation harness is the discipline that keeps you from shipping vibes. It is a repeatable way to run representative cases, score outcomes against a rubric, and detect regressions before users do. The word “harness” matters: it is something you can hook to your system and pull on it from many angles until weaknesses show up.

    Why AI evaluations go wrong

    Teams often “do evals” and still learn nothing because the evaluation is built to confirm a belief instead of discover reality. The common traps are predictable.

    Trap | What it looks like | What it causes | The fix
    Cherry-picked cases | Only the good-looking examples are included | You ship a system that collapses on normal inputs | Build a representative case set and keep it fixed
    Moving goalposts | The definition of “good” changes when results are inconvenient | You cannot compare versions honestly | Freeze rubrics and track rubric revisions separately
    Proxy metrics | You measure a shortcut (length, positivity, style) | Models optimize for the proxy, not the user | Tie metrics to user outcomes and failure modes
    Uncontrolled variables | Model version, tools, retrieval, and prompts change together | You never know what caused improvement or regression | Version everything and isolate changes
    Single-score blindness | One aggregate number hides dangerous failures | Severe edge cases are buried in averages | Track slices and “must-not-fail” rules

    A harness is not a spreadsheet of opinions. It is an experiment design that protects you from your own bias.

    Decide what “good” means before you measure

    If you cannot state the contract, you cannot evaluate. “The model answers correctly” is not a contract. A contract says what matters, what is allowed, and what is forbidden.

    A practical contract has three layers.

    • Outcome: what must be true for the user. The answer is correct, actionable, and complete enough to proceed.
    • Constraints: what must not happen. The answer must not fabricate sources, leak private data, or omit critical safety steps.
    • Style expectations: what makes it usable. The answer is clear, structured, and aligned with your voice.

    Once you have a contract, turn it into a rubric that multiple people could apply and get similar scores.

    A rubric that stays stable

    A stable rubric is specific, testable, and connected to failure modes you can name.

    • Correctness: does it match ground truth or a verified reference?
    • Completeness: does it include the required steps or key facts?
    • Faithfulness: does it stay consistent with provided sources and citations?
    • Safety and policy: does it avoid disallowed content and unsafe actions?
    • Usefulness: can a user actually do something with it?

    Some of these can be automated, but most systems need a blend: automated checks for obvious failures and human scoring for nuance.

    Build the harness as a pipeline, not a meeting

    An evaluation harness is a pipeline that takes inputs, runs your system, collects outputs, scores them, and produces a report you can compare across versions.

    Harness component | What it does | What “done” looks like
    Case set | Represents the problems users actually bring | A frozen dataset with clear provenance and labels
    Runner | Calls your system the same way production does | One command runs the full suite end to end
    Scorers | Apply automated checks and human rubrics | Scores are reproducible and explained
    Slicing | Breaks results into meaningful groups | You can see where the system fails, not only averages
    Regression gating | Blocks merges that break contracts | A clear threshold and an exception process
    Report | Summarizes deltas and top failures | A diff you can read in minutes

    If the harness is hard to run, it will not be used. Treat “easy to run” as a quality requirement.
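The pipeline above can be sketched as a small runner that calls the system the same way for every case and applies each scorer. This is a minimal illustration, not a framework: `Case`, `run_suite`, and the scorer names are invented for this sketch, and `fake_system` stands in for your real pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Case:
    case_id: str
    prompt: str
    expected: str                              # verified reference answer
    tags: List[str] = field(default_factory=list)

def run_suite(cases, system: Callable[[str], str], scorers: Dict[str, Callable]):
    """Run every case through the system, apply each scorer, and return
    records you can compare across versions."""
    results = []
    for case in cases:
        output = system(case.prompt)
        scores = {name: fn(case, output) for name, fn in scorers.items()}
        results.append({"case_id": case.case_id, "output": output,
                        "scores": scores, "tags": case.tags})
    return results

# Two illustrative scorers: a hard correctness check and a trivial sanity check.
def exact_match(case, output):
    return 1.0 if output.strip() == case.expected.strip() else 0.0

def non_empty(case, output):
    return 1.0 if output.strip() else 0.0

cases = [Case("c1", "2+2?", "4"), Case("c2", "capital of France?", "Paris")]
fake_system = lambda p: {"2+2?": "4", "capital of France?": "Lyon"}[p]
results = run_suite(cases, fake_system, {"exact": exact_match, "non_empty": non_empty})
```

Because the runner is one function call, “easy to run” becomes a property of the code, not a promise in a wiki page.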

    Start with a case set that is small but real

    You do not need ten thousand cases on day one. You need enough to represent the diversity of real usage.

    A good starter set includes:

    • Common cases: the daily bread of your product.
    • High-risk cases: where wrong answers are costly.
    • Boundary cases: ambiguous queries, partial information, contradictory inputs.
    • “Must not fail” cases: compliance, permissions, private data, or safety.

    Keep a simple rule: when production fails, add a case. Over time, your harness becomes a memory of everything you have learned.

    Treat retrieval and tools as part of the system

    If your system uses retrieval, tools, or external data, your harness must control those variables or record them.

    For retrieval:

    • Snapshot the documents or build a versioned corpus.
    • Store the retrieved chunks alongside each output.
    • Score faithfulness: did the answer match what the system retrieved?

    For tool calls:

    • Record tool inputs and outputs.
    • Fail the case if a tool produces an error that should have been handled.
    • Separate “model quality” failures from “tool reliability” failures.

    The harness should tell you whether the model failed, the pipeline failed, or both.
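One way to make that separation concrete is to classify each case record before scoring quality at all. The record fields here (`tool_calls`, `answer_uses_unretrieved_source`, `passed`) are hypothetical names for artifacts your harness would store.

```python
def classify_failure(case_record):
    """Separate 'tool reliability' failures from 'model quality' failures
    so the report says who broke, not just that something broke."""
    # A tool error that should have been handled fails the case on its own.
    if any(call["error"] for call in case_record["tool_calls"]):
        return "tool_failure"
    # Faithfulness: the answer leaned on a source the system never retrieved.
    if case_record["answer_uses_unretrieved_source"]:
        return "model_failure: unfaithful to retrieval"
    if not case_record["passed"]:
        return "model_failure"
    return "pass"

record = {"tool_calls": [{"name": "search", "error": None}],
          "answer_uses_unretrieved_source": True, "passed": False}
kind = classify_failure(record)
```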

    Score outputs in a way that produces decisions

    The purpose of scoring is not to produce a number. It is to produce decisions.

    A useful scorecard includes:

    • Pass or fail on hard constraints: no fabricated citations, no policy violations, no missing required steps.
    • A graded score for quality: correctness and usefulness on a consistent scale.
    • Error tags: why it failed, in language that suggests a fix.

    Use “hard gates” for dangerous failures

    Some failures should block release, even if the average score looks fine.

    Examples:

    • Citation mismatch: the answer claims a source that was not retrieved.
    • Data exposure: private identifiers appear in output.
    • Permission violation: the system performs an action without authorization.
    • Critical omission: safety steps are missing.

    Hard gates are how you protect users from statistical excuses.
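Two of those gates can be sketched as mechanical checks. The citation convention (`[doc-id]` markers) and the PII pattern (an SSN-shaped string) are assumptions for this sketch; real systems will have their own formats.

```python
import re

def hard_gate_violations(output: str, retrieved_ids: set,
                         pii_patterns=(r"\b\d{3}-\d{2}-\d{4}\b",)):
    """Return the blocking failures for one output.
    An empty list means the output passes all hard gates."""
    violations = []
    # Citation mismatch: every [doc-id] cited must actually have been retrieved.
    cited = set(re.findall(r"\[([\w-]+)\]", output))
    if not cited <= retrieved_ids:
        violations.append("citation_mismatch")
    # Data exposure: private identifiers must never appear in output.
    if any(re.search(p, output) for p in pii_patterns):
        violations.append("data_exposure")
    return violations

ok = hard_gate_violations("Per policy [doc-1], rotate keys.", {"doc-1"})
bad = hard_gate_violations("See [doc-9]. SSN 123-45-6789.", {"doc-1"})
```

Any nonempty result blocks release regardless of what the average quality score says.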

    Track slices, not only aggregates

    One average score can hide a lot of harm. Slices reveal where the system is fragile.

    Useful slices include:

    • Query type: “how to,” “diagnosis,” “compare,” “summarize,” “generate.”
    • Domain: billing, support, operations, engineering, legal.
    • Retrieval coverage: cases with strong sources vs thin sources.
    • Input complexity: short prompts vs long context.
    • Language and formatting: code-heavy vs prose-heavy.

    When you see a regression, slices tell you where to look first.
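Slicing is cheap to implement once results carry tags. A minimal sketch, assuming each result record has a boolean `passed` and a list of slice `tags`:

```python
from collections import defaultdict

def slice_pass_rates(results):
    """Group results by slice tag and compute a pass rate per slice,
    so fragility shows up instead of hiding in the average."""
    by_slice = defaultdict(list)
    for r in results:
        for tag in r["tags"]:
            by_slice[tag].append(r["passed"])
    return {tag: sum(vals) / len(vals) for tag, vals in by_slice.items()}

scored = [
    {"passed": True,  "tags": ["billing", "how-to"]},
    {"passed": False, "tags": ["billing", "compare"]},
    {"passed": True,  "tags": ["support", "how-to"]},
]
rates = slice_pass_rates(scored)
```

An overall pass rate of 67% here hides that the "compare" slice is failing completely.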

    Prevent overfitting to the harness

    A harness that never changes can become a target. People tune prompts until the suite passes, without improving real-world behavior.

    You need a rhythm:

    • A frozen “gate set” that changes slowly and represents core usage.
    • A rotating “challenge set” that changes regularly and explores new edges.
    • A blind set that is hidden from prompt tuning, used for periodic audits.

    This keeps the evaluation honest without making it chaotic.

    Make evals part of daily engineering

    A harness only matters if it is wired into the workflow.

    • Run a small smoke subset on every change.
    • Run the full suite on nightly builds or before releases.
    • Tie results to change summaries so reviewers see what shifted.
    • Save artifacts: inputs, outputs, retrieved context, and scores.

    When a regression appears, you should be able to answer: which change introduced it, and why.

    A starter checklist for your first harness

    • Define the contract: outcomes, constraints, and style expectations.
    • Build a small case set from real traffic and real failures.
    • Implement a runner that calls the full pipeline in a controlled way.
    • Add hard gates for the failures you cannot tolerate.
    • Add slices that reflect how users actually use the system.
    • Record artifacts so debugging is possible.
    • Use regression packs so fixes stay fixed.

    The goal is not perfection. The goal is to stop shipping blind, and start shipping with evidence.

    Keep Exploring AI Systems for Engineering Outcomes

    Data Contract Testing with AI: Preventing Schema Drift and Silent Corruption
    https://orderandmeaning.com/data-contract-testing-with-ai-preventing-schema-drift-and-silent-corruption/

    AI Observability with AI: Designing Signals That Explain Failures
    https://orderandmeaning.com/ai-observability-with-ai-designing-signals-that-explain-failures/

    AI for Building Regression Packs from Past Incidents
    https://orderandmeaning.com/ai-for-building-regression-packs-from-past-incidents/

    AI Release Engineering with AI: Safer Deploys with Change Summaries and Rollback Plans
    https://orderandmeaning.com/ai-release-engineering-with-ai-safer-deploys-with-change-summaries-and-rollback-plans/

    AI for Documentation That Stays Accurate
    https://orderandmeaning.com/ai-for-documentation-that-stays-accurate/

  • Agents on Private Knowledge Bases

    Agents on Private Knowledge Bases

    Connected Systems: Understanding Infrastructure Through Infrastructure
    “This is the rule that keeps systems honest: no evidence, no claim.”

    There is a quiet kind of failure that only shows up after the demo looks successful. The agent answers quickly, sounds confident, and even quotes your internal policy with a neat citation. Then a subject matter expert reads it and says, “That policy is outdated,” or worse, “That policy never existed.”

    When an agent is connected to a private knowledge base, the most dangerous error is not a wrong answer. It is an answer that looks documented.

    Private knowledge bases feel safer than the open web because the information is internal, curated, and usually written by your own people. In reality, private knowledge adds a different set of risks:

    • Permissions are complex and mistakes leak confidential material.
    • Documents conflict because teams ship policies at different speeds.
    • Freshness matters because the newest rule is often the only rule that counts.
    • Citations can be fabricated because the agent wants to be helpful.

    If you want agents that can operate on private knowledge without turning your organization into a rumor mill, you need a simple principle: the agent is not allowed to be persuasive. The agent is allowed to be verifiable.

    The Problem Hidden Inside “Just Connect It to the Wiki”

    A private knowledge base is not just a pile of documents. It is a living system of authority.

    A policy page can override a runbook. A legal memo can override a sales playbook. A new incident postmortem can invalidate a decade of best practices. Tickets and chat transcripts contain valuable reality, but they are also full of local workarounds and partial truths.

    When you connect an agent to internal content, you are asking it to answer two questions at once:

    • What does the content say?
    • Which content should be trusted for this decision?

    If you do not make authority and evidence explicit, the agent will invent an authority chain for you. It will pick whatever snippet matches the user’s wording, and it will treat the nearest source as the best source. That is how internal retrieval turns into “policy by proximity.”

    Evidence Rules That Turn Retrieval Into Knowledge

    The safest way to use a private knowledge base is to treat retrieval as a courtroom, not a library. The agent can propose an answer, but it must show the supporting record.

    A practical evidence rule set looks like this:

    • The agent must attach cited excerpts for every operational claim.
    • Each excerpt must include a stable document identifier, a version or timestamp, and the exact text span used.
    • If the agent cannot find evidence, it must say so and move to a safe fallback.
    • If sources conflict, the agent must show the conflict and explain which source wins based on a defined precedence policy.

    These rules do not slow the agent down as much as people fear. They remove expensive backtracking, reduce escalation churn, and make “agent output” something a human can actually review.
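The "no evidence, no claim" rule can be enforced structurally rather than by prompting. A minimal sketch, with invented `Excerpt` and `Claim` types standing in for whatever your output schema is:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Excerpt:
    doc_id: str        # stable document identifier
    version: str       # version tag or timestamp
    text_span: str     # the exact text used as evidence

@dataclass
class Claim:
    statement: str
    evidence: List[Excerpt]

def validate_claims(claims):
    """Reject any operational claim that arrives without an excerpt.
    Rejected claims go to the safe fallback, never to the user as fact."""
    return [c.statement for c in claims if not c.evidence]

claims = [
    Claim("Refunds over $500 need VP approval",
          [Excerpt("policy-42", "2024-03", "Refunds above $500 require VP sign-off.")]),
    Claim("Refunds are always instant", []),
]
rejected = validate_claims(claims)
```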

    A table that clarifies authority

    Source type | What it is good for | What it is dangerous for | How to use it safely
    Policies and standards | Clear rules, definitions, approvals | Being out of date, being too general | Require version and last reviewed date, prefer official owners
    Runbooks and playbooks | Action steps, operational constraints | Local workarounds treated as universal | Bind to scope metadata, require environment labels
    Postmortems and incident notes | Reality, failure patterns, guardrails | “One incident” becoming “the rule” | Use as cautionary evidence, not normative authority
    Tickets and chat | Edge cases, customer context, symptoms | Misleading, incomplete, personally identifiable info | Redact by default, treat as context not policy
    Dashboards and metrics docs | Current state, definitions, thresholds | Metric drift, renamed fields | Require metric dictionary mapping and owners

    Your agent should not be the judge of this table. The system should be the judge. The agent should be forced to follow the table.

    Access Control That Is Actually Enforced

    Most teams say they will respect access controls, then they wire retrieval through an index that has already flattened permissions.

    A reliable private-knowledge agent uses enforcement at retrieval time, not only at indexing time.

    • Document-level permissions should be checked at query time with the user’s identity.
    • Chunk-level redaction should be supported so a document can be shared without exposing every section.
    • Output should be filtered for sensitive data patterns, even when the user is authorized, because accidental leakage can still happen in pasted summaries.

    A useful mental model is “read what you can show.” If the agent cannot show a cited excerpt to the user because the user lacks permission, then the agent cannot use that excerpt to form the answer.

    This closes the most common loophole: an agent that is technically constrained from revealing a document still leaks its meaning through paraphrase.
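"Read what you can show" is easy to express as a query-time filter: drop every chunk the user could not be shown before the model ever sees it. The chunk shape and group-based permissions here are illustrative; your identity system will differ.

```python
def readable_chunks(chunks, user_groups):
    """Enforce 'read what you can show': a chunk the user cannot be shown
    must not inform the answer, so it is removed before generation.
    This also blocks leakage-by-paraphrase."""
    return [c for c in chunks if c["allowed_groups"] & user_groups]

chunks = [
    {"doc_id": "hr-1",  "text": "Comp bands ...",   "allowed_groups": {"hr", "legal"}},
    {"doc_id": "eng-7", "text": "Deploy policy ...", "allowed_groups": {"engineering"}},
]
visible = readable_chunks(chunks, user_groups={"engineering"})
```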

    Freshness, Versioning, and the Myth of “The Source of Truth”

    Private knowledge bases usually have multiple truths in flight:

    • The policy says one thing.
    • The platform team shipped a change that breaks the policy.
    • The runbook is updated, but the policy review board has not met.
    • Customer support has learned a workaround that the docs do not yet acknowledge.

    If your agent treats all documents equally, it will average these truths and produce a middle that nobody stands behind.

    A more realistic approach is to tag each knowledge object with currency signals:

    • Last reviewed date
    • Owner team
    • Environment scope
    • Confidence level (official, provisional, historical)
    • Supersedes or replaced-by relationships

    Then you set routing rules:

    • If the question is about what to do now, prefer official, recently reviewed sources.
    • If the question is “why did we do this,” allow older sources as historical context.
    • If there is no recent official source, the agent must mark the answer as provisional and recommend the owner team.
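The routing rules above can be sketched as one function over currency metadata. All field names, dates, and the 180-day freshness window are assumptions for illustration:

```python
from datetime import date

def route_sources(intent, sources, max_age_days=180, today=date(2024, 6, 1)):
    """'What to do now' prefers official, recently reviewed sources; with no
    recent official source, the answer is labeled provisional. 'Why' questions
    may use older sources as historical context."""
    if intent == "current_action":
        fresh_official = [s for s in sources
                          if s["confidence"] == "official"
                          and (today - s["last_reviewed"]).days <= max_age_days]
        if fresh_official:
            return fresh_official, "official"
        return sources, "provisional"   # recommend the owner team alongside
    return sources, "historical_ok"

sources = [
    {"doc_id": "pol-1", "confidence": "official",   "last_reviewed": date(2024, 5, 1)},
    {"doc_id": "old-9", "confidence": "historical", "last_reviewed": date(2019, 1, 1)},
]
chosen, label = route_sources("current_action", sources)
```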

    This is where “agent memory” can destroy quality. If the agent stores a cached policy summary that later becomes invalid, the agent becomes a fast delivery system for outdated guidance. Private knowledge agents should store pointers and evidence trails, not durable decisions.

    Conflict Handling Without Argument Theater

    Internal documents disagree. The agent must not pretend they do not.

    Instead of smoothing, the agent should do triage:

    • Identify the precise point of conflict.
    • List the competing sources with their currency metadata.
    • Apply the precedence rule.
    • If precedence is ambiguous, escalate with a minimal question that resolves authority.

    A conflict does not mean the agent is useless. A conflict is often the most valuable output, because it reveals organizational drift.

    A simple precedence policy you can implement

    You can adopt a precedence hierarchy that the agent must follow:

    • Compliance and legal policies override operational playbooks.
    • Security policies override convenience procedures.
    • Official owner-team docs override ad hoc tickets.
    • Newer reviewed documents override older ones when scope is equal.
    • Narrower scope overrides broader scope when both are current.

    The key is not the exact hierarchy. The key is that the hierarchy exists and is visible.
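Because the hierarchy is ordered, conflict resolution reduces to a sort key. The ranks below are one possible encoding of the list above, not a recommendation of exact numbers:

```python
# Lower rank wins. Ranks are illustrative; the point is that they exist.
PRECEDENCE = {"compliance": 0, "security": 1, "owner_docs": 2, "ticket": 3}

def resolve_conflict(candidates):
    """Apply the hierarchy: compliance > security > owner docs > tickets.
    Within the same rank, the more recently reviewed document wins."""
    return min(candidates, key=lambda d: (PRECEDENCE[d["kind"]], -d["reviewed_year"]))

docs = [
    {"doc_id": "ticket-88", "kind": "ticket",   "reviewed_year": 2024},
    {"doc_id": "sec-pol-2", "kind": "security", "reviewed_year": 2023},
]
winner = resolve_conflict(docs)
```

The agent does not get to argue with `PRECEDENCE`; it can only cite it, which is what makes the hierarchy visible.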

    The “No Fabricated Citations” Contract

    In private knowledge systems, fabricated citations are more damaging than in public systems, because they look like insider truth.

    A strong contract is simple:

    • If the agent cannot attach an excerpt, it cannot claim the document supports the statement.
    • If the agent cannot access the document due to permissions, it must not use it.
    • If the excerpt is partial, the agent must label it as partial and avoid broad conclusions.

    You can reinforce this contract by designing the UI and output format so evidence is normal:

    • Put citations directly under claims.
    • Make it easy for a reviewer to click through to the source.
    • Include a “what I did not check” section in the run report.

    The Verse Inside the Story of Systems

    When teams first adopt private-knowledge agents, they often assume the system is about answering questions. The deeper reality is that the system is about trust.

    Theme in real organizations | Expression in a private-knowledge agent
    Knowledge is scattered across teams | Retrieval needs metadata and precedence, not just embeddings
    Authority matters more than fluency | Evidence and owner identity must be first-class
    Freshness is a form of correctness | Currency signals must shape routing and citations
    Confidentiality is a constant pressure | Permission checks and redaction must be enforced at retrieval time
    Drift happens quietly | Conflict detection and escalation must be normal, not exceptional

    If you get this right, the agent becomes a pressure relief valve. Instead of creating new confusion, it forces clarity into the system.

    The Verse in the Life of the Operator

    If you are building or running these agents, your temptation will be to optimize for answers that sound complete. Resist that.

    The outputs that keep a business safe and fast are outputs that can be verified quickly.

    You can think of it like this:

    Your fear | The safer reality
    “If the agent admits uncertainty, people will stop using it.” | People stop using systems that betray them, not systems that are honest.
    “If we require excerpts, the agent will be slow.” | Excerpts reduce long debates, rework, and mis-executed changes.
    “If we surface conflicts, we will look disorganized.” | Conflicts exist already. Surfacing them is how you become organized.
    “If we enforce permissions strictly, answers will be incomplete.” | Incomplete is safer than leaked. You can still route to an authorized reviewer.

    A private knowledge base is a precious thing. An agent can help people access it, but it must be taught to treat knowledge as evidence, not vibes.

    Keep Exploring Systems on This Theme

    • Designing Tool Contracts for Agents
    https://orderandmeaning.com/designing-tool-contracts-for-agents/

    • Verification Gates for Tool Outputs
    https://orderandmeaning.com/verification-gates-for-tool-outputs/

    • Agent Logging That Makes Failures Reproducible
    https://orderandmeaning.com/agent-logging-that-makes-failures-reproducible/

    • Safe Web Retrieval for Agents
    https://orderandmeaning.com/safe-web-retrieval-for-agents/

    • Monitoring Agents: Quality, Safety, Cost, Drift
    https://orderandmeaning.com/monitoring-agents-quality-safety-cost-drift/

    • Agent Run Reports People Trust
    https://orderandmeaning.com/agent-run-reports-people-trust/

    • Preventing Task Drift in Agents
    https://orderandmeaning.com/preventing-task-drift-in-agents/

    • Human Approval Gates for High-Risk Agent Actions
    https://orderandmeaning.com/human-approval-gates-for-high-risk-agent-actions/

  • Agents for Operations Work: Runbooks as Guardrails

    Agents for Operations Work: Runbooks as Guardrails

    Connected Patterns: Runbook-Driven Agents That Help Without Taking Over
    “Operations is not creativity. It is correctness under pressure.”

    Operations work is where agent hype meets reality.

    It is also where agents can deliver real value.

    Operations is repetitive, documented, and full of high-frequency decisions. Many tasks have clear prerequisites, clear steps, and clear definitions of “done.” That shape is friendly to agents.

    Operations is also unforgiving. A mistaken command can take down a system. A rushed change can create hours of recovery work. A confident but wrong diagnosis can waste an entire incident.

    The only way to use agents in operations without losing trust is to bind them to runbooks.

    A runbook is not a suggestion. It is a guardrail. It defines what is allowed, what must be checked, and how to roll back if the world surprises you.

    Why Runbooks Are the Correct Interface for Ops Agents

    If you let an operations agent “figure it out,” you are asking for improvisation in the one domain that punishes improvisation.

    Most successful ops teams already operate through runbooks, checklists, and incident procedures. The agent should not replace that discipline. The agent should embody it.

    A runbook-driven ops agent can:

    • Locate the correct procedure quickly
    • Gather the required context and metrics
    • Propose the next safe action
    • Execute read-only checks automatically
    • Ask for approval before any side effect
    • Capture a complete audit trail for later review

    The agent becomes a structured assistant, not a free-form operator.

    The Blast Radius Problem

    The main risk of ops agents is blast radius.

    A single wrong action can affect:

    • Many users
    • Many services
    • Many regions
    • Many hours of recovery time

    A good ops agent system is designed around blast radius containment.

    The harness needs to know:

    • Which tools have side effects
    • Which actions are reversible
    • Which environments are safe for exploration
    • Which commands are allowed in production
    • Which services are in-scope for the agent

    Then the agent is confined to a safe set by default.

    A Runbook as a Contract, Not a Document

    Most runbooks are written for humans.

    Agents need runbooks written as contracts.

    A contract runbook has structured sections:

    Runbook section | What it contains | What the agent must do with it
    Preconditions | Required context and safe conditions | Verify them with read-only checks before proceeding
    Symptoms | Observable signals and logs | Match evidence to symptoms, avoid guessing
    Diagnosis steps | Queries and checks | Execute and record results in a consistent format
    Action steps | Commands, deploys, config changes | Propose with rollback, require approval for side effects
    Stop rules | Escalation conditions | Trigger paging or human review immediately
    Post-checks | Verification after actions | Confirm the system is healthy before closing
    Notes | Known pitfalls and edge cases | Surface them when conditions match

    This structure turns operations from improvisation into controlled execution.
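A contract runbook can be plain data that a harness enforces. Everything below is a hypothetical sketch: the runbook fields mirror the table above, and `next_allowed_step` shows the enforcement idea, not a production scheduler.

```python
# A runbook as data the harness can enforce, not prose it can ignore.
runbook = {
    "name": "high-error-rate-api",
    "preconditions": ["error_rate > 2%", "deploy in last hour"],
    "symptoms": ["5xx spike", "latency p99 regression"],
    "diagnosis_steps": [{"tool": "query_logs", "read_only": True}],
    "action_steps": [{"tool": "rollback_deploy", "read_only": False,
                      "rollback": "redeploy previous version"}],
    "stop_rules": ["data loss suspected", "security signal"],
    "post_checks": ["error_rate < 0.5% for 10 minutes"],
}

def next_allowed_step(runbook, preconditions_verified, approved):
    """Release an action step only when preconditions are verified and,
    for side-effecting steps, a human has approved."""
    if not preconditions_verified:
        return runbook["diagnosis_steps"][0]
    step = runbook["action_steps"][0]
    if step["read_only"] or approved:
        return step
    return {"tool": "request_approval", "read_only": True}

step = next_allowed_step(runbook, preconditions_verified=True, approved=False)
```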

    Runbook Selection Is a Decision That Must Be Verifiable

    A subtle failure mode is choosing the wrong runbook.

    An agent sees an error message, grabs a similar-sounding procedure, and begins acting.

    A runbook-driven agent should treat selection as a claim that needs evidence.

    It should produce a short mapping:

    • Observed symptoms and signals
    • Why they match this runbook’s symptom section
    • Which preconditions are satisfied
    • Which alternative runbooks were considered and why they were rejected

    This is not paperwork. It is what prevents “we fixed the wrong thing” incidents.

    Read-Only by Default

    The simplest guardrail is a default posture.

    Ops agents should be read-only until a human approves a change.

    Read-only actions include:

    • Fetching metrics and logs
    • Running health checks
    • Comparing current state to baselines
    • Gathering evidence for diagnosis
    • Drafting incident summaries and timelines

    Write actions include:

    • Deploys
    • Configuration changes
    • Restarts
    • Scaling actions
    • Access policy changes

    Write actions should require explicit approval, even if the agent has a clear runbook.

    This protects the organization from the most damaging failure mode: the agent acting quickly while nobody is watching.

    Severity-Aware Autonomy

    Not every incident deserves the same autonomy.

    A safe pattern is to tie agent permissions to severity.

    Severity posture | What is at stake | What the agent can do
    Informational | No user impact | Diagnose, summarize, open tickets, run read-only checks
    Degraded | Partial impact or risk | Diagnose, propose actions, request approvals, rehearse in staging
    Major incident | Widespread impact | Operate only with explicit approvals, emphasize rollback and post-checks
    Critical | Safety, security, or large-scale outage | Escalate immediately, prioritize human control, produce a clear evidence packet

    This posture makes the system predictable during the moments that matter most.
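The posture table becomes predictable precisely when it is code. A minimal sketch, with the posture encoding chosen for illustration:

```python
# One possible encoding of the severity table above.
POSTURE = {
    "informational": {"writes": "none",          "escalate": False},
    "degraded":      {"writes": "with_approval", "escalate": False},
    "major":         {"writes": "with_approval", "escalate": False},
    "critical":      {"writes": "none",          "escalate": True},
}

def may_execute(severity, side_effect, approved):
    """Read-only actions are always allowed; side effects depend on posture.
    Critical incidents hand control to humans immediately."""
    p = POSTURE[severity]
    if not side_effect:
        return True
    if p["escalate"] or p["writes"] == "none":
        return False
    return approved

allowed = may_execute("degraded", side_effect=True, approved=True)
```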

    The Incident Loop an Ops Agent Should Follow

    An operations agent should not jump to solutions.

    It should follow a disciplined loop that mirrors good incident response:

    • Establish what is happening using evidence.
    • Identify the runbook that matches symptoms.
    • Run read-only checks to confirm assumptions.
    • Propose the next safe action, including rollback.
    • Request approval for side effects.
    • Execute, then verify with post-checks.
    • Record everything into a run report.

    This is not slow. It is stable.

    Speed in operations comes from clarity, not from skipping steps.

    Approval Gates That Keep Humans in Control

    Human approval is not a bottleneck if you design the gate well.

    The agent should present a compact approval packet:

    • Proposed action
    • Why this runbook step applies
    • Preconditions verified
    • Expected effect
    • Rollback plan
    • Risk assessment
    • Post-check plan

    A reviewer can approve in seconds when the packet is clear.

    If the packet is messy, humans will block everything, and the system dies.
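One way to guarantee the packet is never messy is to refuse to build an incomplete one. The constructor below is a sketch; the field names mirror the list above and the string contents are placeholders.

```python
def approval_packet(action, runbook_step, preconditions, rollback, risk, post_checks):
    """Build the compact packet a reviewer sees before any side effect.
    Raise if any required field is missing, so a vague request can never
    reach the approval gate."""
    packet = {
        "proposed_action": action,
        "runbook_step": runbook_step,
        "preconditions_verified": preconditions,
        "rollback_plan": rollback,
        "risk": risk,
        "post_checks": post_checks,
    }
    missing = [k for k, v in packet.items() if not v]
    if missing:
        raise ValueError(f"approval packet incomplete: {missing}")
    return packet

packet = approval_packet(
    "restart checkout-api", "high-error-rate runbook, step 4",
    ["error_rate > 2%", "recent deploy confirmed"],
    "roll back to previous release", "low",
    ["error_rate < 0.5% for 10 minutes"])
```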

    Access Control as a First-Class Guardrail

    Even a perfect runbook becomes dangerous if credentials are too broad.

    Ops agents should use scoped credentials:

    • Environment scoping, so a staging credential cannot touch production
    • Service scoping, so an agent for one domain cannot act on another
    • Action scoping, so restart permissions do not imply deploy permissions
    • Time scoping, so elevated permissions expire automatically

    This is not only security. It is operational safety. It ensures that mistakes fail closed.
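Fail-closed scoping is a conjunction: every dimension must match, or the action is denied. The credential shape below is illustrative:

```python
from datetime import datetime, timezone

def credential_allows(cred, env, service, action, now):
    """Fail closed: environment, service, and action must all be in scope,
    and the credential must not be expired."""
    return (env in cred["environments"]
            and service in cred["services"]
            and action in cred["actions"]
            and now < cred["expires_at"])

cred = {
    "environments": {"staging"},
    "services": {"checkout"},
    "actions": {"restart"},                 # restart does not imply deploy
    "expires_at": datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
}
ok = credential_allows(cred, "staging", "checkout", "restart",
                       datetime(2024, 6, 1, 11, 0, tzinfo=timezone.utc))
denied = credential_allows(cred, "production", "checkout", "restart",
                           datetime(2024, 6, 1, 11, 0, tzinfo=timezone.utc))
```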

    Change Windows and Safe Timing

    Some ops actions are safe only in specific windows.

    Deploying during peak traffic can create risk even when the change is correct.

    A runbook-driven agent should be aware of timing rules:

    • Maintenance windows
    • Freeze periods
    • Rate limits on rollouts
    • Required notifications for customer-impacting changes

    When timing constraints apply, the agent should propose a plan rather than executing immediately.

    ChatOps and the Two-Channel Pattern

    Ops teams often work in chat. Agents can fit naturally there.

    A safe pattern is to use two channels:

    • A public incident channel where summaries and approvals happen
    • A private execution channel where raw tool outputs and logs are stored

    The agent posts concise updates publicly and attaches deep evidence privately.

    This keeps humans oriented without drowning the channel.

    It also creates an audit trail that is easy to review later.

    Sandboxes, Staging, and Rehearsal Runs

    One of the highest-leverage patterns is rehearsal.

    Before a risky production action, the agent can:

    • Replay the runbook in staging
    • Run the diagnostic steps on historical incident data
    • Simulate command effects where possible
    • Validate access and permissions
    • Confirm that rollback commands are available and safe

    Even when rehearsal cannot prove the outcome, it reduces unknowns.

    It also builds confidence that the agent is following procedure rather than inventing steps.

    Logging and Postmortems as Part of the Product

    If an ops agent changes anything, the log is not optional.

    The log is part of the system’s accountability.

    A good ops agent record captures:

    • Time-ordered actions
    • Tool inputs and outputs
    • Approvals and reviewer identities
    • Evidence used for decisions
    • Preconditions and post-checks
    • Rollbacks and why they were triggered

    This record is what makes postmortems easier and what makes leadership willing to expand agent permissions over time.

    The Agent’s Job Is to Make On-Call Kinder

    Operations work often happens when people are tired.

    Incidents happen at night. Alerts arrive during weekends. Pressure rises when customers are impacted.

    Runbooks protect people from making impulsive decisions in moments of stress.

    An ops agent bound to runbooks extends that protection.

    It helps the team stay steady, preserve evidence, and act with restraint. It also frees humans to do the work that requires judgment: weighing tradeoffs, communicating externally, and coordinating the response.

    A Practical Way to Introduce Ops Agents

    Operations trust is earned gradually.

    A safe rollout path:

    • Start with diagnosis-only mode.
    • Add read-only automation for checks and summaries.
    • Add approval-gated write actions for low-risk runbooks.
    • Expand to higher-risk actions only after evidence of reliability.

    This approach prevents the “one bad incident kills the project” outcome.

    Runbooks do not limit what an ops agent can do. They make what it does survivable.

    Keep Exploring Agents That Operate Safely

    • Guardrails for Tool-Using Agents
    https://orderandmeaning.com/guardrails-for-tool-using-agents/

    • Human Approval Gates for High-Risk Agent Actions
    https://orderandmeaning.com/human-approval-gates-for-high-risk-agent-actions/

    • Agent Logging That Makes Failures Reproducible
    https://orderandmeaning.com/agent-logging-that-makes-failures-reproducible/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/

    • Team Workflows with Agents: Requester, Reviewer, Operator
    https://orderandmeaning.com/team-workflows-with-agents-requester-reviewer-operator/

    • From Prototype to Production Agent
    https://orderandmeaning.com/from-prototype-to-production-agent/

  • Agents for Data Work: Safe Querying Patterns

    Agents for Data Work: Safe Querying Patterns

    Connected Patterns: Data Agents That Do Not Break Production
    “The fastest way to lose trust is a query that silently changes the truth.”

    Data work is a perfect target for agents, and it is also a trap.

    It is perfect because data tasks are often repetitive:

    • Pull the relevant tables
    • Filter by constraints
    • Summarize patterns
    • Produce a report
    • Validate a metric

    It is a trap because data systems are full of sharp edges:

    • A missing filter can scan a warehouse
    • A join can multiply rows and produce plausible nonsense
    • A write can corrupt a dataset silently
    • A schema change can break logic without errors
    • A query can leak private information

    A data agent is not valuable because it can write SQL.

    A data agent is valuable because it can behave safely while doing useful work.

    Safe querying patterns are the design rules that make that possible.

    Default to Read-Only Roles

    The most important decision is permission.

    A data agent should operate under read-only credentials by default.

    If it must write, the system should require:

    • An explicit role escalation
    • A narrow scope
    • An approval gate
    • A clear rollback strategy

    This is the data version of least privilege.

    It reduces blast radius before you even start debating prompt quality.

    Separate Exploration From Production

    Humans often explore in production because it is convenient.

    Agents should not.

    A safe design separates environments:

    • Development for exploration and iteration
    • Staging for rehearsal and performance checks
    • Production for read-only validation or approved changes

    When you cannot fully separate, you can still simulate separation:

    • Use read replicas
    • Use query governors
    • Use row limits
    • Use sandbox datasets that mirror shape without containing sensitive rows

    The agent should be aware of environment boundaries and refuse to cross them without permission.

    The Preview-Then-Commit Pattern

    A common failure mode is a query that looks fine and is wrong.

    The fix is to make preview a first-class step.

    The agent should:

    • Draft the query
    • Run an explain or dry-run
    • Execute a limited preview with strict row limits
    • Check for obvious anomalies
    • Only then execute the full query when safe

    This pattern catches many mistakes early:

    • Missing predicates
    • Exploding joins
    • Wrong time windows
    • Unexpected null rates
    • Schema mismatch

    It also makes review faster. Humans can approve the intent based on the preview rather than trusting an unseen full run.
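The steps above can be sketched against SQLite, whose `EXPLAIN QUERY PLAN` serves as the dry-run step. The empty-preview check is a deliberately crude stand-in for the anomaly checks a real agent would run; function and table names are illustrative.

```python
import sqlite3

def preview_then_commit(conn, sql, preview_rows=5):
    """Draft -> dry-run -> limited preview -> anomaly check -> full run."""
    conn.execute(f"EXPLAIN QUERY PLAN {sql}")          # dry-run: parse and plan only
    preview = conn.execute(
        f"SELECT * FROM ({sql}) LIMIT {preview_rows}"  # strict row limit
    ).fetchall()
    if not preview:                                    # crude anomaly check
        raise ValueError("preview returned no rows; check predicates before the full run")
    return conn.execute(sql).fetchall()
```

A richer version would compare the preview against expected null rates and value ranges; the structure stays the same: the full query never runs until a cheap, bounded version has been inspected.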

    Guardrails for Expensive Queries

    Cost is not only financial. It is capacity.

    An expensive query can slow down other users, trigger throttling, or push a warehouse into contention.

    Safe data agents enforce query budgets:

    • Maximum bytes scanned
    • Maximum runtime
    • Maximum rows returned
    • Maximum concurrency per run

    The agent should treat these as constraints, not suggestions.

    If the query would exceed limits, the agent should propose alternatives:

    • Sampling
    • Pre-aggregations
    • Narrower time windows
    • Partition filters
    • Materialized intermediate tables in a safe workspace
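A budget check of this kind can be a plain pre-flight function: given a cost estimate (from the warehouse's dry-run facility), it returns the violated limits together with the alternative to propose. The limits and field names here are illustrative assumptions, not defaults from any real warehouse.

```python
# Hypothetical per-run budget; real limits come from team policy.
BUDGET = {
    "max_bytes_scanned": 10 * 1024**3,  # 10 GiB
    "max_runtime_s": 300,
    "max_rows": 1_000_000,
}

def check_budget(estimate: dict) -> list[str]:
    """Return violated limits with a suggested alternative; empty means the query may run."""
    violations = []
    if estimate.get("bytes_scanned", 0) > BUDGET["max_bytes_scanned"]:
        violations.append("bytes_scanned: add partition filters or sample")
    if estimate.get("runtime_s", 0) > BUDGET["max_runtime_s"]:
        violations.append("runtime: narrow the time window or pre-aggregate")
    if estimate.get("rows", 0) > BUDGET["max_rows"]:
        violations.append("rows: aggregate or materialize an intermediate table")
    return violations
```

Because the function reports alternatives rather than just failing, the agent can turn a blocked query into a concrete counter-proposal for the requester.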

    Patterns That Prevent Quiet Wrongness

    Some errors are worse than failures.

    A failed query is obvious.

    A query that returns plausible but wrong numbers can live for months.

    Safe data agents must defend against quiet wrongness.

    Here are patterns that help:

    | Pattern | What it does | What it prevents |
    | --- | --- | --- |
    | Row-count sanity checks | Compares row counts to historical ranges | Silent data explosions and missing data |
    | Join cardinality checks | Tests whether joins multiply unexpectedly | Double-counting that looks plausible |
    | Null and distribution checks | Samples key fields and compares distributions | Hidden schema changes and parsing errors |
    | Reconciliation queries | Cross-checks metrics with an independent method | One-query truth errors |
    | Audit columns and lineage | Tracks source tables and transformations | Untraceable results and “where did this number come from” |

    These checks do not need to be perfect. They need to be consistent and visible.
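The first of those patterns, a row-count sanity check, can be a few lines: compare the observed count to the historical range with a tolerance band. The 50% tolerance is an arbitrary illustrative choice; a real system would tune it per table.

```python
def row_count_ok(observed: int, history: list[int], tolerance: float = 0.5) -> bool:
    """Flag counts far outside the historical range (silent explosions or missing data)."""
    lo, hi = min(history), max(history)
    return lo * (1 - tolerance) <= observed <= hi * (1 + tolerance)
```

The check is deliberately loose: its job is not to be precise, it is to be consistent and visible, so that a 10x jump never sails through unremarked.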

    Metric Definitions Are Part of Safety

    A surprising amount of data work goes wrong because the metric itself is ambiguous.

    If “active user” has three definitions across teams, the agent can produce an answer that is correct for one definition and wrong for the question the requester meant.

    A safe data agent should treat metric definitions as first-class retrieval.

    Before running a query, it should retrieve the definition:

    • Which events or tables define the metric
    • Which filters apply
    • Which time window logic is expected
    • Which exclusions exist
    • Which version is current

    If a metric definition is missing or conflicting, the correct move is to ask or escalate, not to guess.
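Treating definitions as first-class retrieval can be as simple as a lookup that refuses to guess. The registry below is a hypothetical in-memory stand-in for whatever metric store or semantic layer the organization actually uses; the metric name and fields are invented for illustration.

```python
# Hypothetical metric registry; a real one lives in a semantic layer or metrics store.
METRIC_DEFINITIONS = {
    "active_user": {
        "version": 3,
        "source": "events.sessions",
        "filters": ["event = 'login'"],
        "window": "rolling 28 days, UTC day boundaries",
    },
}

def resolve_metric(name: str) -> dict:
    """Return the current definition, or fail loudly so the agent asks instead of guessing."""
    definition = METRIC_DEFINITIONS.get(name)
    if definition is None:
        raise LookupError(f"no definition for metric '{name}': ask or escalate")
    return definition
```

The failure mode matters more than the happy path: a missing or conflicting definition should stop the run, because the cheapest moment to resolve ambiguity is before the query exists.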

    Time Windows, Time Zones, and the Illusion of Precision

    Many data mistakes come from time.

    A query can be “correct” and still wrong because the window definition is inconsistent.

    Examples:

    • A day boundary is UTC in one system and local time in another
    • A rolling window includes partial days unexpectedly
    • A backfill reprocesses late events and shifts historical numbers

    A safe data agent should explicitly state the time assumptions it is using and, when possible, verify them against the metric definition.

    When time assumptions are unclear, it should surface that uncertainty rather than hiding it behind precise-looking numbers.
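Stating time assumptions explicitly can mean constructing windows in code rather than in prose. A sketch for the UTC day-boundary case, using half-open intervals so adjacent days never overlap:

```python
from datetime import datetime, timedelta, timezone

def utc_day_window(day: str) -> tuple[datetime, datetime]:
    """Half-open [start, end) window for one calendar day, explicitly in UTC."""
    start = datetime.fromisoformat(day).replace(tzinfo=timezone.utc)
    return start, start + timedelta(days=1)
```

Emitting these timestamps into the run report makes the assumption reviewable: anyone reading the output can see that "2024-03-01" meant midnight-to-midnight UTC, not local time.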

    Semantic Layers and Data Contracts

    Agents do better when the data model is explicit.

    If the organization has a semantic layer or data contracts, the agent should prefer them over raw tables.

    A contract tells the agent:

    • What a field means
    • What values are allowed
    • What nulls imply
    • What joins are safe
    • What the intended grain is

    Without contracts, an agent can still succeed, but it must do more verification work to avoid false joins and misinterpretation.

    Writing Safely When Writes Are Required

    Sometimes data work requires writes:

    • Backfills
    • Corrections
    • Aggregation tables
    • Feature tables for modeling
    • Materialized reports

    If a data agent writes, it should follow strict patterns:

    • Write to a new table or partition first
    • Validate row counts and distributions
    • Compare the new output to a baseline
    • Only then promote or swap, under approval
    • Keep the old version for rollback

    This turns a single write into three steps: stage, validate, then promote.

    It also makes failures survivable.
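The pattern can be sketched with SQLite table renames as the swap step. The 10% row-count tolerance and the table names are illustrative; real validation would also compare distributions against the baseline, and the promote step would sit behind an approval gate.

```python
import sqlite3

def stage_validate_promote(conn, rows, baseline_count):
    """Write to a staging table, validate, then swap it in; keep the old table for rollback."""
    conn.execute("CREATE TABLE report_staging (metric TEXT, value REAL)")
    conn.executemany("INSERT INTO report_staging VALUES (?, ?)", rows)
    (count,) = conn.execute("SELECT COUNT(*) FROM report_staging").fetchone()
    if abs(count - baseline_count) > baseline_count * 0.1:   # crude validation
        conn.execute("DROP TABLE report_staging")            # nothing was promoted
        raise ValueError(f"staged count {count} deviates from baseline {baseline_count}")
    conn.execute("ALTER TABLE report RENAME TO report_old")  # kept for rollback
    conn.execute("ALTER TABLE report_staging RENAME TO report")
```

If validation fails, production was never touched; if the promoted version turns out wrong, `report_old` is the rollback. That is what makes the failure survivable.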

    Idempotency for Data Writes

    If an agent ever writes, it must be idempotent.

    That means the same operation can be applied twice without changing the outcome.

    This protects you from retries, crashes, and partial failures.

    Idempotency patterns include:

    • Writing to a new table with a run identifier
    • Using merge semantics rather than blind inserts
    • Using unique keys and conflict handling
    • Logging the write intent and checking before re-running

    A data agent without idempotency is a data corruption generator.
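Two of those patterns, checking the write log before re-running and using merge semantics, can be combined in one sketch. This uses SQLite's `ON CONFLICT ... DO UPDATE` upsert; table and column names are illustrative.

```python
import sqlite3

def idempotent_write(conn, run_id, rows):
    """Skip replays by run id; use upserts so a retry can never double-insert."""
    already = conn.execute(
        "SELECT 1 FROM write_log WHERE run_id = ?", (run_id,)
    ).fetchone()
    if already:
        return "skipped"   # this run already committed; re-applying changes nothing
    conn.executemany(
        "INSERT INTO metrics (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        rows,
    )
    conn.execute("INSERT INTO write_log (run_id) VALUES (?)", (run_id,))
    return "written"
```

Either guard alone leaves a gap (a crash between the data write and the log write, or a partial batch); together they make retries safe in both directions.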

    Privacy and Access Boundaries

    Data agents must never assume that “if I can query it, I can share it.”

    The agent should enforce:

    • Output redaction rules
    • Aggregation thresholds
    • Row-level privacy constraints
    • PII detection and masking

    If the request might involve sensitive data, the agent should escalate to a human approval gate, even if the query is read-only.

    Trust in data work is fragile. Once broken, it is hard to restore.

    Verification Gates for Data Outputs

    Even read-only queries need verification.

    A data agent should treat results as claims that require evidence.

    That evidence can include:

    • Query text and parameters
    • Preview outputs and sanity checks
    • Cross-check results
    • Source table identities and timestamps
    • Metric definition references

    The final deliverable should include enough information for someone else to reproduce the result without guessing.

    This is how you keep data work honest at scale.

    The Agent’s Job Is to Make Review Easy

    A safe data agent makes humans faster, not slower.

    That means it produces reviewable artifacts:

    • The query
    • A short explanation of intent
    • The preview sample
    • The sanity checks performed
    • The final output with clear caveats

    When review is easy, humans approve quickly and the system remains safe.

    The Triage Question: What Kind of Data Request Is This

    Support agents triage tickets. Data agents should triage requests.

    A practical triage split:

    • One-off analysis, where speed matters and reproducibility is still required
    • Metric reporting, where definitions and baselines are critical
    • Data correction, where writes and rollbacks dominate the risk
    • Executive reporting, where clarity and confidence are more important than novelty

    If the agent can identify the request type, it can choose the correct safe workflow automatically.
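That triage can start as something very modest, even keyword routing, as long as the categories map to distinct safe workflows. The keywords below are placeholder heuristics; a production system would classify with more context than a few substrings.

```python
def triage(request: str) -> str:
    """Route a request to one of the four safe workflows (keyword sketch)."""
    text = request.lower()
    if any(w in text for w in ("backfill", "correct", "repair")):
        return "data_correction"       # writes and rollbacks dominate the risk
    if any(w in text for w in ("board", "exec", "leadership")):
        return "executive_reporting"   # clarity and confidence over novelty
    if any(w in text for w in ("weekly", "monthly", "kpi", "metric")):
        return "metric_reporting"      # definitions and baselines are critical
    return "one_off_analysis"          # fast, but still reproducible
```

Misrouting is cheap here because every branch is still a safe workflow; the classifier only chooses which extra checks dominate.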

    Safe Querying Is the Beginning of Trust

    Many teams try to deploy data agents by starting with “write a query.”

    A better approach is to start with “write safely.”

    If the agent can consistently follow safe patterns, the team will slowly allow it more autonomy.

    If it cannot, no amount of clever prompting will save it.

    Data is the record of reality inside an organization. That record deserves careful stewardship.

    A safe data agent is one that treats truth as something to be proven, not something to be declared.

    Keep Exploring Safe Data and Evidence Patterns

    • Designing Tool Contracts for Agents
    https://orderandmeaning.com/designing-tool-contracts-for-agents/

    • Agent Error Taxonomy: The Failures You Will Actually See
    https://orderandmeaning.com/agent-error-taxonomy-the-failures-you-will-actually-see/

    • Verification Gates for Tool Outputs
    https://orderandmeaning.com/verification-gates-for-tool-outputs/

    • Monitoring Agents: Quality, Safety, Cost, Drift
    https://orderandmeaning.com/monitoring-agents-quality-safety-cost-drift/

    • Agent Run Reports People Trust
    https://orderandmeaning.com/agent-run-reports-people-trust/

    • Sandbox Design for Agent Tools
    https://orderandmeaning.com/sandbox-design-for-agent-tools/