Abstract
The Diagnostics Agent is the first production Anthropic Managed Agents workload at iSimplifyMe. It runs against incidents surfaced by Apex monitoring, gathers the relevant context, drafts a diagnosis and a proposed remediation, and surfaces both for a human reviewer before anything is acted on. The agent has no direct write access to production resources — its job is context, not action.
Problem
Incident triage for a small ops team is dominated by context-gathering, not decision-making. By the time the relevant logs, traces, and tenant configuration are pulled together, the human reviewing the incident has spent most of their time on rote work that an agent can do faster and more thoroughly.
The hesitation around AI in ops has rarely been about the language model itself. It has been about the orchestration around it: long-running tool use, retries, partial failure, audit trails, and a clean human approval gate that fails closed, never auto-executing when part of the loop is in a failed state. Building that infrastructure in-house was, until recently, the gating cost of putting AI into the incident loop at all.
Approach
One agent, one bounded task
The agent is scoped to operational diagnostics — not deployment, not configuration changes, not tenant communication. The narrow scope is deliberate: each agent does one well-defined operational task, and the human approval gate sits between the agent's output and any action against production.
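One way to make that scope enforceable rather than conventional is to have the agent's tool registry refuse anything that can write. The sketch below is a minimal illustration under stated assumptions, not the Apex implementation. `Tool`, `ToolRegistry`, the `read_only` flag, and the tool names (`fetch_logs`, `read_tenant_config`, `restart_service`) are all hypothetical, and no Managed Agents API is used or implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool descriptor; `read_only` is an assumed capability flag.
    name: str
    read_only: bool
    handler: Callable[[dict], dict]


class ToolRegistry:
    """Tools exposed to a single agent. Any tool with write access is rejected."""

    def __init__(self, agent_scope: str):
        self.agent_scope = agent_scope
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Enforce "context, not action": the tool surface stays read-only.
        if not tool.read_only:
            raise PermissionError(
                f"{self.agent_scope}: '{tool.name}' has write access; not registered"
            )
        self._tools[tool.name] = tool

    def call(self, name: str, args: dict) -> dict:
        # Only tools that passed the read-only check can ever be invoked.
        if name not in self._tools:
            raise KeyError(f"{self.agent_scope}: unknown tool '{name}'")
        return self._tools[name].handler(args)

    @property
    def names(self) -> list[str]:
        return sorted(self._tools)


# Usage with hypothetical tools: two read-only tools register, a write tool is refused.
registry = ToolRegistry("diagnostics")
registry.register(Tool("fetch_logs", True, lambda a: {"lines": []}))
registry.register(Tool("read_tenant_config", True, lambda a: {"tenant": a["tenant_id"]}))

rejected = False
try:
    registry.register(Tool("restart_service", False, lambda a: {}))
except PermissionError:
    rejected = True
```

Putting the check at registration time means a misconfigured tool fails loudly when the agent is set up, rather than being discovered at incident time.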
Managed runtime
The workload runs on Anthropic's Managed Agents runtime, which handles the long-running, tool-using, retryable execution that incident triage actually requires. The internal team owns the prompt, the tool surface, and the approval UI. The runtime owns the execution loop around them.
Approval gate
The agent writes its diagnosis and proposed remediation to a queue surfaced inside the Apex admin UI. A reviewer reads the entry, edits it if needed, and either approves or discards it. Approved actions are handed off to the same deployment path a human-authored change would take. There is no auto-execute path, even for high-confidence diagnoses.
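The gate's core property is that nothing reaches production without an explicit reviewer decision. That property can be sketched as a small state machine. This is a hypothetical illustration, not the Apex code. `Proposal`, `ApprovalQueue`, and the `deploy` hook, which stands in for the existing deployment path, are invented names. Persistence, the admin UI, and authentication are left out.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DISCARDED = "discarded"


@dataclass
class Proposal:
    incident_id: str
    diagnosis: str
    remediation: str
    confidence: float  # shown to the reviewer; never used to skip review
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: Status = Status.PENDING
    reviewer: Optional[str] = None


class ApprovalQueue:
    def __init__(self, deploy: Callable[[Proposal], None]):
        # `deploy` is a hypothetical hook for the human change path.
        self._deploy = deploy
        self._items: dict[str, Proposal] = {}

    def submit(self, proposal: Proposal) -> str:
        # The agent's only write: a pending queue entry. Nothing executes here.
        self._items[proposal.id] = proposal
        return proposal.id

    def pending(self) -> list[Proposal]:
        return [p for p in self._items.values() if p.status is Status.PENDING]

    def _open(self, proposal_id: str) -> Proposal:
        # Approved or discarded entries are closed to further changes.
        p = self._items[proposal_id]
        if p.status is not Status.PENDING:
            raise ValueError(f"proposal {proposal_id} is already {p.status.value}")
        return p

    def edit(self, proposal_id: str, remediation: str) -> None:
        self._open(proposal_id).remediation = remediation

    def approve(self, proposal_id: str, reviewer: str) -> None:
        p = self._open(proposal_id)
        p.reviewer = reviewer
        self._deploy(p)  # the only call site of deploy, reachable only from here
        p.status = Status.APPROVED  # marked approved only after hand-off succeeds

    def discard(self, proposal_id: str, reviewer: str) -> None:
        p = self._open(proposal_id)
        p.status, p.reviewer = Status.DISCARDED, reviewer


# Usage: a high-confidence proposal still waits for a reviewer.
deployed: list[Proposal] = []
queue = ApprovalQueue(deploy=deployed.append)
pid = queue.submit(Proposal("INC-1", "disk full on worker", "rotate logs", confidence=0.98))
nothing_deployed_before_review = deployed == []
queue.edit(pid, "rotate logs and raise disk alert threshold")
queue.approve(pid, reviewer="ops-oncall")
```

`confidence` is kept on the record but appears in no branch. That is how the sketch models "no auto-execute path": a proposal scored at 0.98 and one scored at 0.40 take the same route through the queue. Recording the status only after `deploy` returns means a failed hand-off leaves the proposal pending instead of falsely approved.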
Status
- Live in production behind a canary flag, running against synthetic incidents end-to-end.
- Per-incident cost validated on the synthetic set at cents per run.
- Flip-to-real-traffic milestone scheduled.
- Same approval-gate pattern is the candidate template for additional Managed Agent workloads inside Apex (one agent per well-bounded operational task).
Links
- Apex Portal → https://apex.isimplifyme.com