
Sentinel

Production monitoring agent layer — Claude on Anthropic Managed Agents, Slack-gated.

AI Infrastructure · Beta · Rev. 2026 · Anthropic MA · Claude · Slack

What is Sentinel?

Sentinel is iSimplifyMe's production monitoring agent layer — Claude agents running on Anthropic Managed Agents that catch regressions, anomalies, and operational incidents across iSM-managed infrastructure. The first Sentinel workload, the Diagnostics Agent, shipped April 2026 as the first Anthropic Managed Agents production deployment at iSM. Every Sentinel detection fires into Slack with structured context, a recommended action, and a human approval gate before any remediation runs.

Abstract

Sentinel is iSimplifyMe's production monitoring layer — a fleet of Claude agents running on Anthropic Managed Agents that catch regressions, anomalies, and operational incidents across the iSM stack. It is internal infrastructure, not a customer-facing product, and runs to keep the rest of the platform honest.

Problem

Production AI infrastructure has more silent failure modes than ones conventional monitoring can see. A Bedrock model that returns semantically wrong answers still passes a 200 OK health check. A retrieval pipeline that surfaces stale data clears every uptime probe.

Manual log review does not scale across a multi-site network with thirty in-production engagements. Status pages tell you what is on; they do not tell you what is wrong.

Approach

The agent topology

Each Sentinel agent is an Anthropic Managed Agents workload with a discrete surveillance scope and a defined cadence. The agent reads from a constrained set of operational signals — logs, recent error events, model-call traces — runs a Claude inference pass to classify the situation, and decides whether the finding warrants escalation.
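A minimal sketch of one surveillance pass, assuming hypothetical `Signal`, `Finding`, and `Verdict` types: the agent restricts incoming signals to its scope, hands them to a classifier, and forwards only escalations. The `classify` callable stands in for the Claude inference pass; nothing here reflects the Managed Agents API itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Verdict(Enum):
    # Hypothetical verdict labels; in Sentinel the Claude pass makes the judgment.
    OK = "ok"
    WATCH = "watch"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Signal:
    source: str  # e.g. "logs", "error_events", "model_traces" (hypothetical names)
    detail: str


@dataclass(frozen=True)
class Finding:
    verdict: Verdict
    summary: str


def run_pass(
    signals: Iterable[Signal],
    scope: frozenset[str],
    classify: Callable[[list[Signal]], Finding],
) -> Optional[Finding]:
    """One surveillance pass: restrict to scope, classify, decide on escalation."""
    # The agent reads only the constrained set of signal sources in its scope.
    scoped = [s for s in signals if s.source in scope]
    if not scoped:
        return None
    finding = classify(scoped)  # stand-in for the Claude inference pass
    # Only escalations leave the agent; OK and WATCH findings stop here.
    return finding if finding.verdict is Verdict.ESCALATE else None
```

Keeping the classifier injectable means the scoping and escalation rules can be tested without calling a model.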

Slack as the approval gate

When an agent identifies something worth escalating, it posts a Block Kit card to the appropriate channel with the diagnosis, the recommended remediation, and a small set of action buttons. A human reviewer clicks one. Only then does any remediation fire.
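A sketch of the card and its click handler, assuming Slack's documented Block Kit JSON shapes (`section` and `actions` blocks, `button` elements). The action IDs `sentinel_approve` and `sentinel_dismiss` and the `remediate` callback are hypothetical; the point is that remediation code is reachable only from the approve branch.

```python
from typing import Callable


def _button(label: str, action_id: str, incident_id: str, style: str) -> dict:
    # Block Kit button element; `value` carries the incident ID back on click.
    return {
        "type": "button",
        "text": {"type": "plain_text", "text": label},
        "action_id": action_id,
        "value": incident_id,
        "style": style,
    }


def build_card(incident_id: str, diagnosis: str, remediation: str) -> dict:
    """Block Kit message body: diagnosis, recommended fix, approval buttons."""
    return {
        "text": f"Sentinel: {diagnosis}",  # plain-text fallback for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*Diagnosis*\n{diagnosis}"}},
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*Recommended action*\n{remediation}"}},
            {
                "type": "actions",
                "elements": [
                    _button("Approve", "sentinel_approve", incident_id, "primary"),
                    _button("Dismiss", "sentinel_dismiss", incident_id, "danger"),
                ],
            },
        ],
    }


def handle_click(action_id: str, incident_id: str, remediate: Callable[[str], None]) -> str:
    """Run remediation only on an explicit human approval."""
    if action_id == "sentinel_approve":
        remediate(incident_id)
        return "remediated"
    if action_id == "sentinel_dismiss":
        return "dismissed"
    return "ignored"  # unknown actions never trigger remediation
```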

The design rule came directly from an April 2026 incident where an unguarded automated drip in the Retell Phone Bridge sent the same follow-up email 48 times to three leads. Sentinel's discipline since then: automated detection is fine, automated remediation requires a human in the loop.

Workload #1: Diagnostics Agent

The Diagnostics Agent shipped April 2026 as Sentinel's first production workload and the first Anthropic Managed Agents deployment at iSM. It investigates client tenant sites that have failed three consecutive uptime checks — running curl, dig, openssl, and Cloudflare 5xx-breakdown probes — then files a markdown bug-report ticket with timeline, root cause, evidence, and recommended fix. Verified cost: $0.06 per incident on synthetic test cases (Sonnet 4.6, ~50 seconds active runtime, ~28k tokens including prompt cache reuse).
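The trigger condition can be sketched as a per-site streak counter. `UptimeTracker` is a hypothetical name and keeps its state in memory; a production detector would persist streaks somewhere durable. It fires once, when a site reaches its third consecutive failure, so a single outage dispatches a single investigation.

```python
from collections import defaultdict

FAILURE_THRESHOLD = 3  # three consecutive failed uptime checks, per the text


class UptimeTracker:
    """Counts consecutive failed checks per site; any success resets the streak."""

    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self.streaks: dict[str, int] = defaultdict(int)

    def record(self, site: str, ok: bool) -> bool:
        """Record one check; return True exactly when the streak hits the threshold."""
        if ok:
            self.streaks[site] = 0
            return False
        self.streaks[site] += 1
        # Fire at the threshold only, not on every later failure.
        return self.streaks[site] == self.threshold
```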

Workload #2: GH Triage Agent

The GH Triage Agent shipped April 2026 as Sentinel's second production workload. It polls iSimplifyMe org repository workflow runs every fifteen minutes, detects failures, and runs an inference pass classifying root cause across eight categories: test_flake, regression, infrastructure, auth, dependency, lint_or_typecheck, build_config, and unknown. Output is a structured ticket on the synthetic internal-isimplifyme tenant with a markdown body covering Failure Summary, Classification, Failed Jobs, Recent Commits, and investigator Notes.
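A sketch of the classification contract and ticket body, assuming the model returns one of the eight labels as a string. `FailureCategory`, `parse_category`, and `render_ticket` are hypothetical names, and the `##` heading level is an assumption; the category values and section titles are the ones listed above. Folding unrecognized labels into `unknown` keeps a malformed model response from blocking ticket filing.

```python
from enum import Enum


class FailureCategory(str, Enum):
    # The eight root-cause categories used by the GH Triage Agent.
    TEST_FLAKE = "test_flake"
    REGRESSION = "regression"
    INFRASTRUCTURE = "infrastructure"
    AUTH = "auth"
    DEPENDENCY = "dependency"
    LINT_OR_TYPECHECK = "lint_or_typecheck"
    BUILD_CONFIG = "build_config"
    UNKNOWN = "unknown"


def parse_category(raw: str) -> FailureCategory:
    """Map the model's label onto the fixed set; anything else becomes UNKNOWN."""
    try:
        return FailureCategory(raw.strip().lower())
    except ValueError:
        return FailureCategory.UNKNOWN


def render_ticket(summary: str, category: FailureCategory, failed_jobs: list[str],
                  commits: list[str], notes: str) -> str:
    """Render the markdown ticket body in the section order described above."""
    sections = [
        ("Failure Summary", summary),
        ("Classification", f"`{category.value}`"),
        ("Failed Jobs", "\n".join(f"- {j}" for j in failed_jobs) or "_none_"),
        ("Recent Commits", "\n".join(f"- {c}" for c in commits) or "_none_"),
        ("Notes", notes),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```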

Verified cost: $0.065 per run on a synthetic test (Sonnet 4.6, ~44 seconds active runtime). The agent is idempotent: once a failed run is investigated, a 24-hour DDB lock prevents re-investigation, so flapping CI does not produce duplicate tickets.
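The 24-hour lock amounts to "first writer wins, unless the existing lock has expired." The sketch below models that with an in-memory dict and a hypothetical `GHRUN#<run_id>` key, and it is not thread-safe. In DynamoDB the same check-and-write would be a single conditional `PutItem` with a condition such as `attribute_not_exists(pk) OR expires_at < :now` (attribute names assumed), which is what makes it atomic. The expiry is checked explicitly because DynamoDB's TTL removes expired items in the background rather than at the moment they expire.

```python
import time
from typing import Optional

LOCK_TTL_SECONDS = 24 * 60 * 60  # the 24-hour window described above


class InvestigationLocks:
    """In-memory stand-in for a conditional-write lock table (not thread-safe)."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def try_acquire(self, run_id: str, now: Optional[float] = None) -> bool:
        """Return True if this caller may investigate the run, False to skip it."""
        now = time.time() if now is None else now
        key = f"GHRUN#{run_id}"  # hypothetical key format
        expires_at = self._expires_at.get(key)
        # Mirrors `attribute_not_exists(pk) OR expires_at < :now`.
        if expires_at is not None and expires_at >= now:
            return False  # investigated within the last 24 hours: no duplicate ticket
        self._expires_at[key] = now + LOCK_TTL_SECONDS
        return True
```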

The two workloads share infrastructure: one generic SQS-triggered runner Lambda dispatches the right agent based on the SENTINEL_AGENT_SLUG in its kickoff message, an atomic conditional-write lock at INCIDENT#OPEN protects parallel detection paths against races, and both agents use the same file_ticket and notify_slack tools. Adding a new Sentinel workload takes a registry entry plus a detector handler; everything else is shared.
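The registry-plus-dispatch shape might look like the sketch below, assuming the slug travels as a `SENTINEL_AGENT_SLUG` field in each SQS message's JSON body. The slugs `diagnostics` and `gh-triage` and both runner functions are hypothetical placeholders. `event["Records"][i]["body"]` is the standard shape of an SQS-triggered Lambda event.

```python
import json
from typing import Any, Callable

AgentRunner = Callable[[dict[str, Any]], str]

# Hypothetical registry: adding a workload means adding one entry here.
REGISTRY: dict[str, AgentRunner] = {}


def register(slug: str) -> Callable[[AgentRunner], AgentRunner]:
    def decorator(fn: AgentRunner) -> AgentRunner:
        REGISTRY[slug] = fn
        return fn
    return decorator


@register("diagnostics")
def run_diagnostics(msg: dict[str, Any]) -> str:
    # Placeholder: a real runner would start the Diagnostics Agent session.
    return f"diagnostics:{msg.get('site', '?')}"


@register("gh-triage")
def run_gh_triage(msg: dict[str, Any]) -> str:
    # Placeholder: a real runner would start the GH Triage Agent session.
    return f"gh-triage:{msg.get('run_id', '?')}"


def handler(event: dict[str, Any], context: Any = None) -> list[str]:
    """SQS-triggered Lambda entry point: route each record by agent slug."""
    results = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        slug = body.get("SENTINEL_AGENT_SLUG")
        runner = REGISTRY.get(slug)
        if runner is None:
            # Raise rather than skip so the message is retried, not silently lost.
            raise ValueError(f"unknown Sentinel agent slug: {slug!r}")
        results.append(runner(body))
    return results
```

Raising on an unknown slug returns the message to the queue for retry (and, if a dead-letter queue is configured, eventually parks it there) instead of dropping it quietly.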

Status

  • Sentinel layer shipped April 2026 as MA-based monitoring infrastructure on top of the Apex Client Portal stack.
  • Workload #1 (Diagnostics Agent) shipped 2026-04-26; SST migration into the Sentinel namespace shipped 2026-04-29 with race-protected parallel-run lock. Currently observing in canary mode — canary tenant flag flip scheduled 2026-05-03, with a scheduled review agent firing the same day to confirm the rollout against any incidents that fired in the first activation window.
  • Workload #2 (GH Triage Agent) shipped 2026-04-29 to staging and production. Detector and runner Lambdas are firing every fifteen minutes. The staging synthetic test completed end to end: the agent picked up a deliberately failing workflow on the cc-canary repository, classified it as build_config, recognized from the commit message that it was a synthetic test, filed an exemplary ticket, and posted a precise one-line Slack summary. It is currently in a seven-day cc-canary soak through 2026-05-06 and broadens to all thirty-seven iSimplifyMe org repositories once the soak verifies clean.
  • Future Sentinel workloads on the roadmap: an Issue/PR triage agent for inbound GitHub issues, a pipeline-hang detector for the content pipeline, a weekly audit agent for cross-repo health rollups, a Lighthouse regression detector for client sites, a cost-anomaly detector for AWS spend spikes, and an AEO drift / citation surveillance / DNS watcher for the brand-citation infrastructure.
