DevOps/SRE MCP Pack (2025) — Kubernetes, AWS, Prometheus: Triage and Recover Fast

• By RouterMCP Team

Operate services with cluster queries, runbooks, and real-time metrics via MCP. Includes setup and an incident drill.

Incident view connecting K8s logs, Prometheus metrics, and AWS rollback.

TL;DR: Ask in plain language for failing pods, recent errors, and a rollback, then execute once the action is approved.

Servers

  • Kubernetes MCP (community). https://github.com/SedulousSuchcha/mcp-kubernetes
  • AWS MCP (community). https://github.com/isaacwasserman/mcp-aws
  • Prometheus MCP (community). https://github.com/realrasengan/mcp-prometheus

Incident drill

  1. “Show pods with restartCount > 3 in the checkout namespace.”
  2. “Graph the 5-minute error rate for api_gateway.”
  3. “Roll back to the previous task definition; ask for confirmation before applying.”
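
The first two drill prompts map to simple operations, and a minimal Python sketch can make them concrete. The pod dictionaries follow the shape of a Kubernetes Pod object (metadata.name, status.containerStatuses[].restartCount). Everything else is an assumption for illustration: the sample pods, the function names, and the metric name http_requests_total with its job and status labels. The tools the MCP servers actually expose may look different.

```python
from typing import Any


def pods_over_restart_limit(pods: list[dict[str, Any]], limit: int = 3) -> list[str]:
    """Return names of pods where any container restarted more than `limit` times."""
    flagged = []
    for pod in pods:
        # containerStatuses can be missing while a pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        if any(s.get("restartCount", 0) > limit for s in statuses):
            flagged.append(pod["metadata"]["name"])
    return flagged


def error_rate_query(job: str, window: str = "5m") -> str:
    """Build a PromQL expression for the share of 5xx responses over `window`.

    The metric and label names are assumptions; adjust them to your exporter.
    """
    errors = f'sum(rate(http_requests_total{{job="{job}",status=~"5.."}}[{window}]))'
    total = f'sum(rate(http_requests_total{{job="{job}"}}[{window}]))'
    return f"{errors} / {total}"


# Hypothetical pods in the checkout namespace, for illustration only.
sample_pods = [
    {"metadata": {"name": "checkout-api-1"},
     "status": {"containerStatuses": [{"restartCount": 7}]}},
    {"metadata": {"name": "checkout-api-2"},
     "status": {"containerStatuses": [{"restartCount": 1}]}},
    {"metadata": {"name": "checkout-worker-1"}, "status": {}},
]

print(pods_over_restart_limit(sample_pods))
print(error_rate_query("api_gateway"))
```

The rollback in step 3 is left out on purpose: it changes state, so it belongs behind the confirmation step rather than in a read-only helper.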

Internal links

  • Pack docs: /packs/devops-sre
  • Related posts: Observability (10), Security (01)

FAQ

Q: How do we prevent destructive commands?
A: Require confirmations and role‑based policies per tool.
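
One way to picture "confirmations and role-based policies per tool" is a small gate that every tool call passes through before it runs. The sketch below is an assumption-laden illustration, not the pack's actual policy format: the tool names, role names, and the ToolPolicy and authorize helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool policy: who may call it, and whether to confirm first."""
    allowed_roles: frozenset[str]
    needs_confirmation: bool = False


# Illustrative tool and role names; real MCP servers define their own tools.
POLICIES = {
    "k8s.list_pods": ToolPolicy(frozenset({"viewer", "operator"})),
    "aws.rollback_task_definition": ToolPolicy(
        frozenset({"operator"}), needs_confirmation=True
    ),
}


def authorize(tool: str, role: str, confirmed: bool = False) -> str:
    """Return "allow", "deny", or "needs_confirmation" for a requested tool call."""
    policy = POLICIES.get(tool)
    if policy is None:
        return "deny"  # default deny: tools without a policy never run
    if role not in policy.allowed_roles:
        return "deny"
    if policy.needs_confirmation and not confirmed:
        return "needs_confirmation"
    return "allow"


print(authorize("k8s.list_pods", "viewer"))
print(authorize("aws.rollback_task_definition", "operator"))
print(authorize("aws.rollback_task_definition", "operator", confirmed=True))
```

Denying tools that have no policy entry keeps a newly added destructive tool from becoming callable by accident.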

CTA

  • Start from the template at examples/packs/devops-sre.mcp.json, together with the “safe ops” policy examples and limiter configs.