DevOps/SRE MCP Pack (2025) — Kubernetes, AWS, Prometheus: Triage and Recover Fast
• By RouterMCP Team
Operate services with cluster queries, runbooks, and real-time metrics via MCP. Includes setup and an incident drill.

TL;DR: Ask for failing pods, recent errors, and a rollback in plain language, then execute with approvals.
Servers
- Kubernetes MCP (community). https://github.com/SedulousSuchcha/mcp-kubernetes
- AWS MCP (community). https://github.com/isaacwasserman/mcp-aws
- Prometheus MCP (community). https://github.com/realrasengan/mcp-prometheus
Incident drill
- “Show pods with restartCount>3 in checkout ns.”
- “Graph 5m error rate for api_gateway.”
- “Roll back to previous task definition; confirm before apply.”
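To make the first drill prompt concrete, here is a minimal sketch of the filter it implies. It assumes the pod list arrives as a dict shaped like `kubectl get pods -n checkout -o json` output and that restarts are summed across a pod's containers. The sample pod names are hypothetical, and this is not necessarily how the Kubernetes MCP server does it.

```python
from typing import Any


def pods_over_restart_threshold(
    pod_list: dict[str, Any], threshold: int = 3
) -> list[tuple[str, int]]:
    """Return (pod name, total restarts) for pods above the threshold.

    Restarts are summed over status.containerStatuses[].restartCount,
    which is an assumption about how "restartCount>3" should be read.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        # containerStatuses can be absent or null for pods that never started.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts > threshold:
            flagged.append((pod["metadata"]["name"], restarts))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not real cluster output.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "checkout-api-1", "namespace": "checkout"},
            "status": {"containerStatuses": [{"restartCount": 5}]},
        },
        {
            "metadata": {"name": "checkout-api-2", "namespace": "checkout"},
            "status": {"containerStatuses": [{"restartCount": 1}]},
        },
        {
            "metadata": {"name": "checkout-worker-1", "namespace": "checkout"},
            "status": {
                "containerStatuses": [{"restartCount": 2}, {"restartCount": 2}]
            },
        },
    ]
}

if __name__ == "__main__":
    for name, restarts in pods_over_restart_threshold(SAMPLE):
        print(f"{name}: {restarts} restarts")
```

Sorting by restart count puts the noisiest pod at the top, which is usually where triage starts.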
Internal links
- Pack docs: /packs/devops-sre
- Related posts: Observability (10), Security (01)
FAQ
Q: How do we prevent destructive commands?
A: Require confirmations and role‑based policies per tool.
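One way to picture "confirmations and role-based policies per tool" is a small gate that every tool call passes through before it runs. The sketch below is illustrative only: the tool names, role names, and the `authorize` helper are hypothetical and are not taken from the pack or from any MCP server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: frozenset[str]  # roles permitted to call the tool
    needs_confirmation: bool       # True for destructive or mutating tools


# Hypothetical tool and role names, chosen only for illustration.
POLICIES = {
    "k8s.list_pods": ToolPolicy(
        frozenset({"viewer", "operator"}), needs_confirmation=False
    ),
    "aws.rollback_task_definition": ToolPolicy(
        frozenset({"operator"}), needs_confirmation=True
    ),
}


def authorize(tool: str, role: str, confirmed: bool = False) -> str:
    """Return "allow", "needs_confirmation", or "deny" for a tool call."""
    policy = POLICIES.get(tool)
    # Unknown tools and unlisted roles are denied by default.
    if policy is None or role not in policy.allowed_roles:
        return "deny"
    # Mutating tools stay blocked until a human explicitly confirms.
    if policy.needs_confirmation and not confirmed:
        return "needs_confirmation"
    return "allow"


if __name__ == "__main__":
    print(authorize("k8s.list_pods", "viewer"))
    print(authorize("aws.rollback_task_definition", "operator"))
    print(authorize("aws.rollback_task_definition", "operator", confirmed=True))
```

Denying by default means a newly added tool cannot run until someone writes a policy for it, which is the safer failure mode during an incident.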
Checklist (fast)
1) Intent. 2) Title/meta. 3) Slug. 4) TL;DR. 5) Drill. 6) FAQ. 7) Links. 8) Images/alt. 9) Edit. 10) CTA.
CTA
- Use the template examples/packs/devops-sre.mcp.json together with the “safe ops” policy examples and limiter configs.