We’ve been working hard over the last few weeks to bring you Katie and Katie MCP, a pair of tools that help you manage your Kubernetes clusters and integrate them with AI.
Using the easily-installed cluster agent and MCP tooling, you can integrate live data about your Kubernetes clusters into AI decision-making and actions.
Katie can be installed as a read-write agent, capable of making changes to your cluster, or as a read-only agent, able only to provide up-to-the-second data to AI language models.
Katie Agent and MCP
Installing the agent into your cluster via our Helm chart couldn’t be easier – only two parameters are required: the name of the cluster and a FusionReactor Cloud API key. Once those are set, the agent (a single pod) will quietly sit and answer requests for you via FusionReactor Cloud.
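If you haven’t used Helm before, the whole install is a couple of commands. Here’s a minimal sketch – the repository URL, chart name, and value keys are illustrative placeholders, so check the chart’s documentation for the real ones:

```bash
# Add the chart repository (URL is a placeholder)
helm repo add fusionreactor https://example.com/helm-charts
helm repo update

# Install the Katie agent; the only required values are the cluster name
# and a FusionReactor Cloud API key (these value keys are illustrative)
helm install katie fusionreactor/katie \
  --set clusterName=demo \
  --set apiKey="$FR_CLOUD_API_KEY"
```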
The MCP (a Model Context Protocol server) can be installed into any MCP client (there are many available) and, once configured, can query and action your Kubernetes clusters. Clients can also use Katie MCP as one source among many, allowing the AI to tie together data from disparate systems.
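As a concrete example, Claude Desktop registers MCP servers in its claude_desktop_config.json. The sketch below uses a placeholder command and arguments rather than Katie MCP’s actual invocation – refer to the Katie MCP documentation for the real entry:

```json
{
  "mcpServers": {
    "katie": {
      "command": "katie-mcp",
      "args": ["--api-key", "YOUR_FUSIONREACTOR_CLOUD_API_KEY"]
    }
  }
}
```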
Here are a few examples, taken from our demo cluster. The Katie Agent in this cluster has full read-write access. The AI agent is Claude Desktop, with the Sonnet 4 model.
Health and Troubleshooting
First let’s get an overview of the cluster.
> Prompt: Summarize the health of the demo cluster.
Demo Cluster Health Summary:
Nodes: 3 nodes (2 Ready, 1 Ready but SchedulingDisabled)
Pods: All 49 pods Running (100% healthy)
- System pods: 14/14 healthy
- Monitoring: 10/10 healthy
- Demo app (otel-demo): 25/25 healthy
Key Status: Cluster is healthy, one node has scheduling disabled but system operational.
All pods are up – but it looks like one node is cordoned. Let’s free it.
> Prompt: Uncordon that cordoned node.
Node ip-172-20-4-1.eu-west-1.compute.internal uncordoned successfully.
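For comparison, the manual equivalent is a single kubectl command:

```bash
kubectl uncordon ip-172-20-4-1.eu-west-1.compute.internal
```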
Let’s check whether any nodes are under pressure.
> Prompt: Are any nodes under pressure?
Node Pressure Summary:
All nodes: No pressure conditions (MemoryPressure, DiskPressure, PIDPressure all False)
Resource utilization:
- Node 1 (ip-172-20-10-123): CPU 18%, Memory 36% – healthy
- Node 2 (ip-172-20-4-1): CPU 95%, Memory 75% – high CPU usage
- Node 3 (ip-172-20-5-100): CPU 7%, Memory 80% – high memory usage
Node 2 has high CPU utilization (95%) but no pressure alerts yet.
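Doing that check by hand means combining two views – node conditions and live utilization:

```bash
# Pressure conditions reported by the kubelet on each node
kubectl describe nodes | grep -E "MemoryPressure|DiskPressure|PIDPressure"

# Live CPU/memory utilization per node (requires metrics-server)
kubectl top nodes
```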
And node 3’s memory is sitting at 80% – I wonder what’s using all that memory?
> Prompt: What’s using the memory on that high pressure node?
The quote pod on node 3 is using 6GB of the 7GB available memory (80% allocation). This single pod is consuming most of the node’s memory capacity.
Quote pod details:
- Memory request/limit: 6GB
- Only the pod is consuming significant memory on that node
- Running for 77 minutes
This appears to be a memory-intensive workload in the OpenTelemetry demo application.
The quote service is using a lot of its allocation – I think this is the Docker image that generates load for the demo, so a high memory demand is probably expected. Let’s check.
> Prompt: What’s the image for that pod?
Quote pod image: ghcr.io/intergral/opentelemetry-cf-demo/quote-cf:pr-18
This is a ColdFusion-based quote service (part of OpenTelemetry demo) with:
- 6GB memory request/limit (explains the high memory usage)
- Java-based runtime with ColdFusion/Lucee
- Environment shows it’s running with Fusion Reactor Cloud license
The high memory allocation appears intentional for this demo workload.
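The manual version is a jsonpath query – this sketch assumes the pod belongs to a deployment named quote in the otel-demo namespace:

```bash
# Image of the quote service's pod template
kubectl get deployment quote -n otel-demo \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```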
Yep, that’s the load pod. Can’t do anything about that. Let’s check the logs.
> Prompt: Check the logs for the quote service.
Key Findings:
- Service started successfully in 7.3 seconds
- No errors or memory issues in logs
- FusionReactor cloud monitoring enabled
- Service running on port 8888
Recommendation: Reduce pod memory limit to 3GB (allowing overhead for non-heap memory) – would free up 3GB on the node.
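If we did want to act on that recommendation by hand, the log check and the limit change are each a one-liner (the deployment and namespace names here are assumptions based on the output above):

```bash
# Tail the service logs
kubectl logs deployment/quote -n otel-demo --tail=100

# Apply the suggested 3GB limit, with a matching request
kubectl set resources deployment/quote -n otel-demo \
  --requests=memory=3Gi --limits=memory=3Gi
```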
More Complex Actions and Queries
We can use Katie and Katie MCP to action our cluster too, since the Agent has read-write access. Here are a couple of demos of what you can do.
> Prompt: Scale coredns from 2 to 3 pods, and check they’re running.
CoreDNS scaling complete:
✅ Scaled from 2 to 3 pods
- coredns-5b7cdbc9-cktr4 (existing)
- coredns-5b7cdbc9-lmztv (existing)
- coredns-5b7cdbc9-h89c8 (new – 5s old)
Status: All 3 CoreDNS pods Running and Ready (1/1).
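The kubectl equivalent, for reference (CoreDNS pods conventionally carry the k8s-app=kube-dns label):

```bash
kubectl scale deployment/coredns -n kube-system --replicas=3
kubectl get pods -n kube-system -l k8s-app=kube-dns
```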
> Prompt: Restart the checkout deployment.
Checkout deployment restarted successfully:
✅ New pod: checkout-5f45986d74-vvcqv (5 seconds old)
✅ Status: Running and Ready (1/1)
The old checkout pod was terminated and replaced with a fresh instance.
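Again, the manual equivalent, assuming the checkout deployment lives in the otel-demo namespace:

```bash
kubectl rollout restart deployment/checkout -n otel-demo
kubectl rollout status deployment/checkout -n otel-demo
```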
> Prompt: Do all services have at least one pod assigned? Are any deployments under-replicated?
All deployments have pods assigned: ✅
Replica status:
- Target replicas: All deployments show 1/1 (Ready/Up-to-date/Available)
- Actual pods: All 24 services have exactly 1 pod running
No under-replicated deployments: ✅
Pod count by service:
- 23 single-pod deployments (1/1 each)
- 1 StatefulSet: opensearch-0 (also 1 pod)
Total: 24 services, 24 pods – all fully replicated to their target of 1 replica each.
The cluster is properly balanced with no missing or under-replicated services.
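One way to double-check this by hand is to compare ready replicas against the spec across every namespace (this sketch needs jq installed):

```bash
kubectl get deployments -A -o json | jq -r '
  .items[]
  | select((.status.readyReplicas // 0) < .spec.replicas)
  | "\(.metadata.namespace)/\(.metadata.name): \(.status.readyReplicas // 0)/\(.spec.replicas) ready"'
```

No output means nothing is under-replicated.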
> Prompt: List the top pods in the namespace, together with their resource requests and usages as a percentage of the available node capacity. Order by percentage node memory used.
Top 5 Pods by Memory Usage:
- quote-57b64d9779-bx7cn
  - Memory Used: 4.2GB
  - Memory Limit: 5.6GB
  - Node Memory: 60%
  - CPU Used: 1943m
- opensearch-0
  - Memory Used: 952MB
  - Memory Limit: 1.1GB
  - Node Memory: 13%
  - CPU Used: 6m
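kubectl top gives the raw numbers here, but not the percent-of-node framing – the model computes that itself by joining pod usage with each node’s allocatable capacity:

```bash
# Raw per-pod usage, sorted by memory (requires metrics-server)
kubectl top pods -n otel-demo --sort-by=memory

# Node allocatable capacity, used to derive the percentages
kubectl get nodes -o custom-columns=NAME:.metadata.name,MEM:.status.allocatable.memory
```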
Summary
Katie and Katie MCP work in concert to provide AI models with up-to-the-second data for fast, accurate decisions and precise inferences. Models can then leverage this data to give you insights and useful strategies, and Katie can even carry out many of those actions directly.
Katie is a great way to use natural language commands to effectively monitor and administer Kubernetes environments.