We’ve been working hard over the last few weeks to bring you Katie and Katie MCP, a pair of tools that help you manage your Kubernetes clusters and integrate them with AI.
Using the easily-installed cluster agent and MCP tooling, you can integrate live data about your Kubernetes clusters into AI decision-making and actions.
Katie can be installed as a read-write agent, capable of making changes to your cluster, or as a read-only agent, able only to provide up-to-the-second data to AI language models.
Katie Agent and MCP
Installing the agent into your cluster via our Helm chart couldn’t be easier – only two parameters are required: the name of the cluster and a FusionReactor Cloud API key. Once those are set, the agent (a single pod) will quietly sit and answer requests for you via FusionReactor Cloud.
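If you haven’t used Helm before, the whole install is a couple of commands. Here’s a minimal sketch – the repository URL, chart name, and value keys are illustrative placeholders, so check the chart’s documentation for the real ones:

```bash
# Add the chart repository (URL is a placeholder)
helm repo add fusionreactor https://example.com/helm-charts
helm repo update

# Install the Katie agent; the only required values are the cluster name
# and a FusionReactor Cloud API key (these value keys are illustrative)
helm install katie fusionreactor/katie \
  --set clusterName=demo \
  --set apiKey="$FR_CLOUD_API_KEY"
```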
The MCP (a Model Context Protocol server) can be installed into any MCP client (there are many available) and, once configured, can query and action your Kubernetes clusters. Clients can also use Katie MCP as one source among many, allowing the AI to tie together data from disparate systems.
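As a concrete example, Claude Desktop registers MCP servers in its claude_desktop_config.json. The sketch below uses a placeholder command and arguments rather than Katie MCP’s actual invocation – refer to the Katie MCP documentation for the real entry:

```json
{
  "mcpServers": {
    "katie": {
      "command": "katie-mcp",
      "args": ["--api-key", "YOUR_FUSIONREACTOR_CLOUD_API_KEY"]
    }
  }
}
```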
Here are a few examples, taken from our demo cluster. The Katie Agent in this cluster has full read-write access. The AI agent is Claude Desktop, with the Sonnet 4 model.
Health and Troubleshooting
First let’s get an overview of the cluster.
> Prompt: Summarize the health of the demo cluster.
Demo Cluster Health Summary:
Nodes: 3 nodes (2 Ready, 1 Ready but SchedulingDisabled)
Pods: All 49 pods Running (100% healthy)
- System pods: 14/14 healthy
- Monitoring: 10/10 healthy
- Demo app (otel-demo): 25/25 healthy
Key Status: Cluster is healthy, one node has scheduling disabled but system operational.
All pods are up – but it looks like one node is cordoned. Let’s free it.
> Prompt: Uncordon that cordoned node.
Node ip-172-20-4-1.eu-west-1.compute.internal uncordoned successfully.
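For comparison, the manual equivalent is a single kubectl command:

```bash
kubectl uncordon ip-172-20-4-1.eu-west-1.compute.internal
```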
Let’s check whether any nodes are under pressure.
> Prompt: Are any nodes under pressure?
Node Pressure Summary:
All nodes: No pressure conditions (MemoryPressure, DiskPressure, PIDPressure all False)
Resource utilization:
- Node 1 (ip-172-20-10-123): CPU 18%, Memory 36% – healthy
- Node 2 (ip-172-20-4-1): CPU 95%, Memory 75% – high CPU usage
- Node 3 (ip-172-20-5-100): CPU 7%, Memory 80% – high memory usage
Node 2 has high CPU utilization (95%) but no pressure alerts yet.
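Doing that check by hand means combining two views – node conditions and live utilization:

```bash
# Pressure conditions reported by the kubelet on each node
kubectl describe nodes | grep -E "MemoryPressure|DiskPressure|PIDPressure"

# Live CPU/memory utilization per node (requires metrics-server)
kubectl top nodes
```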
And node 3’s memory is sitting at 80% – I wonder what’s using all that memory?
> Prompt: What’s using the memory on that high pressure node?
The quote pod on node 3 is using 6GB of the 7GB available memory (80% allocation). This single pod is consuming most of the node’s memory capacity.
Quote pod details:
- Memory request/limit: 6GB
- Only the pod is consuming significant memory on that node
- Running for 77 minutes
This appears to be a memory-intensive workload in the OpenTelemetry demo application.
The quote service is using a lot of its allocation – I think this is the Docker image that generates load for the demo, so a high memory demand is probably expected. Let’s check.
> Prompt: What’s the image for that pod?
Quote pod image: ghcr.io/intergral/opentelemetry-cf-demo/quote-cf:pr-18
This is a ColdFusion-based quote service (part of OpenTelemetry demo) with:
- 6GB memory request/limit (explains the high memory usage)
- Java-based runtime with ColdFusion/Lucee
- Environment shows it’s running with Fusion Reactor Cloud license
The high memory allocation appears intentional for this demo workload.
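The manual version is a jsonpath query – this sketch assumes the pod belongs to a deployment named quote in the otel-demo namespace:

```bash
# Image of the quote service's pod template
kubectl get deployment quote -n otel-demo \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```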
Yep, that’s the load pod. Can’t do anything about that. Let’s check the logs.
> Prompt: Check the logs for the quote service.
Key Findings:
- Service started successfully in 7.3 seconds
- No errors or memory issues in logs
- FusionReactor cloud monitoring enabled
- Service running on port 8888
Recommendation: Reduce pod memory limit to 3GB (allowing overhead for non-heap memory) – would free up 3GB on the node.
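If we did want to act on that recommendation by hand, the log check and the limit change are each a one-liner (the deployment and namespace names here are assumptions based on the output above):

```bash
# Tail the service logs
kubectl logs deployment/quote -n otel-demo --tail=100

# Apply the suggested 3GB limit, with a matching request
kubectl set resources deployment/quote -n otel-demo \
  --requests=memory=3Gi --limits=memory=3Gi
```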
More Complex Actions and Queries
We can use Katie and Katie MCP to action our cluster too, since the Agent has read-write access. Here are a couple of demos of what you can do.
> Prompt: Scale coredns from 2 to 3 pods, and check they’re running.
CoreDNS scaling complete:
✅ Scaled from 2 to 3 pods
- coredns-5b7cdbc9-cktr4 (existing)
- coredns-5b7cdbc9-lmztv (existing)
- coredns-5b7cdbc9-h89c8 (new – 5s old)
Status: All 3 CoreDNS pods Running and Ready (1/1).
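The kubectl equivalent, for reference (CoreDNS pods conventionally carry the k8s-app=kube-dns label):

```bash
kubectl scale deployment/coredns -n kube-system --replicas=3
kubectl get pods -n kube-system -l k8s-app=kube-dns
```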
> Prompt: Restart the checkout deployment.
Checkout deployment restarted successfully:
✅ New pod: checkout-5f45986d74-vvcqv (5 seconds old)
✅ Status: Running and Ready (1/1)
The old checkout pod was terminated and replaced with a fresh instance.
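Again, the manual equivalent, assuming the checkout deployment lives in the otel-demo namespace:

```bash
kubectl rollout restart deployment/checkout -n otel-demo
kubectl rollout status deployment/checkout -n otel-demo
```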
> Prompt: Do all services have at least one pod assigned? Are any deployments under-replicated?
All deployments have pods assigned: ✅
Replica status:
- Target replicas: All deployments show 1/1 (Ready/Up-to-date/Available)
- Actual pods: All 24 services have exactly 1 pod running
No under-replicated deployments: ✅
Pod count by service:
- 23 single-pod deployments (1/1 each)
- 1 StatefulSet: opensearch-0 (also 1 pod)
Total: 24 services, 24 pods – all fully replicated to their target of 1 replica each.
The cluster is properly balanced with no missing or under-replicated services.
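One way to double-check this by hand is to compare ready replicas against the spec across every namespace (this sketch needs jq installed):

```bash
kubectl get deployments -A -o json | jq -r '
  .items[]
  | select((.status.readyReplicas // 0) < .spec.replicas)
  | "\(.metadata.namespace)/\(.metadata.name): \(.status.readyReplicas // 0)/\(.spec.replicas) ready"'
```

No output means nothing is under-replicated.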
> Prompt: List the top pods in the namespace, together with their resource requests and usages as a percentage of the available node capacity. Order by percentage node memory used.
Top 5 Pods by Memory Usage:
- quote-57b64d9779-bx7cn
  - Memory Used: 4.2GB
  - Memory Limit: 5.6GB
  - Node Memory: 60%
  - CPU Used: 1943m
- opensearch-0
  - Memory Used: 952MB
  - Memory Limit: 1.1GB
  - Node Memory: 13%
  - CPU Used: 6m
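kubectl top gives the raw numbers here, but not the percent-of-node framing – the model computes that itself by joining pod usage with each node’s allocatable capacity:

```bash
# Raw per-pod usage, sorted by memory (requires metrics-server)
kubectl top pods -n otel-demo --sort-by=memory

# Node allocatable capacity, used to derive the percentages
kubectl get nodes -o custom-columns=NAME:.metadata.name,MEM:.status.allocatable.memory
```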
Summary
Katie and Katie MCP work in concert to provide AI models with up-to-the-second data for fast, accurate decisions and precise inferences. Models can then leverage this data to give you insights and useful strategies, and Katie can even carry out many of those actions directly.
Katie is a great way to use natural language commands to effectively monitor and administer Kubernetes environments.