Running Cassandra on Kubernetes: The K8ssandra Story

This article is based on my appearance on the Kubernetes Bytes podcast, where I discussed the journey of bringing Apache Cassandra to Kubernetes through the K8ssandra project.

When we started the K8ssandra project, the question wasn’t whether Cassandra could run on Kubernetes - it was whether it could run well enough for production workloads. Three years later, with major enterprises running multi-petabyte Cassandra clusters on K8s, I think we have our answer.

The Challenge: Stateful Meets Cloud Native

Cassandra was designed for a different era - one where servers were pets, not cattle. Its architecture assumes:

Stable network identities
Persistent local storage
Careful orchestration of cluster topology changes

Kubernetes, meanwhile, was built for stateless workloads where instances can be created and destroyed at will. Bridging this gap required rethinking fundamental assumptions about both technologies.

Enter K8ssandra

K8ssandra isn’t just Cassandra running in a container. It’s a complete cloud-native distribution that includes:

Cass Operator

The heart of K8ssandra is the Cass Operator, which encodes years of operational knowledge into Kubernetes controllers. It handles:

Cluster initialization and scaling
Rolling upgrades with zero downtime
Rack awareness and multi-datacenter deployments
Integration with Kubernetes storage and networking

Operational Tools

Running a database requires more than just the database itself:

Reaper for automated repairs
Medusa for backup and restore
Stargate for API access
Metrics collectors for monitoring

Multi-Cluster Management

The K8ssandra Operator manages Cassandra deployments across multiple Kubernetes clusters, enabling true multi-region deployments with a single control plane.

Real-World Lessons

Storage Strategy is Everything

The biggest lesson from production deployments: storage strategy makes or breaks your performance.

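apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
# The in-tree kubernetes.io/aws-ebs provisioner is deprecated;
# gp3 IOPS and throughput tuning is handled by the AWS EBS CSI driver.
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"
  throughput: "1000"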

For latency-sensitive workloads, local SSDs are still king. But for most use cases, properly configured network storage provides the flexibility needed for cloud native operations.
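The IOPS and throughput values in that StorageClass only work if the volume is large enough to carry them, and the volume size comes from the claim, not the StorageClass. Below is a minimal sketch of the arithmetic. It assumes the gp3 limits AWS has long documented: 3,000 to 16,000 IOPS, 125 to 1,000 MiB/s, at most 500 IOPS per GiB, and at most 0.25 MiB/s per provisioned IOPS. Those constants and the function name gp3_problems are my assumptions for illustration; check the current AWS documentation before relying on them.

```python
import math

# Assumed gp3 limits; confirm against current AWS EBS documentation.
GP3_IOPS_RANGE = (3_000, 16_000)
GP3_THROUGHPUT_RANGE_MIBPS = (125, 1_000)
GP3_MAX_IOPS_PER_GIB = 500
GP3_MIN_IOPS_PER_MIBPS = 4  # i.e. at most 0.25 MiB/s per provisioned IOPS


def gp3_problems(size_gib: int, iops: int, throughput_mibps: int) -> list:
    """Return human-readable problems with a gp3 request (empty list if none)."""
    problems = []
    lo, hi = GP3_IOPS_RANGE
    if not lo <= iops <= hi:
        problems.append(f"iops {iops} outside {lo}-{hi}")
    lo, hi = GP3_THROUGHPUT_RANGE_MIBPS
    if not lo <= throughput_mibps <= hi:
        problems.append(f"throughput {throughput_mibps} MiB/s outside {lo}-{hi}")
    # IOPS are capped relative to the volume size.
    min_size = math.ceil(iops / GP3_MAX_IOPS_PER_GIB)
    if size_gib < min_size:
        problems.append(f"{iops} IOPS needs a volume of at least {min_size} GiB")
    # Throughput is capped relative to the provisioned IOPS.
    min_iops = throughput_mibps * GP3_MIN_IOPS_PER_MIBPS
    if iops < min_iops:
        problems.append(f"{throughput_mibps} MiB/s needs at least {min_iops} IOPS")
    return problems


if __name__ == "__main__":
    # The StorageClass values above, on a hypothetical 500 GiB volume.
    print(gp3_problems(size_gib=500, iops=16_000, throughput_mibps=1_000))
```

Under these assumed limits, a 20 GiB claim against the same StorageClass would be flagged, because 16,000 IOPS needs at least 32 GiB.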

Node Affinity and Anti-Affinity

Cassandra’s replica placement strategy needs to be coordinated with Kubernetes scheduling:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: cassandra.datastax.com/datacenter
            operator: In
            values:
            - dc1
        topologyKey: kubernetes.io/hostname

This keeps any two Cassandra pods in dc1 off the same Kubernetes node, so losing a single node takes out at most one replica of any piece of data and Cassandra’s fault tolerance guarantees hold.
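The reason one pod per node matters comes down to quorum arithmetic. Here is a minimal sketch, assuming a replication factor of 3 and QUORUM consistency, neither of which is set by the manifest above. The function names are hypothetical and exist only for this illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_losses(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


def survives_node_loss(replication_factor: int, max_pods_per_node: int) -> bool:
    """Worst case: every Cassandra pod on the failed Kubernetes node
    held a replica of the same data."""
    return max_pods_per_node <= tolerated_replica_losses(replication_factor)


if __name__ == "__main__":
    print(survives_node_loss(replication_factor=3, max_pods_per_node=1))  # True
    print(survives_node_loss(replication_factor=3, max_pods_per_node=2))  # False
```

With anti-affinity in place, a node failure costs one replica and QUORUM still succeeds. If the scheduler could put two Cassandra pods on one node, the same failure could take out two of three replicas.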

Resource Management

Cassandra’s resource requirements are predictable but substantial:

Memory: Plan for at least an 8 GB heap in production, and leave headroom beyond the heap for Cassandra’s off-heap structures
CPU: Cassandra is surprisingly CPU-efficient, but compaction can spike usage
Network: Replication and repair generate significant inter-node traffic
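To relate the 8 GB heap figure to container sizing, here is a Python rendering of the default heap heuristic as I understand it from Cassandra’s stock cassandra-env.sh: the larger of min(half of RAM, 1 GB) and min(a quarter of RAM, 8 GB). The function name is mine, and treating the container memory limit as "system memory" is an assumption. On Kubernetes you will usually want to set the heap explicitly rather than rely on this default.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap (MB) following the cassandra-env.sh heuristic:
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


if __name__ == "__main__":
    for memory_gb in (4, 8, 16, 32, 64):
        heap = default_max_heap_mb(memory_gb * 1024)
        print(f"{memory_gb} GB memory -> {heap} MB heap")
```

By this rule the heap only reaches 8 GB once the process sees 32 GB of memory, which gives a rough sense of how much memory a production pod needs around an 8 GB heap.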

Multi-Datacenter Magic

One of K8ssandra’s most powerful features is seamless multi-datacenter deployments across Kubernetes clusters:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        k8sContext: us-east-1
        size: 3
      - metadata:
          name: dc2
        k8sContext: us-west-2
        size: 3

This single manifest (abbreviated here; a real one also sets the Cassandra version and storage configuration) creates a multi-region Cassandra cluster, with each datacenter running in the Kubernetes cluster named by its k8sContext.
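Deploying two datacenters does not by itself replicate application data to both. Each keyspace has to name the datacenters it replicates to, using NetworkTopologyStrategy. Below is a small sketch that builds the CQL statement from the datacenter names and sizes in the manifest. The helper name, the keyspace name "app", and the default replication factor of 3 are assumptions for illustration.

```python
def create_keyspace_cql(keyspace: str, dc_sizes: dict, replication_factor: int = 3) -> str:
    """Build a CREATE KEYSPACE statement that replicates to every datacenter.

    dc_sizes maps datacenter name -> node count, as in the manifest above.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, size in dc_sizes.items():
        # A datacenter cannot hold more replicas than it has nodes.
        parts.append(f"'{dc}': {min(replication_factor, size)}")
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}};"


if __name__ == "__main__":
    print(create_keyspace_cql("app", {"dc1": 3, "dc2": 3}))
```

The datacenter names in the replication map must match the Cassandra datacenter names exactly, which here are assumed to be the metadata names dc1 and dc2.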

Operational Benefits

Automated Operations

K8ssandra automates many operational tasks that traditionally required manual intervention:

Scaling: Add or remove nodes by changing a datacenter’s size in the manifest and applying it with kubectl
Upgrades: Rolling upgrades that respect Cassandra’s operational requirements
Repairs: Automated anti-entropy repairs using Reaper
Backups: Scheduled backups to cloud storage with Medusa

Observability

Integration with cloud native observability tools provides insights that were difficult to achieve with traditional deployments:

Prometheus metrics for real-time monitoring
Jaeger tracing for request flow analysis
Grafana dashboards for operational visibility

Infrastructure as Code

The entire cluster configuration is declarative, enabling GitOps workflows and infrastructure automation.

When to Choose K8ssandra

K8ssandra shines in several scenarios:

Multi-Cloud Deployments

Organizations that need to run Cassandra across multiple cloud providers benefit from K8ssandra’s unified operational model.

DevOps-First Organizations

Teams already invested in Kubernetes and cloud native tooling can leverage existing expertise and infrastructure.

Rapid Scaling Requirements

The ability to scale clusters quickly in response to demand makes K8ssandra ideal for variable workloads.

Challenges and Considerations

Learning Curve

Running Cassandra on Kubernetes requires understanding both technologies deeply. The abstraction can hide important details that affect performance.

Resource Overhead

Kubernetes introduces some overhead compared to bare metal deployments. For extremely high-performance use cases, this may be a consideration.

Networking Complexity

Service meshes and network policies can introduce latency and complexity that affects Cassandra’s performance characteristics.

The Future of Data on Kubernetes

K8ssandra represents a broader trend: the convergence of cloud native and data technologies. As Kubernetes matures, we’re seeing:

More sophisticated operators for complex stateful workloads
Better storage abstractions that hide infrastructure complexity
Improved networking that reduces the overhead of container networking

Getting Started

The barrier to entry for K8ssandra has never been lower:

helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm install k8ssandra k8ssandra/k8ssandra

These two commands deploy a complete Cassandra environment with monitoring, backup, and repair capabilities.

Conclusion

Three years ago, running Cassandra on Kubernetes was an experiment. Today, it’s a production-ready approach that offers significant operational advantages over traditional deployments.

K8ssandra proves that cloud native and databases aren’t mutually exclusive. With the right abstractions and operational tooling, you can have the scale and reliability of Cassandra with the operational simplicity of Kubernetes.

The future of data is cloud native, and K8ssandra is leading the way.
