Scaling Apache Cassandra on Kubernetes: Lessons from the Field

After years of helping organizations run Apache Cassandra in production, I’ve seen the landscape shift dramatically toward Kubernetes. While K8s was originally designed for stateless workloads, the community has made tremendous progress in supporting databases and other stateful applications.

The Challenge of Stateful Sets

When Kubernetes first introduced StatefulSets, it was a game-changer for running databases. However, Cassandra’s unique architecture presents some interesting challenges:

Network Identity: Cassandra nodes need stable network identities to form and maintain clusters
Persistent Storage: Data must survive pod restarts and rescheduling
Ordered Deployment: While Cassandra is masterless, careful orchestration during scaling operations is crucial

Key Lessons Learned

1. Storage Performance Matters

The most common performance bottleneck I see is inadequate storage. Cassandra is designed for fast, local SSDs. When running on Kubernetes:

Use local SSDs when possible (though this limits pod mobility)
For cloud deployments, provision high-IOPS volumes
Monitor disk latency religiously - it’s often the canary in the coal mine

2. Anti-Affinity is Your Friend

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cassandra
        topologyKey: kubernetes.io/hostname

This ensures Cassandra pods are distributed across different nodes, improving fault tolerance.

3. Resource Limits Need Careful Tuning

Cassandra’s JVM needs careful tuning. Setting resource limits too low can cause constant GC pressure, while setting them too high wastes resources:

Start with 8GB heap for production workloads
Use G1GC for heaps larger than 8GB
Monitor GC pause times and adjust accordingly

4. Operators Simplify Operations

The Cassandra community has developed several operators that handle the complexity of running Cassandra on K8s:

K8ssandra: Provides a complete stack including monitoring and repairs
Cass Operator: DataStax’s production-ready operator
Cassandra Operator: The original community operator

The Path Forward

As we move toward a cloud-native future, the intersection of Cassandra and Kubernetes will only become more important. The recent work on vector search capabilities in Cassandra 5.0 opens up exciting possibilities for AI workloads on K8s.

The key is to start small, measure everything, and gradually scale up. The complexity is manageable, and the benefits of unified infrastructure are compelling.

Conclusion

Running Cassandra on Kubernetes is no longer an experiment - it’s production-ready for organizations willing to invest in understanding both technologies. The ecosystem has matured significantly, and with the right approach, you can achieve the scalability and reliability Cassandra is known for, with the operational benefits of Kubernetes.

Remember: distributed systems are hard, but that’s what makes them interesting. Happy scaling!

Back to Articles