Last week was the annual pilgrimage of data nerds everywhere to Strata – Santa Clara. There is now more than one of these events, but this is the original and still the go-to for many. The Santa Clara Convention Center is home to so many conferences that they tend to blur a bit for me. Am I at Velocity or Strata? No wait. Cloud Connect? What quickly sets this one apart for me is the sheer number of friends and colleagues I meet every year.
DataStax sponsored a booth and it turned out to be a great gathering place for conversations and catching up. I mean, look. We brought couches!
Not only did we bring couches, we brought a Cassandra cluster! Ok, it’s all running on Raspberry Pis stacked up, but it’s still pretty cool. With three nodes running we can show off the failure modes of Cassandra by simply removing one… or two. Professor Andy Cobley from the University of Dundee did a great talk at our summit last year on using the Raspberry Pi with Cassandra. You aren’t going to break any speed records, but it’s a really cheap way of creating a cluster of real machines.
One thing I can say about most conferences: the value isn’t always in the great presentations lined up. I get so much out of the conversations in the hallways. Seeing where other open source projects are. Catching up with friends on their latest gigs. All good stuff. I ran into the guys from the Apache Mesos project. They are doing a lot of great work automating the deployment of Cassandra. As you can read in this tutorial, getting it done isn’t as hard as you would think. It’s one more piece of the great ecosystem forming around Apache Cassandra.
My talk this year was on Time Series with Apache Cassandra. Every year I hear sub-themes around Strata, and I would say this year’s was the Internet of Things, or IoT. Let’s face it. Everything is getting an internet connection today. CES in January showed off new appliances and gadgets for the home. Almost all of them had an internet connection. That’s a lot of devices out there all talking and dumping data into the interwebs. Where is all that going to get stored? More importantly, how do you keep up when your refrigerator is just going on and on about how it’s exactly the same temperature? Always. So my talk was an answer to those questions.
Cassandra’s ability to manage large volumes of data at ridiculous speeds is pretty well proven. What you may not know is why. There is a bit of a strategy for this type of data storage, and with Cassandra it’s all in the way the storage engine works. As I outlined in my talk, it’s all about ingesting data into memtables first, merge sorting partitions, and then flushing everything to the file system in a single sequential write. This write pattern does two things. First, it is the most efficient way of getting data onto disk. Second, it sets up that data to be read back with a single seek followed by sequential access. Check out my slides for some pretty pictures on the topic and further explanation.
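If you like to see ideas in code, here is a toy Python sketch of that write pattern. To be clear, this is not how Cassandra is actually implemented: the `Memtable` class, the `read_partition` helper, the JSON-lines file layout and the sensor names are all made up for illustration. It also skips the commit log and sorts partitions by their raw key rather than by token. What it does show is the shape of the idea: buffer writes in memory, sort them, write them out in one sequential pass, and then read a whole partition back with one seek.

```python
import io
import json


class Memtable:
    """Toy in-memory write buffer: partition key -> {clustering key: value}."""

    def __init__(self):
        self.partitions = {}

    def write(self, partition_key, clustering_key, value):
        # Writes only touch memory, so they arrive in any order and stay cheap.
        self.partitions.setdefault(partition_key, {})[clustering_key] = value

    def flush(self, out):
        """Write every partition, sorted, in one sequential pass over `out`.

        Returns a small index mapping partition key -> byte offset.
        """
        index = {}
        for pk in sorted(self.partitions):
            index[pk] = out.tell()
            # Rows inside a partition are sorted by clustering key (time here).
            rows = sorted(self.partitions[pk].items())
            out.write((json.dumps([pk, rows]) + "\n").encode("utf-8"))
        self.partitions = {}  # the memtable is empty again after a flush
        return index


def read_partition(f, index, partition_key):
    """One seek to the partition's offset, then one sequential read."""
    f.seek(index[partition_key])
    _key, rows = json.loads(f.readline())
    return rows


# Hypothetical sensor readings, deliberately written out of time order.
memtable = Memtable()
memtable.write("fridge-1", "2014-02-13T10:02", 3.0)
memtable.write("oven-7", "2014-02-13T10:01", 180.0)
memtable.write("fridge-1", "2014-02-13T10:01", 3.0)

sstable = io.BytesIO()  # stands in for a file on disk
offsets = memtable.flush(sstable)
print(read_partition(sstable, offsets, "fridge-1"))
```

The refrigerator’s readings come back in time order even though they were written out of order, because the sorting happened in memory before anything hit the “disk”.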
My next Strata will be New York in October. This time I’ll be doing a three-hour tutorial on how to do what I say you should do. (Not like the Bobs from Office Space.) I hope to see you there. And if you see me in the halls, stop me and let’s talk about what you are doing. I would love to hear what you are up to and how I can help.