Patrick McFadin


From tweets to code: Community feedback in action

24th June, 2014 · Patrick McFadin

On one of my daily excursions into the data stream of Twitter, I noticed I was included in an exchange with an Apache Cassandra user. He was venting his frustration about how cqlsh displayed tables after creation. When performing a ‘DESC TABLE <table_name>’, he found it unclear how the underlying structure was arranged. To be specific: which part of the primary key was the partition key?

 

[Screenshot: the user’s tweet about the DESC output]

 

To this I replied:

[Screenshot: my reply on Twitter]

And his reply, almost immediately, was this:

[Screenshot: the user’s response]

 

Ok then. Here is the link to the Jira: https://issues.apache.org/jira/browse/CASSANDRA-7274 Game on! I love it.

There was some discussion in the Jira about possible objections. After a couple of days to let people comment, I thought I would take a stab at making the change myself. I’ve written a lot of Java code in my day, and it didn’t seem too daunting. I created a local branch of the 2.0.x code and dug in. As I started looking at how cqlsh actually displays the table string on a DESC command, I slowly realized this is all implemented in Python! I haven’t written a lot of Python code, but I know enough to be dangerous. It’s not like I’m flying a 747 for the first time. The worst thing that could happen is that I would be publicly shamed for my lame code. Ok, so maybe that is worse.

You know the old saying: if at first you don’t succeed, erase any evidence you tried! With git reset on my side, I moved on. It didn’t take long before I found the section of code responsible for adding parentheses around partition keys when the number of keys is larger than one. It didn’t look difficult. With a couple of quick changes to the code, I thought I had it. Using my new version of cqlsh, I created a few test tables and verified the correct output. Success!
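To make the complaint concrete, here is a sketch (a made-up table, not the patch’s exact output). In CQL, the partition key is the first element of the primary key, and only a compound partition key gets its own inner parentheses. If the output always parenthesized the partition key, nobody would need to know that convention to read it:

-- As displayed then: you need to know the convention to see that
-- weatherstation_id alone is the partition key
PRIMARY KEY (weatherstation_id, event_time)

-- Always parenthesized, the partition key is obvious at a glance
PRIMARY KEY ((weatherstation_id), event_time)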

Now it was just a matter of submitting the patch and waiting for the review. Two git commands create the patch for me:

git add bin/cqlsh

git diff --cached > 7274.txt

Now I have a patch file ready to attach to the original Jira. After you attach the file, you flag the Jira as ‘patch ready’ and a reviewer will come along and check it out. In my case, Aleksey Yeschenko reviewed it and offered a further optimization. I made the change and re-uploaded the new patch. The next day Aleksey gave it a thumbs up and committed my change! (Proof: https://github.com/apache/cassandra/commit/6faf80c9d267ede53c139b5f2a39e8e56ee80b2a#diff-6469b081699ab92c53e0513a499ca5eb) That wasn’t too bad and, even better, that simple suggestion is now a feature based on real user feedback.

I don’t always have the time to do this sort of thing, but I’m really glad I made the time. It was simple enough and got finished quickly. Instead of waiting for one of the core developers to pick it up sometime, I decided to do it myself. The result is just a little better user experience. I encourage anyone to try this and see how you can make things a little better. When I look at some products that call themselves open source, I feel like it’s a half-truth when users can’t do this. Having an open Jira on Apache gives a lot of power to the community. Make sure you check it out sometime.

 

Posted in Planet Cassandra | Tags: cassandra, community |

15 Commandments of Cassandra Admin

24th February, 2014 · Patrick McFadin

Recently, I did a webinar on the 15 Commandments of Cassandra Admin. I can’t claim responsibility for all of these. I worked with Rachel Padreschi (@RachelPadreschi) on this presentation. Unfortunately, she wasn’t able to make it to the live broadcast, but I want to make sure she gets all the credit she deserves. We have seen quite a few installations in the wild and, like anything you do over and over, you notice you are repeating yourself. It began with 10 commandments, but soon grew to 15. Luckily, we stopped ourselves there. Maybe I’ll revise these someday but, as of February 2014, these look pretty good.

So what about these commandments? If you are using or running Cassandra, what we are saying is: pay attention to this list. If I saw you hitting yourself in the head with a hammer, I would really want to take it away. You then might say, “Wow dude. Thanks for taking away that hammer. I feel so much better.” That makes me happy. I know not every one of these is straight admin but, trust me, they all relate to administering a running cluster.

My plan in this blog is to introduce the commandments and then follow up with an in-depth article on each. That’s going to be a lot of blog posts I just created for myself. What am I thinking? Oh I know! I’m thinking you might be able to use these. Here is the list, and don’t hesitate to ask questions. Feedback is always appreciated.

Commandment 1 – Great data models start with the queries

Cassandra is not a relational system and, because of that, doesn’t do joins. Ingesting data into a data model should be done with a preconception of just what will be asked of that data. Unlike relational modeling, which goes Data -> Model -> Queries, we reverse things and go Queries -> Model -> Data.
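As a quick sketch of what that looks like in practice (the table and names are made up for illustration): write down the query first, then build the table that answers it in one shot.

-- The query we need: "show a user's videos, newest first"
CREATE TABLE videos_by_user (
   user_id text,
   posted_at timestamp,
   video_id uuid,
   title text,
   PRIMARY KEY (user_id, posted_at)
) WITH CLUSTERING ORDER BY (posted_at DESC);

-- One partition read answers the question
SELECT title, posted_at FROM videos_by_user WHERE user_id = 'pmcfadin';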

Commandment 2 – It’s ok to duplicate data

If you have many questions to ask of the same data, you might need to build different views. This will result in duplicated data, but no worries! Speed isn’t going to be a problem. Volume isn’t going to be a problem. Go ahead and duplicate away.
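For example (hypothetical tables), the same user record can live in two tables, each keyed for a different question:

CREATE TABLE users_by_id (
   user_id uuid PRIMARY KEY,
   name text,
   email text
);

CREATE TABLE users_by_email (
   email text PRIMARY KEY,
   user_id uuid,
   name text
);

Every write goes to both tables; every read hits exactly the one built for it.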

Commandment 3 – Disk IO is your first problem

Almost every major problem I have had to diagnose on a Cassandra cluster has come down to storage: misconfigured, underpowered or just plain wrong. Cassandra disk IO patterns are very different from those of other databases. It’s a good thing to understand.

Commandment 4 – Secondary Indexes are for convenience, not speed

In relational databases, we add indexes (mostly) for speed. Secondary indexes in Cassandra are used to find data stored in columns across many rows. This results in a distributed query that, in some cases, can be much slower than creating our own index tables. There are good uses for them, but understand the tradeoffs and alternatives. (See Commandment 5 for one.)
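Here is a sketch of the tradeoff (tables are made up): the secondary index query fans out across the cluster, while a hand-built index table is a single-partition read.

CREATE TABLE users (
   user_id uuid PRIMARY KEY,
   name text
);

-- Convenience: this works, but every node gets asked for matches
CREATE INDEX ON users (name);
SELECT user_id FROM users WHERE name = 'Patrick';

-- Speed: an index table we maintain ourselves; one partition holds all matches
CREATE TABLE users_by_name (
   name text,
   user_id uuid,
   PRIMARY KEY (name, user_id)
);
SELECT user_id FROM users_by_name WHERE name = 'Patrick';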

Commandment 5 – Embrace large partitions and de-normalization

The storage model of Cassandra lends itself to really fast access of large partitions. Understanding how and why those work can lead to some very fast and server-efficient data models. If you are looking for speed and found yourself at Commandment 4, this is the way to do it right.

Commandment 6 – Don’t be afraid to add nodes

Too many times I see people trying to vertically scale Cassandra. There is some call for that when trying to achieve data density, but the ultimate scaling story is horizontal. More nodes! This really makes capacity planning much easier, and when the time comes to add capacity, don’t be afraid!

Commandment 7 – Mind your compactions

Compaction is the background process of combining SSTables in Cassandra. It’s a completely normal operation. It serves to merge and sort row data spread across multiple files, but also to clean out old data. It is the most impactful IO event Cassandra can create and, if not managed or considered, old files can build up. To keep things smooth and efficient, mind those compactions!
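Two nodetool commands cover the basics here (the throughput number is just an example; tune it for your hardware):

nodetool compactionstats            # pending tasks and what is compacting right now
nodetool setcompactionthroughput 16 # throttle compaction IO to 16 MB/s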

Commandment 8 – Never use shared storage

Yeah, I said that. Just don’t. There isn’t yet a shared storage system that can match the latency and throughput that local disk can deliver. Not to mention, if you are using a distributed system, why create a single point of failure in one shared storage system? Spread the risk!

Commandment 9 – Understand cache. Disk, Key and Row

Cassandra cache is often misunderstood. To better understand how it all relates, you really need to know some internals. Each cache layer is separately important, and there can even be a downside if improperly used.

Commandment 10 – Always use JNA

JNA (Java Native Access) is a library for making native system calls from the running JVM. We use it extensively with Cassandra to enable things like off-heap storage. This opens up a lot of efficient memory access patterns that you are really going to miss if it’s not present. Miss, as in you’ll be crying like a baby and wishing you had them. Just do it.

Commandment 11 – Learn how to load data. Bulk Load. Insert. Copy

Chances are you will need to load a lot of data into a running Cassandra cluster at some point. There are different methods based on the volume and how much control you need. COPY is good for fixed columns up to around a million rows. INSERT is good for controlling any conversions of source data and can scale to any size. Bulk load is for when you need to get a million-plus rows into your cluster fast.
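As a small example of the first method, the cqlsh COPY command pulls rows straight from a CSV file (the file name is made up; the temperature table is from my time series post below):

COPY temperature (weatherstation_id,event_time,temperature)
FROM 'readings.csv';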

Commandment 12 – Repair isn’t just for broken data

Way back in the original Dynamo paper, it was known that bad things happen to good data. Consistency can drift over large clusters just through entropy. You can check and correct that consistency with the repair command in Cassandra. Also known as anti-entropy repair, it’s a good thing to run regularly.
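Running it is a one-liner. The -pr flag repairs only the node’s primary ranges, which is the usual way to rotate repairs around a cluster without doing the same work twice:

nodetool repair -pr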

Commandment 13 – Know the relationship between Consistency Level and Replication Factor

Two things that go hand in hand, skipping down the trail of your data model. Replication Factor is how many copies of your data exist, set per keyspace and per data center. Consistency Level is set by the client per read and write, and specifies how many replicas must acknowledge or respond to the request. Knowing how these work together is critical for good performance and uptime.
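Here is where each knob lives (the keyspace name and numbers are just examples). Replication Factor is part of the schema; Consistency Level is set by the client, per session or per statement:

-- Replication Factor: three copies of every row in data center dc1
CREATE KEYSPACE weather
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Consistency Level: in cqlsh, set it for the session
CONSISTENCY QUORUM;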

Commandment 14 – More than 8G heap doesn’t mean better performance

Cassandra is written in Java so, as a result, knowing a bit about the JVM when administering it isn’t a bad thing. There is a dark art to working with the JVM heap and its various settings. One common mistake is to just give the JVM more memory, in hopes the extra headroom will increase performance. Unfortunately, this can backfire and result in painfully long garbage collection pauses. Keep it at or below 8G and be happy.
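If you want to set the heap explicitly rather than let the startup script calculate it, conf/cassandra-env.sh is the place. (The new generation size here is a rough rule of thumb, about 100MB per core; adjust for your machines.)

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"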

Commandment 15 – Get involved in the community

Ok, so this one is not a real rule, but it will definitely improve the administration of your cluster. How? There are so many good resources for Cassandra out there, and one of the best is the other people doing it with you. Be part of the community and learn from each other. Want to know the best way to model certain kinds of data? Ask. Found some great tuning parameters for a particular data access pattern? Tell the world in a tweet, a blog or, even better, a community webinar! It’s how I got started and hopefully why you stay.

Posted in Planet Cassandra, Uncategorized | Tags: admin, cassandra |

Strata – Santa Clara 2014 Wrap Up

17th February, 2014 · Patrick McFadin

Last week was the annual pilgrimage of data nerds everywhere to Strata Santa Clara. There is now more than one of these events, but this is the original and still the go-to for many. The Santa Clara Convention Center is home to so many conferences that they tend to blur a bit for me. Am I at Velocity or Strata? No, wait. Cloud Connect? What quickly differentiates this one for me is the sheer volume of friends and colleagues I meet every year.

[Photo: the DataStax booth at Strata]

We brought couches. Come by and let’s talk!

DataStax sponsored a booth and it turned out to be a great gathering place for conversations and catching up. I mean look. We brought couches!

[Photo: the Raspberry Pi Cassandra cluster]

Not only did we bring couches, we brought a cluster of Cassandra! Ok, it’s all running on Raspberry Pis stacked up, but it’s still pretty cool. With three nodes running, we can show off the failure modes of Cassandra by easily removing one… or two. Professor Andy Cobley from the University of Dundee did a great talk at our summit last year on using the Raspberry Pi with Cassandra. You aren’t going to break any speed records, but it’s a really cheap way of creating a cluster of real machines.

One thing I can say about most conferences: the value isn’t always in the great presentations lined up. I get so much out of the conversations in the hallways. Seeing where other open source projects are. Catching up with friends on their latest gigs. All good stuff. I ran into the guys from the Apache Mesos project. They are doing a lot of great work automating the deployment of Cassandra. As you can read in this tutorial, getting it done isn’t as hard as you would think. More of the great ecosystem forming around Apache Cassandra.

 

My talk this year was on Time Series with Apache Cassandra. Every year I hear sub-themes around Strata, and I would say this year it was the Internet of Things, or IoT. Let’s face it: everything is getting an internet connection today. CES in January showed off new appliances and gadgets for the home, and almost all of them had an internet connection. That’s a lot of devices out there all talking and dumping data onto the interwebs. Where is all of that going to get stored? More importantly, how do you keep up when your refrigerator is just going on and on about how it’s exactly the same temperature? Always. So my talk was an answer to those questions.

Time Series with Apache Cassandra

The ability of Cassandra to manage large volumes of data at ridiculous speeds is pretty well proven. What you may not know is why. There is a bit of a strategy for this type of data storage, and with Cassandra it’s all in the way the storage engine works. As I outlined in my talk, it’s all about ingesting data into memtables first, merge-sorting partitions, and then flushing a single sequential write to the file system. This write pattern serves two purposes. First, it is the most efficient way of getting data onto disk. Second, it sets up that data to be read back in a single-seek, sequential-access fashion. Check out my slides for some pretty pictures on the topic and further explanation.

My next Strata will be New York in October. This time I’ll be doing a 3-hour tutorial on how to do what I say you should do. (Not like the Bobs from Office Space.) I hope to see you there. And if you see me in the halls, stop me and let’s talk about what you are doing. I would love to hear what you are up to and how I can help.

 

[Photo: a Storm Trooper at Strata]

These are not the data nerds you are looking for

 

Posted in Uncategorized | Tags: cassandra, planet cassandra |

MongoDB. This is not the database you are looking for.

11th February, 2014 · Patrick McFadin

Many of you who know me know that I rarely get negative about your data store choice. If you’ve done your diligence and it works for you, great. We live in an age of an over-abundance of choices, so something should fit your use case. Yes, I would prefer you use Apache Cassandra. Not because I’m an evangelist, but because I’ve personally done great things with this technology and I believe in it. When you say you will be going a different direction, I’ll wish you luck and go on my way. You won’t see me flipping out waving my hands…


Until today.


You know when you see something repeatedly and you can’t help but think there is more to it? I’m having one of those moments. I’ve been doing full-time consulting for Cassandra for probably a year and a half at this point. In the past year there has been a rising chorus of users stuck on a cliff with MongoDB and desperate to get out. I hear some really tough stories about how it seemed to be a great fit when they started, only to find out it wasn’t matching the scale they needed. They were told it was “Web Scale” but, to paraphrase Inigo Montoya from The Princess Bride, “This word Web Scale, it does not mean what you think it means.” There have been some pretty public stories about this growing problem. The team over at Shift talked about the need to migrate away from MongoDB and onto Cassandra in a recent interview. Many of us have done interesting tricks to make relational databases scale; they found they were having to do the same stuff with MongoDB. I love this quote from John Haddad, senior architect at Shift.


“Cassandra is much more sane to deal with than MongoDB. MongoDB just has more moving parts architecturally, and pulling our data simply ground it to a halt. With Cassandra, it’s insanely fast, and managing the data is a no-brainer for us.”


When I ran infrastructure, I wanted the no-brainer. I say this all the time: the database should be the most boring thing in your datacenter. Is it scaling? Yep. Is it online? Yep. Boring.


They aren’t the only vocal ones either. On planetcassandra.org there are a lot of interviews just like it. Internet of Things search engine Shodan limited-out MongoDB and had to move or just stop collecting data. (Not an option.) Analytics provider Retailigence was driving off the same cliff. When you build an application to make money, you really don’t want something in the way. It’s like having a silent thief in your store stealing away your profits. If you have investors, that’s going to be a hard meeting when you say cash flow is down because of scaling problems.


The scaling and uptime of Cassandra are well known. We see it proven time and time again. Cassandra was born of the Dynamo paper from Amazon.com: a project started to store shopping cart data with just those requirements. They make an average of $1000 a second and, if you know stats, that average is masking what happens during the holiday season. They put the A-list engineers on that problem, and the end result was evolutionary computer science. Not revolutionary. Evolutionary. If your money pipe is on the line, you don’t mess around and come up with something fancy and completely new. This, in part, is why we see engineers using Cassandra in production. It’s the evolved state of data storage for the next generation of web applications. And you don’t need to be a startup to see it. Many top-tier companies are using Cassandra every day to make money. This quote from Christos Kalantzis at Netflix sums it up.


“We considered Mongo. We considered Riak. We considered all these other databases. But the architecture of Cassandra and its availability, consistency tuning and scalability made it a clear choice.”


So why are teams going down this path in the first place? Sadly, it’s something I would have conceded to MongoDB not too long ago: ease of use for developers. When you have to get your application out the door, sometimes speed of development is the primary motivator. If you’re told it will scale, you check the box and get to building your application. A lot has changed over the past couple of years with Cassandra, and I don’t think that concession holds any more. CQL (Cassandra Query Language) has brought a very familiar syntax to the development story. “SELECT * FROM users” Hey, I get that! I talk to developers every day and I hear how they were almost instantly productive and writing applications. There isn’t a huge up-front payment with Cassandra anymore in learning how to get data in and out.


Ok. I can’t rant any more. I just want to help you if you are seeing that cliff. We don’t need any more “I had to migrate off MongoDB” stories on Planet Cassandra. There are plenty. What can you do to get going? We have a lot of people creating amazing resources to help with your scaling story and getting it right the first time. If you are brand new, head over to the Getting Started page at Planet Cassandra. We have the quick path lined up: docs, data modeling, a VM to try out. It’s all there. How about your use case? There are examples to look at and tons of 5-minute interviews. When you have questions, don’t hesitate to ask! StackOverflow. Mailing list. IRC. If you want to use Twitter, just add a #cassandra and we’ll find you.


When determining the database for your next project, the choice is yours. If what I’m talking about here matters to you, then consider what I’m saying. If you choose another database, we’ll still be friends. If you choose MongoDB because it scales? Chances are, we’ll talk again.


Posted in Planet Cassandra | Tags: cassandra, mongodb, performance, scaling |

Getting started with Cassandra time series data modeling

5th February, 2014 · Patrick McFadin

Cassandra is awesome at time series

Cassandra’s data model works well with data in a sequence. That data can be variable in size, and Cassandra handles large amounts of data excellently. When writing data to Cassandra, data is sorted and written sequentially to disk. When retrieving data by row key and then by range, you get a fast and efficient access pattern, due to minimal disk seeks. Time series data is an excellent fit for this type of pattern. For these examples, we’ll use a weather station that is creating temperature data every minute. You will see how using the row key and sequence can be a powerful data modeling tool.

Single device per row – Time Series Pattern 1

The simplest model for storing time series data is creating a wide row of data for each source. In this first example, we will use the weather station ID as the row key. The timestamp of the reading will be the column name and the temperature the column value (figure 1). Since each column is dynamic, our row will grow as needed to accommodate the data. We will also get the built-in sorting of Cassandra to keep everything in order.

 

Time Series Pattern 1

Figure 1

CREATE TABLE temperature (
   weatherstation_id text,
   event_time timestamp,
   temperature text,
   PRIMARY KEY (weatherstation_id,event_time)
);

Now we can insert a few data points for our weather station.

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

 

A simple query looks for all the data on a single weather station:

 

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

 

A range query looks for data between two dates. This is also known as a slice, since it will read a sequence of data from disk:

 

SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

 

Partitioning to limit row size – Time Series Pattern 2

 

In some cases, the amount of data gathered for a single device isn’t practical to fit onto a single row. Cassandra can store up to 2 billion columns per row, but if we were storing data every millisecond, you wouldn’t even get a month’s worth of data. The solution is to use a pattern called row partitioning: adding data to the row key to limit the number of columns you get per device. Using data already available in the event, we can take the date portion of the timestamp and add that to the weather station id. This gives us a row per day, per weather station, and an easy way to find the data. (figure 2)

Time Series Pattern 2

 Figure 2

CREATE TABLE temperature_by_day (
   weatherstation_id text,
   date text,
   event_time timestamp,
   temperature text,
   PRIMARY KEY ((weatherstation_id,date),event_time)
);

Note the (weatherstation_id,date) portion. When we do that in the PRIMARY KEY definition, the key will be compounded from the two elements. Now when we insert data, the key will group all weather data for a single day on a single row.

 

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:02:00','73F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:02:00','74F');

To get all the weather data for a single day, we can query using both elements of the key.

 

SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03';

 

 

Reverse order time series with expiring columns – Time Series Pattern 3

 

Another common pattern with time series data is rolling storage. Imagine we are using this data for a dashboard application and we only want to show the last 10 temperature readings. Older data is no longer useful, so it can be purged eventually. With many other databases, you would have to set up a background job to clean out older data. With Cassandra, we can take advantage of a feature called expiring columns to have our data quietly disappear after a set number of seconds. (figure 3)

Time Series Pattern 3

Figure 3

CREATE TABLE latest_temperatures (
   weatherstation_id text,
   event_time timestamp,
   temperature text,
   PRIMARY KEY (weatherstation_id,event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

 

 

Now we insert some data. Note the TTL of 20, which means each row will expire 20 seconds after it is written.

 

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','72F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','73F') USING TTL 20;

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F') USING TTL 20;

 

As soon as you insert the data, start selecting all rows over and over. Eventually you will see all the data disappear. That is the TTL expiring. Imagine what kind of interesting things you could do with your application data model using expiring columns.
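And because the table clusters in descending time order, the dashboard query we started this pattern with is trivial; the LIMIT matches the 10 readings we said we wanted:

SELECT temperature
FROM latest_temperatures
WHERE weatherstation_id='1234ABCD'
LIMIT 10;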

 

Conclusion

 

Time series is one of the most compelling data models for Cassandra. It’s a natural fit for the big table data model and scales well under many variations. Many production use cases are similar to the examples above. For those users, the problem of storing data at machine-generated speeds while keeping it organized in a useful manner is no longer a challenge. Hopefully this getting started guide gets your creative juices flowing.

Posted in Planet Cassandra | Tags: cassandra, data modeling, time series |

Open letter to Oracle DBAs

29th May, 2013 · Patrick McFadin

This is for my brothers and sisters out there managing databases. If you are a developer or manager or anything else, I’m going to have to ask you to step out while we talk. Go ahead, we’ll wait…

Ok. Are they gone? Good. Now we can get to the meat of the matter. The thing is, we in the Cassandra community need your help. For years Cassandra has been managed by developers or sysadmins, and they have done a great job so far. This community is maturing fast and, well, we need some database pros to help take it to the next level.

I was an Oracle DBA for years. I remember when Oracle 8i came out and, man, what a game changer. The few Informix and Sybase DBs we had left were quickly replaced. We had applications being built for the web and needed a DB to keep up. Along the way, we were developing some tighter controls on how dev teams accessed the servers. Java was quickly becoming the app programming language of choice in our shop, and developers were living large with JDBC (ok, not all roses, but they had direct access; epic for them). At first we just gave people SYS access if they had a vague reason for needing that much access. Let’s just say that backfired. “You changed what setting? Are you out of your mind?!” Raise your hand if you ever said that! Or heard that sheepish “Uh, no, we didn’t have a backup.” Ah yes. Good times.

We put all that to an end pretty quickly. The secret to us getting a good night’s sleep was helping our dev teams get what they needed in a controlled manner, without complete anarchy. Having a dev environment gave them a playground, but the closer you got to prod, the tighter things got. We also learned to ask more questions about the application and its data model. “No, I don’t think doing a right outer join with a full table scan on a 1 million row table is a good idea. Can I help you fix that?” Our job was to be THE database experts and protect our poor production servers from the tornado of crappy code sure to come. Later I transitioned to being a developer and, when I did, I relied on those same pros to make sure my code was bulletproof. I didn’t have time to keep up with all the cool new features and methods, but that was their job. I could count on their help when needed.

So here we are years later, and it’s like what was old is new again. We’ve got a lot of choices on how to store data now. Given the pain I endured to get the hammer of Oracle to pound in every nail, I’m totally cool with that. I spent too much time trying to get things to scale or stay constantly available. Funky de-normalization schemes or, worse, sharding things to the point my apps were almost a DB themselves. Not to mention, every time I had to do any of that, I was contributing large sums of money to Larry Ellison’s private island or crazy plane collection or whatever. It was pricey, to say the least. I tried a lot of new databases along the way and eventually found Cassandra solved most of my needs in the scaling and uptime department. As a friend told me, “It just #$%! works.” Well put. Having come from the DBA world, I applied what came naturally to me: put some process around deployment and operations, enjoy those extra hours of sleep. Take a good look around, though; I think that was more the exception than the rule.

So here’s my urgent plea: we need your skills! Cassandra is not a threat to your job. It’s not a fancy new shiny thing that will disappear soon. (I had a CIO tell me the same thing about the web. Ahem.) It solves a different set of problems that fit a lot of applications. Developers moved on it quickly to solve application issues, but it’s time to add your skills to the party. You know that just tossing any old code at your DB is a bad plan, and that holds true with Cassandra. The deep knowledge of how the database works, stores data and stays online will set you apart. We need database professionals who can manage the complexity of daily operations.

Here’s how you can help: learn as much as you can about Cassandra. Take it for a spin and see what it can do. You can download a package or tarball from the DataStax web site, or you can use the DataStax AMI to get a cluster up and running in a few minutes. There is a lot of documentation, tutorials and presentations on Planet Cassandra and the DataStax web site. Oh, and if you really want to get serious, the Cassandra Summit is the place to be. There will be a lot of people just like you talking about how they did it. (Oh, and if you send me an email and mention this rant, I’ll get you a super special price.) The goal should be understanding how Cassandra works and how you can use it effectively to solve your persistence problems. I’ll bet along the way you will find a few. What are you waiting for? If you need some help, don’t hesitate to ask. I’m here to help!

Posted in Planet Cassandra | Tags: cassandra, dataops, dba, operations, oracle, planet cassandra |
