I am continuing my journey of comparing NoSQL databases with MySQL, and this time I explored Cassandra. My previous post compared MongoDB with MySQL.
To keep the comparison consistent, I created the same database in Cassandra. I used the following statement to create the column family:
CREATE TABLE SALES_TRANSACTIONS (
  tx_id text PRIMARY KEY,
  tx_date timestamp,
  prod_id text,
  qty int,
  store_id text
);
Since tx_id is the primary key, the table is indexed on tx_id.
I used a loop similar to the one from my previous post to insert 100,000 records into this table. The insertion took 214 seconds.
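The loop itself is not shown here. As a rough sketch of its shape, the Java code below generates synthetic rows matching the SALES_TRANSACTIONS columns and times how long it takes to hand them to a sink. The SalesTransaction class, the made-up field values, and the use of a plain Consumer in place of the database call are all assumptions for illustration; in a real program the sink would bind each row to a prepared INSERT and execute it through the Cassandra Java driver.

```java
import java.util.Date;
import java.util.function.Consumer;

public class InsertLoopSketch {

    // Hypothetical in-memory row mirroring the SALES_TRANSACTIONS columns.
    public static final class SalesTransaction {
        public final String txId;
        public final Date txDate;
        public final String prodId;
        public final int qty;
        public final String storeId;

        public SalesTransaction(String txId, Date txDate, String prodId, int qty, String storeId) {
            this.txId = txId;
            this.txDate = txDate;
            this.prodId = prodId;
            this.qty = qty;
            this.storeId = storeId;
        }
    }

    // Builds the i-th synthetic row; the value patterns are made up for illustration.
    public static SalesTransaction makeRow(int i) {
        return new SalesTransaction("TX" + i, new Date(), "P" + (i % 100), 1 + i % 10, "S" + (i % 20));
    }

    // Passes `count` rows to the sink one at a time and returns the elapsed time in seconds.
    // Assumption: in a real program the sink would execute a bound prepared INSERT.
    public static double populate(int count, Consumer<SalesTransaction> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept(makeRow(i));
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        int[] inserted = {0};
        // Stand-in sink that only counts rows instead of writing them to Cassandra.
        double seconds = populate(100_000, row -> inserted[0]++);
        System.out.printf("handled %d rows in %.3f s%n", inserted[0], seconds);
        // The figure from this post: 100,000 rows in 214 s.
        System.out.printf("observed Cassandra rate: %.0f rows/s%n", 100_000 / 214.0); // about 467 rows/s
    }
}
```

For comparison, the measured 214 seconds works out to roughly 467 inserts per second.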
To verify that the records had been inserted properly, I ran the following query in the CQL shell:
Select count(*) from sales_transactions;
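The result was “10000”, along with the message “Default LIMIT of 10,000 was used. Specify your own LIMIT clause to get more results”. In other words, the count was capped at the default limit rather than reflecting all 100,000 rows.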
I then tried to verify the data by querying on tx_id and by specifying a different limit.
See the screenshot of this testing.
The CQL shell went into a very bad state. As the screenshot shows, the SELECT query that had worked before started to fail afterwards, and the server console showed a stack trace.
This was very disappointing, and at this point I stopped exploring Cassandra further.
Limitations of Cassandra
Even though the CQL syntax is very similar to SQL, the WHERE clause can only filter on indexed columns (primary key columns or columns with a secondary index). With SQL databases as well as MongoDB, you can query on any field in your table or document. If the field is not indexed, the query may be slow, but it works.
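When a column has no index, one fallback is to fetch the rows and filter them in the application. The Java sketch below shows that fallback for a query such as SELECT * FROM sales_transactions WHERE store_id = 'S1'. The SalesTransaction class and the sample values are hypothetical, and the rows are assumed to have been fetched already. On the Cassandra side, the alternative is to create a secondary index on the column, for example CREATE INDEX ON sales_transactions (store_id);

```java
import java.util.List;
import java.util.stream.Collectors;

public class ClientSideFilter {

    // Hypothetical row type holding only the columns used in this example.
    public static final class SalesTransaction {
        public final String txId;
        public final String storeId;
        public final int qty;

        public SalesTransaction(String txId, String storeId, int qty) {
            this.txId = txId;
            this.storeId = storeId;
            this.qty = qty;
        }
    }

    // Application-side equivalent of: SELECT * FROM sales_transactions WHERE store_id = ?
    // Every fetched row is examined, so the cost grows with the size of the table.
    public static List<SalesTransaction> byStore(List<SalesTransaction> rows, String storeId) {
        return rows.stream()
                .filter(row -> row.storeId.equals(storeId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SalesTransaction> rows = List.of(
                new SalesTransaction("TX1", "S1", 3),
                new SalesTransaction("TX2", "S2", 5),
                new SalesTransaction("TX3", "S1", 4));
        for (SalesTransaction row : byStore(rows, "S1")) {
            System.out.println(row.txId); // TX1, then TX3
        }
    }
}
```

This works for small result sets, but pulling a whole table into the client just to filter it does not scale, which is why the missing ad hoc WHERE support matters.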
Both SQL databases and MongoDB provide aggregation capabilities (GROUP BY along with an aggregate function). Apart from COUNT(*), Cassandra does not provide any aggregation capabilities: there is no GROUP BY and there are no functions such as SUM or AVG.
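Without GROUP BY, an aggregate has to be computed in the application after the rows are fetched. As a minimal sketch, the Java code below computes the equivalent of SELECT store_id, SUM(qty) FROM sales_transactions GROUP BY store_id over rows assumed to be already in memory. The SalesTransaction class and the sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClientSideAggregation {

    // Hypothetical row type holding only the columns needed for the aggregate.
    public static final class SalesTransaction {
        public final String txId;
        public final String storeId;
        public final int qty;

        public SalesTransaction(String txId, String storeId, int qty) {
            this.txId = txId;
            this.storeId = storeId;
            this.qty = qty;
        }
    }

    // Application-side equivalent of:
    // SELECT store_id, SUM(qty) FROM sales_transactions GROUP BY store_id
    public static Map<String, Integer> sumQtyByStore(List<SalesTransaction> rows) {
        // TreeMap keeps the store ids in sorted order so the output is stable.
        Map<String, Integer> totals = new TreeMap<>();
        for (SalesTransaction row : rows) {
            // Add this row's quantity to the running total for its store.
            totals.merge(row.storeId, row.qty, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<SalesTransaction> rows = List.of(
                new SalesTransaction("TX1", "S1", 3),
                new SalesTransaction("TX2", "S2", 5),
                new SalesTransaction("TX3", "S1", 4));
        System.out.println(sumQtyByStore(rows)); // {S1=7, S2=5}
    }
}
```

The catch is the same as with filtering: every row that contributes to the aggregate has to travel to the client first, whereas a SQL database or MongoDB does this work on the server.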
The Cassandra Java driver is not provided as a standalone jar. It is shipped as source code with many dependencies. If you use Maven, those dependencies are pulled into your project automatically. Otherwise, you have to read the Maven configuration files and collect the right versions of all the dependencies yourself before you can connect to Cassandra.
A CQL shell that goes into a corrupted state and a server that shows stack traces are not good signs. You would have to think twice (or more) before considering Cassandra even for a development project; putting it into production is even further off.
I was using Cassandra version 2.09.4 for this evaluation.
Overall, my impression of Cassandra has not been great. The feel of the server interface, the hoops required to get the Java driver working, the limitations of the query interface, the lack of aggregation capabilities, and the quality and quantity of the documentation do not give me a warm, exciting feeling about getting started with this NoSQL database. I think Cassandra has quite a bit of work cut out for it.