Friday, December 14, 2012

Comparison of Popular NoSql databases (MongoDb,CouchDb,Hbase,Neo4j,Cassandra)

There are many SQL databases so far.But i personally feel the 15 years history of SQL coming to an end as everyone is moving to an era of BigData. As experts say SQL databases are not a best fit for Big Data No Sql databases came into picture as a best fit for this which provides more flexibility in storing data.
I just want to compare few popular NoSql databases that are available at this point of time.Few well known NoSql databases are
NoSql databases differ each other more than the way Sql databases differ from each other.I think its one's responsibility to choose the appropriate NoSql database for their application based on their use case.Lets do a quick comparison of these databases.


  • Written in  :  c++
  • Main point : Retains some friendly  properties of SQL (Query, Index)
  • Licence : AGPL(Drivers : Apache)
  • Protocol : BSON (Binary JSON)
  • Replication : Master/Slave Replication  and automatic failover via Replica Sets
  • Sharding : Built-in
  • Queries are javascript expressions.
  • Runs arbitary javascript function server side.
  • Better Update-in-place than CouchDb.
  • Uses memory mapped files for data storage.
  • Performance over features.
  • Journaling (with --journal ) option turned on starting th mongod server.
  • Has Geospatial Indexing.
  • On 32-bit systems limited to 2.5GB.
  • Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
  • For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.


  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Custom, binary (Thrift)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys
  • BigTable-like features: columns, column families
  • Has secondary indices
  • Writes are much faster than reads (!)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")
  • For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.   


  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Cascading, hive, and pig source and sink modules
  • Jruby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes
  • Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.
  • For example: Analysing log data.


  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via _changes (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)
  • jQuery library included
  • Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
  • For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability is AGPL/commercial licensed
  • Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.
  • For example: Social relations, public transport links, road maps, network topologies

Reffered Sources : kkovacs , wikipedia