Tuesday, February 21, 2012

Graph Databases - The Future of Facebook Recommendations

On what basis do you get recommendations from Facebook?? How is your data stored internally by social networking sites??

Have you ever wondered how Facebook stores your information in a database? Do you think Facebook relies only on SQL tables to store your data? If so, that is not the whole picture. Facebook built a NoSQL store called 'Cassandra' (strictly speaking a wide-column store rather than a graph database), and the friend network itself is most naturally modeled as a graph. I know this raises a lot of questions: 'What is a graph database? What does it look like? How can it be useful for Facebook recommendations? Where else can it be used?' Let me explain each one in detail.

What is a graph database??

I think Wikipedia answers this question well, so for an introduction I will simply point you to the Wikipedia article on graph databases.

What does it look like??

I think you got a basic idea of graph databases from the Wikipedia page. Here is a sample of a small social network of friends who KNOW each other.

You can imagine the entire Facebook database as an enormous, ever-growing graph, with more users joining day by day. Each node represents a Facebook user or page, and each edge represents either a FRIEND relationship between two users or a LIKE relationship between a user and a page.

How can it be useful for Facebook recommendations??

Consider a small sample graph with 3 users: A, B and C.

  1. A - friend of B 
  2. B - friend of A,C 
  3. C - friend of B
Now, if you notice, Facebook recommends C to A and A to C as friends, because user B is a common friend of both. In a graph database this is as simple as finding a node that is connected to both users, that is, a friend of a friend who is not already a friend.
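To make the friend-of-a-friend idea concrete, here is a minimal Python sketch. It is only an illustration: the `friends` dictionary and the `recommend_friends` function are names I made up for this example, and this is not how Facebook actually implements it. The graph is stored as an adjacency map, and a recommendation is any user two hops away who is not already a friend.

```python
from collections import Counter

# Hypothetical friendship graph from the example above: user -> set of friends.
friends = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}

def recommend_friends(graph, user):
    """Return (candidate, mutual_friend_count) pairs, most mutual friends first."""
    mutual = Counter()
    for friend in graph[user]:
        # Walk one more hop: the friends of each friend.
        for candidate in graph[friend]:
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    return mutual.most_common()

print(recommend_friends(friends, "A"))  # [('C', 1)]
print(recommend_friends(friends, "C"))  # [('A', 1)]
```

Each lookup only touches the neighbours of the user and their neighbours, which is why this kind of traversal stays cheap even when the whole graph is huge.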

If you use SQL, you need to join the 'friends' records of A, B and C (typically a self-join on the friendship table) to work out the transitive relationship between them, and this becomes time-consuming as the table grows.
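For comparison, here is roughly what the relational version looks like, sketched with Python's built-in `sqlite3` module. The `friendships` table, its columns, and the `friends_of_friends` function are assumptions made for this example, not any real schema. Every friendship is stored as two rows (one per direction), and the friend-of-a-friend query is a self-join on that table.

```python
import sqlite3

# Hypothetical schema: one row per direction of each friendship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendships (user_id TEXT, friend_id TEXT)")
conn.executemany(
    "INSERT INTO friendships VALUES (?, ?)",
    [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")],
)

def friends_of_friends(conn, user):
    """Return (candidate, mutual_friend_count) rows using a self-join."""
    rows = conn.execute(
        """
        SELECT f2.friend_id, COUNT(*) AS mutual
        FROM friendships AS f1
        JOIN friendships AS f2 ON f1.friend_id = f2.user_id
        WHERE f1.user_id = ?
          AND f2.friend_id <> ?
          AND f2.friend_id NOT IN (
              SELECT friend_id FROM friendships WHERE user_id = ?
          )
        GROUP BY f2.friend_id
        ORDER BY mutual DESC, f2.friend_id
        """,
        (user, user, user),
    )
    return rows.fetchall()

print(friends_of_friends(conn, "A"))  # [('C', 1)]
```

The query works, but every extra hop (friends of friends of friends) means another join against the same table, which is where the cost starts to add up.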

The above example is a very basic one. More recommendations can be derived from mutual LIKES between two users, games, pages, and so on. These can be implemented easily with a graph database, which is typically more efficient than SQL joins for such relationship-heavy queries.
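The same traversal idea extends to LIKES. The sketch below is again only an illustration under my own assumptions: the page names, the two edge dictionaries, and the `recommend_pages` function are all invented for this example. It recommends pages that a user's friends like and the user has not liked yet, ranked by how many friends like them.

```python
from collections import Counter

# Hypothetical FRIEND edges (user -> friends) and LIKE edges (user -> pages).
friend_edges = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
like_edges = {"A": {"Chess"}, "B": {"Chess", "Cricket"}, "C": {"Cricket"}}

def recommend_pages(friend_edges, like_edges, user):
    """Return (page, friend_count) pairs for pages liked by friends but not by the user."""
    already_liked = like_edges.get(user, set())
    scores = Counter()
    # Follow a FRIEND edge, then a LIKE edge.
    for friend in friend_edges.get(user, set()):
        for page in like_edges.get(friend, set()):
            if page not in already_liked:
                scores[page] += 1
    return scores.most_common()

print(recommend_pages(friend_edges, like_edges, "A"))  # [('Cricket', 1)]
print(recommend_pages(friend_edges, like_edges, "C"))  # [('Chess', 1)]
```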

Where else can it be used??

I feel graph databases are a very good fit for social networking, spatial search, recommendation engines (e.g. Amazon, Facebook), and so on.

Why Nooooo SQL .........???

Relational databases have been around for many decades and are the database technology of choice for most traditional data-intensive storage and retrieval applications. Retrievals are usually accomplished using SQL, a declarative query language. Relational database systems are generally efficient unless the data contains many relationships requiring joins of large tables. Recently there has been much interest in data stores that do not use SQL exclusively, the so-called NoSQL movement. Examples are Google’s BigTable and Facebook’s Cassandra. Let's have a look at NoSQL vs MySQL (a common relational database system).

When to go for NoSQL??

In recent years, software developers have been investigating storage alternatives to relational databases. NoSQL is a blanket term for some of those new systems. Cassandra, BigTable, CouchDB, Project Voldemort, and Dynamo are all NoSQL projects, as they are all high-volume data stores that actively reject the relational and object-relational models.

Atomicity, consistency, isolation, and durability (ACID) are the properties that relational databases guarantee for transactions. Together, they underpin database reliability. Many NoSQL systems relax some of these guarantees in exchange for scalability and availability.

The term “NoSQL,” as a term for modern web data stores, first began to gain popularity in early 2009. It is a topic that has gained recognition from the IT community but has yet to garner large-scale academic study. Still, the NoSQL movement has its own discussion groups, blogs, and conferences.

As a database administrator weighs whether to move from the relational model to a NoSQL model, the NoSQL community offers some warning signs that the data might be better suited to a NoSQL system:
  1. Having tables with lots of columns, each of which is only used by a few rows.
  2. Having attribute tables.
  3. Having lots of many-to-many relationships.
  4. Having tree-like characteristics.
  5. Requiring frequent schema changes.