Category Archives: Hadoop


While looking for a suitable system for realtime analytics for the project group, I came across Storm.

Storm is a system for distributing realtime analytics across a cluster of servers. It uses Zookeeper to coordinate the master and worker nodes, and message queues to build the data streams that connect the worker nodes. Such a network of connected worker nodes is called a topology. A topology can be deployed locally on one server or submitted to the cluster.
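To make the idea of a topology a bit more concrete, here is a minimal sketch in plain Python. It does not use Storm's API and runs in a single process; it only imitates the shape of a topology, with a source node (Storm calls these spouts) feeding processing nodes (bolts) through queues. All function names and the word-count task are assumptions chosen for illustration.

```python
from collections import Counter
from queue import Queue

# Marks the end of the stream in this toy example.
# (Real Storm streams are unbounded; this is an assumption for the sketch.)
SENTINEL = object()


def sentence_spout(sentences, out_queue):
    """Source node: emits one item per sentence, then the end marker."""
    for sentence in sentences:
        out_queue.put(sentence)
    out_queue.put(SENTINEL)


def split_bolt(in_queue, out_queue):
    """Processing node: splits each sentence into lowercase words."""
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            out_queue.put(SENTINEL)
            return
        for word in item.split():
            out_queue.put(word.lower())


def count_bolt(in_queue):
    """Processing node: counts how often each word was seen."""
    counts = Counter()
    while True:
        item = in_queue.get()
        if item is SENTINEL:
            return counts
        counts[item] += 1


def run_local_topology(sentences):
    """Wires spout -> split bolt -> count bolt with two queues."""
    sentences_to_words = Queue()
    words_to_counts = Queue()
    sentence_spout(sentences, sentences_to_words)
    split_bolt(sentences_to_words, words_to_counts)
    return count_bolt(words_to_counts)


if __name__ == "__main__":
    result = run_local_topology(["storm runs topologies", "Storm uses Zookeeper"])
    print(result.most_common())
```

In this sketch the nodes run one after another on one machine, which roughly corresponds to the local deployment mentioned above. On a cluster, the same wiring would be spread over several worker nodes and the queues would carry the data between them.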

I have read a bit about the system, and it looks easy to run on our cluster and to connect to, e.g., HBase. All we need is to set up the master process and the worker nodes; Zookeeper is already running for HBase.

Here is some further information:

A presentation of Storm:

The Storm wiki:



Posted on 26.03.2012 in Hadoop, Zookeeper


InfoQ: HBase @ Facebook





Kannan Muthukkaruppan gives an overview of HBase, explaining what Facebook Messages is, why Facebook chose HBase to implement it, what they have contributed to HBase, and what they plan to use it for in the future.


Posted on 07.11.2011 in Hadoop



Hadoop and Hive Development at Facebook


Posted on 07.11.2011 in Hadoop



Hadoop and Pig at Twitter

Related to the previous post on how Hadoop is used at Facebook, there is an interesting slide deck on how Twitter is using Hadoop and Pig. Take a look.

The video of the original talk is available at Yahoo!. It is very impressive to see how much less code you have to write in Pig compared to native Java MapReduce jobs (around 20:00).


Posted on 31.10.2011 in Hadoop, Pig, Zookeeper



How Facebook is using Hadoop…


Posted on 31.10.2011 in Hadoop, Technologies

