Compression and Layering in Hadoop

One of the relatively late lessons I have learned in operating a Hadoop cluster has been the (almost overwhelming) importance of compression in storage, computation and network transmission. One of the architectural questions is whether compression belongs to the file system (and similarly the networking subsystem) or whether it is something […]

Goodbye Rajeev

It’s incredible, but Rajeev Motwani is no more. It’s hard to imagine that I won’t run into him anymore walking down University Avenue, or that he’s no longer an email away to discuss (and tear down) the latest startup idea at some nearby cafe. My heart goes out to his […]

Update on Hive+Hadoop+S3+EC2

A formal recipe for running SQL queries on EC2 against files in S3 is now posted at: But not before hitting a few more bugs (HADOOP-5861). Running a TPC-H query using Hive was a pretty high point. (I did have to omit the ORDER BY clauses, though :-( ) […]

Curt Monash reports on Hadoop/Hive @ Facebook

Curt Monash wrote a blog post about the conversation he had with Ashish Thusoo and me regarding Hadoop and Hive and their deployment and usage at Facebook. It is heartening to see the mainstream database and analytics community starting to cover Hadoop and Hive. Even though these projects are rapidly becoming […]