Joydeep's Corner

Log Replay in MySQL and NetApp Filers

Posted on September 3, 2009 (updated February 18, 2021) by jss

Mark Callaghan started a discussion on MySQL replication lag today on the MySQL@Facebook page. This happens to be one of my favorite topics, thanks to some related work I did at NetApp. I was happy to learn that a lot is already happening in this area for […]

Flash Memory

Posted on July 6, 2009 (updated February 18, 2021) by jss

I have been finding and reading some great references on flash memory lately and thought I would collect some of the better ones here (along with a few takeaways). For starters, ACM Queue magazine had a great issue entitled Enterprise Flash Storage last year. Jim Gray’s and Goetz Graefe’s […]

Compression and Layering in Hadoop

Posted on June 27, 2009 (updated May 9, 2012) by jss

One of the lessons I learned relatively late in operating a Hadoop cluster is the (almost overwhelming) importance of compression in storage, computation, and network transmission. One architectural question is whether compression belongs in the file system (and similarly in the networking subsystem) or whether it is something […]

Goodbye Rajeev

Posted on June 6, 2009 by jss

It’s incredible, but Rajeev Motwani is no more. It’s hard to imagine that I won’t run into him walking on University Avenue anymore, or that he’s no longer an email away to discuss (and tear down) the latest startup idea at some nearby cafe. My heart goes out to his […]

Update on Hive+Hadoop+S3+EC2

Posted on May 20, 2009 (updated May 9, 2012) by jss

A formal recipe for running SQL queries using EC2 against S3 files is now posted at: http://wiki.apache.org/hadoop/Hive/HiveAws/HivingS3nRemotely But not before I hit a few more bugs (HADOOP-5861). Running a TPC-H query using Hive was a pretty high point (I did have to omit the ORDER BY clauses, though :-( ). […]

Hive + Hadoop + S3 + EC2 = It works!

Posted on May 14, 2009 by jss

I have been enjoying a vacation in India for the last few weeks, and one of the fun projects I took up was putting together a good story for running Hive on Amazon’s infrastructure (AWS). The use case I had in mind was something like this: a user […]

Curt Monash reports on Hadoop/Hive @ Facebook

Posted on May 12, 2009 (updated May 9, 2012) by jss

Curt Monash published a blog post on his conversation with us (Ashish Thusoo and me) about Hadoop and Hive and their deployment and usage at Facebook. It is heartening to see the mainstream database and analytics community starting to cover Hadoop and Hive. Even though these projects are rapidly becoming […]