A formal recipe for running SQL queries from EC2 against files in S3 is now posted at: http://wiki.apache.org/hadoop/Hive/HiveAws/HivingS3nRemotely
But not before hitting a few more bugs (HADOOP-5861). Running a TPC-H query using Hive was a pretty high point. (I did have to omit the ORDER BY clauses though :-( )
I am amazed at how far Hive has come (and yet how glaring some of the missing features are). I am also impressed by the promise of the cloud (this being my first project using S3/EC2) and by how different the experience was compared to programming and administering a large in-house cluster. Amazon’s infrastructure seems to scream for developers to jump in and add value. Hopefully I will get a chance to explore some ideas on this in future posts.
Which version of Hive are you using? ORDER BY is already in, so you should be able to use it, at least in trunk.