Flash Memory

I have been finding and reading some great references on flash memory lately and thought I would collate some of the better ones here (and leave some takeaways as well). For starters, ACM Queue magazine had a great issue entitled Enterprise Flash Storage last year. Jim Gray's and Goetz Graefe's articles are good reads. The best read, though, is the linked paper by Birrell et al. advocating a better architecture for flash controllers. At a product level, Tom's Hardware's review of the latest Samsung SSD and the earlier review of the Intel X25-M by AnandTech are good reads as well.

I often find myself asking what the most obvious and significant things happening now will turn out to be, if we were looking back from five years in the future. After reading the review of the Intel X25, there is no doubt that the emergence of flash technology will be one of those big things.

As a computing professional trained for years to think about hard drives, I found the unique architectural constraints of flash chips (as presented in the Birrell paper, for example) refreshing and thought provoking. For starters, while the naive assumption of most people is that flash gives very high random read and write performance, it turns out that from a write perspective flash drives are really like disks, only worse. Not only is one much better off writing sequentially for performance reasons, writing randomly also shortens the drive's life (because blocks containing randomly overwritten pages will need to be erased at some point, and flash chips only support a limited number of erase cycles). The other interesting aspect that Gray's paper reports is that sequential read bandwidth does depend on contiguity, with maximum bandwidth being obtained at read sizes of around 128KB. The new generation of flash drives (including the X25) also seems to be close to the Birrell design in implementation: more like log-structured file systems than traditional block devices.
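To make the write constraint concrete, here is a toy Python model of an erase block (purely illustrative; the page and block geometry is made up and the controller logic is deliberately naive, not how any real drive behaves). The point is simply that pages cannot be rewritten in place, so a single in-place page update costs a whole-block erase plus a copy of every live page in that block:

```python
# Toy model of a NAND erase block: pages can only be programmed after the
# containing block has been erased, and erase cycles are a limited resource.
# All sizes here are hypothetical, chosen only for illustration.

PAGES_PER_BLOCK = 64
PAGE_SIZE = 4096

class EraseBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None = erased/blank
        self.erase_count = 0

    def program(self, page_no, data):
        # A page may only be programmed once per erase cycle.
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

def overwrite_in_place(block, page_no, data):
    """Naive in-place overwrite: save live pages, erase, rewrite everything."""
    live = list(block.pages)
    live[page_no] = data
    block.erase()                                # one erase cycle consumed
    copies = 0
    for i, page in enumerate(live):
        if page is not None:
            block.program(i, page)
            copies += 1
    return copies                                # physical writes for one logical write

blk = EraseBlock()
for i in range(PAGES_PER_BLOCK):
    blk.program(i, b"x" * PAGE_SIZE)
extra = overwrite_in_place(blk, 17, b"y" * PAGE_SIZE)
print(f"1 logical page write cost {extra} physical page writes "
      f"and {blk.erase_count} erase(s)")
```

A real controller avoids this by remapping the updated page to a blank location and cleaning up lazily, which is exactly the log-structured approach described above.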

All of which implies that these drives solve some of the old problems (random read performance) but create new ones instead. The problems are entirely predictable and well exemplified by this long-term test of the X25. Log-structured file systems cause internal fragmentation: small random overwrites scatter a single file's blocks across the device, which ruins sequential read performance (and as Gray's paper shows, one needs contiguity for sequential read bandwidth even on flash drives). The other obvious aspect is that the efficacy of the lazy garbage-cleaning approach depends a lot on free space. The more free space there is, the more overwrites can be combined into a single erasure and the fewer extra physical writes are needed per logical write (the so-called Write Amplification Factor). Conversely, handing over an entire flash disk to an OLTP database seems like a recipe for trouble: write amplification will increase greatly over time (if things work at all). It also seems that there are ATA/SCSI commands (TRIM/UNMAP) in the works so that applications can inform the disk about free space, but this looks like another can of worms. How does a user-level application like MySQL/InnoDB invoke such a command? (And how can it do so without a corresponding file system API in case it is using a regular file?)
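To see the free-space effect in isolation, here is a rough simulation I sketched of a log-structured translation layer with a greedy cleaner under a uniform random overwrite workload. The geometry, workload, and cleaning policy are all assumptions on my part (real drives are far more sophisticated), so treat the output as showing the shape of the effect rather than real device numbers:

```python
# Sketch of a log-structured flash translation layer with greedy cleaning.
# Overwrites go to the head of a log; when blank blocks run out, the cleaner
# picks the block with the fewest live pages, copies them forward, and erases it.
import random

PAGES_PER_BLOCK = 64

def simulate(total_blocks, logical_pages, logical_writes, seed=0):
    """Physical page writes per logical write (write amplification factor)."""
    rng = random.Random(seed)
    blocks = [[] for _ in range(total_blocks)]  # logical pages appended to each block
    live = [0] * total_blocks                   # still-valid pages per block
    where = {}                                  # logical page -> block with its valid copy
    free = list(range(total_blocks))            # erased blocks
    current = free.pop()
    physical = 0

    def append(page):
        nonlocal current, physical
        if len(blocks[current]) == PAGES_PER_BLOCK:
            current = free.pop()                # switch to a blank block
        blocks[current].append(page)
        live[current] += 1
        old = where.get(page)
        if old is not None:
            live[old] -= 1                      # previous copy becomes stale
        where[page] = current
        physical += 1

    def clean():
        # Greedy victim: the full block with the fewest live pages.
        victim = min((b for b in range(total_blocks)
                      if b != current and b not in free),
                     key=lambda b: live[b])
        survivors = {p for p in blocks[victim] if where.get(p) == victim}
        blocks[victim], live[victim] = [], 0
        free.append(victim)                     # "erase" the victim
        for p in survivors:                     # copy live pages forward
            del where[p]
            append(p)

    for p in range(logical_pages):              # fill the logical space once
        append(p)
    base = physical
    for _ in range(logical_writes):             # uniform random overwrites
        while not free:
            clean()
        append(rng.randrange(logical_pages))
    return (physical - base) / logical_writes

for spare in (0.10, 0.25, 0.50):                # fraction of capacity kept spare
    total_blocks = 256
    logical = int(total_blocks * PAGES_PER_BLOCK * (1 - spare))
    waf = simulate(total_blocks, logical, logical_writes=200_000)
    print(f"{int(spare * 100):2d}% spare space -> write amplification ~{waf:.2f}")
```

The trend is the interesting part: as spare space shrinks, each cleaning pass finds fewer stale pages to reclaim, so more live data has to be copied around for every logical write.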

All of which makes me believe that at some point the most prominent database engines are going to sit up and write their own space management over flash drives. For example, instead of a global log-structured allocation policy, a database maintaining tables clustered by primary key is much better served by an allocation policy where a range of primary keys is kept close together; this would have much better characteristics when a table is scanned, either in full or in part. A rough sketch of the idea follows.
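As a sketch of what that could look like (entirely hypothetical: the extent size, the key-to-extent mapping, and the omission of extent overflow and splitting are all my simplifications), an allocator along these lines would keep rows with nearby primary keys on nearby flash pages:

```python
# Hypothetical clustered allocator: each primary-key range gets its own extent
# of contiguous flash pages, so a range scan reads contiguous space instead of
# chasing blocks scattered by a global log.
from bisect import bisect_right

EXTENT_PAGES = 256        # made-up number of pages reserved per key range

class ClusteredAllocator:
    def __init__(self, split_keys):
        # split_keys partition the primary-key space; extent i holds the keys
        # falling between consecutive split points.
        self.split_keys = sorted(split_keys)
        self.next_page = [i * EXTENT_PAGES for i in range(len(split_keys) + 1)]

    def extent_for(self, key):
        return bisect_right(self.split_keys, key)

    def allocate(self, key):
        """Return a flash page for a new row, near rows with nearby keys."""
        ext = self.extent_for(key)
        page = self.next_page[ext]
        self.next_page[ext] += 1          # sequential within the extent
        return page

alloc = ClusteredAllocator(split_keys=[1000, 2000, 3000])
print(alloc.allocate(1500), alloc.allocate(1501), alloc.allocate(2500))
# rows 1500 and 1501 land on adjacent pages; 2500 goes to a different extent
```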

All in all – fun times for people in database/storage land!