Disk storage - where are we headed?
Disk capacities are going up and costs are going down; however, the effective transfer bandwidth (ETB) per byte of capacity has come down tremendously. Although capacities and transfer rates have increased by factors of 10,000 and 100 respectively, typical drive ETB has actually decreased by a factor of 100. As Jim Gray said, "Disks have become tapes." (Link to source).
Consider, for example, a 10 TB database. Ten years ago, this database would have occupied two thousand 5 GB drives - a common size at the time. With a 3 MB/second transfer rate per drive, the aggregate bandwidth of these 2,000 drives would have been 6 GB/second, enabling the entire database to be scanned in about 30 minutes. Today, only about 20 higher-capacity drives would be needed to hold this same database. Those 20 drives would have an aggregate bandwidth of 1.2 GB/second, increasing the time required to scan the entire database to roughly 140 minutes - an increase of nearly two hours.
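The arithmetic in this example can be checked with a short Python sketch. The 60 MB/second per-drive rate for today's drives is not stated in the text; it is an assumption derived by dividing the 1.2 GB/second aggregate across 20 drives. Decimal units (1 TB = 10^12 bytes) are also assumed. Under those assumptions the two scans come out at roughly 28 and 139 minutes, close to the rounded figures quoted above.

```python
# Back-of-the-envelope check of the 10 TB scan example (decimal units assumed).

TB = 1e12  # bytes
MB = 1e6   # bytes


def scan_minutes(db_bytes, num_drives, drive_mb_per_sec):
    """Minutes to scan the database with every drive streaming in parallel."""
    aggregate_bytes_per_sec = num_drives * drive_mb_per_sec * MB
    return db_bytes / aggregate_bytes_per_sec / 60


# Ten years ago: 2,000 drives of 5 GB, 3 MB/second each.
then_minutes = scan_minutes(10 * TB, 2000, 3)

# Today: 20 drives. 60 MB/second per drive is an assumption
# (1.2 GB/second aggregate divided by 20 drives).
now_minutes = scan_minutes(10 * TB, 20, 60)

print(f"then: {then_minutes:.0f} min")
print(f"now:  {now_minutes:.0f} min")
print(f"slowdown: {now_minutes / then_minutes:.1f}x")
```

The slowdown is simply the ratio of the two aggregate bandwidths: fewer spindles means less parallel bandwidth, even though each individual drive is faster.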
DISKS ARE BECOMING A SEQUENTIAL ACCESS DEVICE RATHER THAN A RANDOM ACCESS DEVICE
Jim Gray points out: "We have to convert from random disk access to sequential access patterns. Disks will give you 200 accesses per second, so if you read a few kilobytes in each access, you're in the megabyte-per-second realm, and it will take a year to read a 20-terabyte disk. If you go to sequential access of larger chunks of the disk, you will get 500 times more bandwidth - you can read or write the disk in a day. So programmers have to start thinking of the disk as a sequential device rather than a random access device."
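Gray's estimate is easy to reproduce. The sketch below assumes 4 KB per access as a stand-in for "a few kilobytes" (the quote gives no exact figure) and takes the 500x sequential speedup directly from the quote. With those inputs, random access lands in the range of many months and sequential access well under a day, which matches the order of magnitude he describes.

```python
# Rough reproduction of the random vs. sequential read-time estimate.

DISK_BYTES = 20e12            # 20-terabyte disk, decimal units
ACCESSES_PER_SEC = 200        # from the quote
BYTES_PER_ACCESS = 4 * 1024   # assumption: "a few kilobytes" taken as 4 KB
SEQUENTIAL_SPEEDUP = 500      # from the quote


def read_seconds(total_bytes, bytes_per_sec):
    """Seconds needed to read total_bytes at a steady bytes_per_sec."""
    return total_bytes / bytes_per_sec


# Random access: throughput is accesses per second times bytes per access.
random_bw = ACCESSES_PER_SEC * BYTES_PER_ACCESS
random_days = read_seconds(DISK_BYTES, random_bw) / 86400

# Sequential access: 500 times the random-access bandwidth.
sequential_bw = SEQUENTIAL_SPEEDUP * random_bw
sequential_hours = read_seconds(DISK_BYTES, sequential_bw) / 3600

print(f"random:     {random_bw / 1e6:.2f} MB/s, about {random_days:.0f} days")
print(f"sequential: {sequential_bw / 1e6:.0f} MB/s, about {sequential_hours:.1f} hours")
```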
Tom White later says: "MapReduce is a programming model for processing vast amounts of data. One of the reasons that it works so well is because it exploits a sweet spot of modern disk drive technology trends. In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data)." Read more here.
My take is that SSDs are going to take a while to become an economically viable alternative to disks. Flash disks cost approximately $10/GB, and the OEM cost of good flash drives is about $60/GB or more (source here). Compare this with the cost of disk, which is about $0.20/GB. So we are looking at a price difference of roughly 50x to 300x. I think it's going to take a while before SSDs become a realistic option for storing terabytes of data. Until then, we will have to keep disks 50-70% empty to improve striping performance. If we keep disks 50% empty, the cost of disk doubles for storing the same amount of data.
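To put numbers on this comparison, here is a small Python sketch using only the per-GB prices quoted above. The helper name effective_cost_per_gb is made up for illustration; it models the assumption that leaving a fraction of each disk empty means paying for the whole disk while using only the rest. At $10/GB flash is about 50x the price of disk, and at $60/GB about 300x. Even with disks kept 50% empty, the gap only halves.

```python
# Cost per usable GB, using the prices quoted in the text.

DISK_COST_PER_GB = 0.20
FLASH_COST_PER_GB = 10.0
OEM_FLASH_COST_PER_GB = 60.0


def effective_cost_per_gb(raw_cost_per_gb, empty_fraction):
    """Cost per GB actually used when empty_fraction of each disk is left empty.

    Hypothetical helper for illustration: you pay for the full disk but
    store data on only (1 - empty_fraction) of it.
    """
    if not 0.0 <= empty_fraction < 1.0:
        raise ValueError("empty_fraction must be in [0, 1)")
    return raw_cost_per_gb / (1.0 - empty_fraction)


for empty in (0.0, 0.5, 0.7):
    disk = effective_cost_per_gb(DISK_COST_PER_GB, empty)
    print(f"disks {empty:.0%} empty: ${disk:.2f}/GB, "
          f"flash is {FLASH_COST_PER_GB / disk:.0f}x to "
          f"{OEM_FLASH_COST_PER_GB / disk:.0f}x more expensive")
```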