Some insightful articles and some of my own thoughts on the trends in data storage:
THE BACKGROUND:
Disk capacities are going up and costs are going down, but the effective transfer bandwidth (ETB) per byte of capacity has fallen dramatically. Despite capacities and transfer rates increasing by factors of 10,000 and 100 respectively, typical drive ETB has actually decreased by a factor of 100. As Jim Gray said, "Disks have become tapes" (link to source).
Consider, for example, a 10 TB database. Ten years ago, this database would have occupied two thousand 5 GB drives - a common size at the time. With a 3 MB/second transfer rate, the aggregate bandwidth of these 2,000 drives would have been 6 GB/second, enabling the entire database to be scanned in about 30 minutes. Today, only about 20 higher-capacity drives would be needed to hold this same database. Those 20 drives would have an aggregate bandwidth of 1.2 GB/second, increasing the time required to scan the entire database to 150 minutes - an increase of two hours.
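The arithmetic above can be checked with a quick back-of-envelope calculation. The drive counts and per-drive transfer rates are the round figures from the paragraph, not benchmarks:

```python
# Back-of-envelope scan times for a 10 TB database, then vs. now,
# using the round figures from the paragraph above.
TB = 10**12
GB = 10**9
MB = 10**6

db_size = 10 * TB

# Then: 2,000 x 5 GB drives at 3 MB/s each
then_drives = db_size // (5 * GB)            # 2,000 drives
then_bandwidth = then_drives * 3 * MB        # 6 GB/s aggregate
then_scan_min = db_size / then_bandwidth / 60

# Now: 20 higher-capacity drives at 60 MB/s each
now_drives = 20
now_bandwidth = now_drives * 60 * MB         # 1.2 GB/s aggregate
now_scan_min = db_size / now_bandwidth / 60

# Roughly 28 minutes then vs. 139 minutes now (the post rounds to 30 and 150)
print(f"then: {then_scan_min:.0f} min, now: {now_scan_min:.0f} min")
```

Note that the per-drive transfer rate rose 20x (3 MB/s to 60 MB/s) while capacity rose 100x, which is exactly why the aggregate scan time got worse.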
DISKS ARE BECOMING A SEQUENTIAL ACCESS DEVICE RATHER THAN A RANDOM ACCESS DEVICE:
Jim Gray points out - "We have to convert from random disk access to sequential access patterns. Disks will give you 200 accesses per second, so if you read a few kilobytes in each access, you're in the megabyte-per-second realm, and it will take a year to read a 20-terabyte disk. If you go to sequential access of larger chunks of the disk, you will get 500 times more bandwidth - you can read or write the disk in a day. So programmers have to start thinking of the disk as a sequential device rather than a random access device."
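Gray's random-vs-sequential gap is easy to reproduce numerically. The figures below (200 seeks/s, 2 KB per random read standing in for "a few kilobytes", a 500x sequential speedup) are assumptions taken from the quote, not measurements:

```python
# Random vs. sequential read time for a 20 TB disk, using Gray's figures.
KB = 10**3
TB = 10**12

seeks_per_sec = 200
bytes_per_seek = 2 * KB                      # "a few kilobytes" per access
random_bw = seeks_per_sec * bytes_per_seek   # 400 KB/s - megabyte-per-second realm

sequential_bw = random_bw * 500              # "500 times more bandwidth"

disk = 20 * TB
random_days = disk / random_bw / 86400       # roughly 1.6 years
sequential_days = disk / sequential_bw / 86400  # roughly a day

print(f"random: {random_days:.0f} days, sequential: {sequential_days:.1f} days")
```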
Tom White later says that "MapReduce is a programming model for processing vast amounts of data. One of the reasons that it works so well is because it exploits a sweet spot of modern disk drive technology trends. In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data)." Read more here.
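The sort-and-merge core that White describes can be sketched with a streaming merge of pre-sorted runs. This is a toy illustration, not Hadoop's actual implementation; each in-memory "run" stands in for a sorted spill file that would be read from disk sequentially, and the data is made up:

```python
import heapq

# Each run stands in for a sorted spill file streamed sequentially from disk.
run1 = iter([("a", 1), ("c", 1), ("c", 1)])
run2 = iter([("b", 1), ("c", 1), ("d", 1)])

# heapq.merge consumes both runs in key order without seeking back and forth.
merged = heapq.merge(run1, run2)

# Reduce step: sum the counts for each key as it streams past.
counts = {}
for key, value in merged:
    counts[key] = counts.get(key, 0) + value

print(counts)  # {'a': 1, 'b': 1, 'c': 3, 'd': 1}
```

Because every input run is already sorted, the merge touches each record once, in order - the access pattern stays sequential and runs at the transfer rate rather than the seek rate.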
My take is that SSDs are going to take a while to become an economically viable alternative to disks. Flash disks cost approximately $10/GB, and the OEM costs of good flash drives run about $60/GB or more (source here). Compare this with the cost of disk, which is about $0.20/GB. So we are looking at roughly a 300x price difference, and I think it's going to take a while before storing terabytes of data on SSDs becomes a reality. Until that time, we will have to keep disks 50-70% empty to enhance striping performance - and if we keep disks 50% empty, the cost of disk doubles for storing the same amount of data.
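The effect of half-empty disks on cost per usable gigabyte works out as follows (figures from the paragraph above):

```python
# Effective $/GB of usable capacity when disks are kept partly empty
# for striping performance.
disk_cost_per_gb = 0.20
ssd_cost_per_gb = 60.0    # OEM flash figure cited above

def effective_cost(cost_per_gb, fraction_empty):
    """Cost per GB actually used when part of each disk stays empty."""
    return cost_per_gb / (1 - fraction_empty)

half_empty = effective_cost(disk_cost_per_gb, 0.5)  # $0.40/GB: cost doubles
print(half_empty)                 # 0.4
print(ssd_cost_per_gb / half_empty)  # SSDs are still 150x pricier
```

Even with disks half empty, the flash premium only shrinks from roughly 300x to roughly 150x, which supports the conclusion that SSDs won't displace disks for terabyte-scale storage any time soon.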
Labels: bandwidth, disk, hadoop, mapreduce, performance, pubmatic, RAID, scalability, solid state disk, SSD, storage