Wednesday, November 19, 2008

The Real Cost of Amazon CloudFront

Amazon introduced a web service for content delivery, called Amazon CloudFront, yesterday. CloudFront is expected to bring a pricing war between the current CDN providers.

If you do a little calculation of the real cost of the CDN, it turns out that for smaller files it is much higher than the advertised pricing.

Following is the effective cost per GB for USA locations for CloudFront. For example, if your files are 5KB in size, you will actually pay $0.3797 per GB, not $0.17. If your file size is 10MB, you will pay essentially the advertised price of $0.17 per GB. So if you are distributing images or movies, CloudFront will be cost effective; however, if you are distributing small JavaScript files, you may be paying a lot more.

File Size    Effective Cost per GB
5KB          $0.3797 (123% more)
10KB         $0.2749
20KB         $0.2224
50KB         $0.1910
100KB        $0.1805
500KB        $0.1721
1MB          $0.1710
5MB          $0.1702
10MB         $0.1701
100MB        $0.1700
1GB          $0.1700

Here is how I calculated this: for US locations, the data transfer rate (for the first 10 TB / month) is $0.17 per GB and the request rate is $0.01 per 10,000 requests. With file_size in KB, delivering 1 GB takes 1024*1024/file_size requests, so:
Effective cost per GB = $0.17 + (1024*1024/file_size * $0.01/10000)
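
The same calculation as a minimal Python sketch. The function name and parameter names are my own, and the default prices are the US rates quoted above (first 10 TB / month tier); treat it as an illustration of the formula rather than an official pricing calculator.

```python
def effective_cost_per_gb(file_size_kb,
                          transfer_price_per_gb=0.17,
                          request_price=0.01,
                          requests_per_price_unit=10_000):
    """Effective cost of delivering 1 GB made up of files of the given size.

    file_size_kb: size of each file in KB (assumes one request per file).
    Prices default to the US rates quoted in the post.
    """
    # Number of requests needed to transfer 1 GB (1024 * 1024 KB).
    requests_per_gb = 1024 * 1024 / file_size_kb
    request_cost = requests_per_gb * request_price / requests_per_price_unit
    return transfer_price_per_gb + request_cost


if __name__ == "__main__":
    # A few of the file sizes from the table, expressed in KB.
    for label, size_kb in [("5KB", 5), ("100KB", 100),
                           ("1MB", 1024), ("10MB", 10 * 1024)]:
        print(f"{label:>6}  ${effective_cost_per_gb(size_kb):.4f}")
```

Changing the default prices lets you redo the comparison for other regions or volume tiers.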


Saturday, June 28, 2008

Designing Caches for Highly Scalable Web 2.0 Applications

Unix/Linux file systems have been designed so that reads are heavily cached and sometimes pre-fetched. There are various techniques, algorithms and methods for read caching, and each file system has its own somewhat unique method and therefore its own performance characteristics. Most file systems use the page cache for caching read I/O and the buffer cache for caching metadata.

There has been an immense amount of research in this area, on how to improve read performance using caching (see here and here).

Enter the highly scalable Web 2.0 era, enter Facebook. If you look at the Facebook I/O profile in my previous post, 92% of the reads (for photos) are served by the CDN. What that means is that a read will only happen once; after that the file will be cached in the CDN and the read will never go to the backend storage (a NetApp filer in this case). So all the file system caching is probably going to waste, since we are never going to read from the file system cache. Facebook photos are cached in the CDN for 4.24 years (their HTTP Cache-Control max-age is 133,721,540 seconds), which means the CDN will not go back to the origin server for that period.
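
As a quick check of that figure, here is a small Python sketch that pulls max-age out of a Cache-Control header value and converts it to years. The helper name is hypothetical, and it assumes a 365-day year and a simple comma-separated header.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def max_age_years(cache_control):
    """Return the max-age of a Cache-Control header value in years.

    Returns None when the header carries no max-age directive.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value) / SECONDS_PER_YEAR
    return None


if __name__ == "__main__":
    # The max-age value quoted above for Facebook photos.
    years = max_age_years("max-age=133721540")
    print(f"{years:.2f} years")
```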

This raises interesting questions: do file systems really need to do any caching, what is the read-write ratio for such an application, and how can the file system be better tuned for such an application?
Can the file system cache be better used for pre-fetching the entire metadata, so that the Facebook NetApp filer has to do fewer than 3 reads to read a photo?

Thoughts?


Thursday, June 12, 2008

Web Service for Content Distribution Network

Having used a CDN for a year now, I can say that the complexities of CDN deployment (origin server configuration, Apache configuration, pricing models, file placement, cache flush) make it a pretty non-trivial deployment.

Just wondering: why not have a "web service model for CDN"?

Google recently introduced the Google AJAX Libraries API. Google would place some of the popular AJAX libraries on their CDN servers, allowing caching, gzip, etc., and thereby making web pages load faster. Great idea! That could make some web pages really quick to load.

Let's make CDN simple. Let's make hosting a file on CDN a 2 step process. See the following mockup:


Step 1: Upload file
Step 2: Specify content type, bandwidth, and requests per second


The entire process (origin server configuration, Apache configuration, pricing, file placement, cache flush, etc.) can be made a 2 step process. A wrapper can be written on top of this entire process.
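
To make the idea concrete, here is a rough Python sketch of what such a wrapper's interface might look like. Everything in it is hypothetical: the class names, the fields, and the cdn.example.com URL are placeholders, and the "service" only keeps data in memory instead of talking to a real CDN provider.

```python
from dataclasses import dataclass
import hashlib


@dataclass
class HostingRequest:
    """Step 2 inputs from the mockup (field names are assumptions)."""
    content_type: str
    bandwidth_mbps: float
    requests_per_second: float


class SimpleCdnService:
    """Hypothetical two-step wrapper; an in-memory stand-in for a real CDN."""

    def __init__(self, base_url="http://cdn.example.com"):
        self.base_url = base_url
        self._uploads = {}    # file_id -> (file name, file bytes)
        self._published = {}  # file_id -> HostingRequest

    def upload(self, name, data):
        """Step 1: upload the file and get back an opaque file id."""
        file_id = hashlib.sha1(data).hexdigest()[:12]
        self._uploads[file_id] = (name, data)
        return file_id

    def publish(self, file_id, request):
        """Step 2: declare expected usage and get back the public URL.

        A real service would do origin setup, placement and pricing here.
        """
        if file_id not in self._uploads:
            raise KeyError(f"unknown file id: {file_id}")
        name, _ = self._uploads[file_id]
        self._published[file_id] = request
        return f"{self.base_url}/{file_id}/{name}"


if __name__ == "__main__":
    cdn = SimpleCdnService()
    fid = cdn.upload("app.js", b"alert('hi');")
    print(cdn.publish(fid, HostingRequest("application/javascript", 10, 50)))
```

The point of the sketch is only that the caller sees two calls, while origin configuration, placement and cache flush stay hidden behind the service.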

The pricing model can easily cover the cost of the server and the cost of developing such a web service. This could be a service on top of some of the popular CDN providers such as Panther Express, Level3, Limelight Networks, Akamai, etc.

I am surprised Amazon doesn't provide this service in conjunction with EC2 and S3.

Thoughts?

Update: Just read an interesting article - "10 Easy Steps to use Google App Engine as your own CDN". I am going to try it out, and see what the latencies look like, from different cities in the world.
