Apr 27, 2009

Using a Content Distribution Network

Content distribution networks (CDNs) are designed to deliver large volumes of traffic quickly and efficiently. Whereas a Web host may have hundreds of servers in a single location, a CDN has multiple data centers, and generally the data is replicated across each data center. This is done both for data integrity, so that if there's a power failure somewhere your files are still available, and also for speed. Most CDNs also rely on caching to make downloads happen faster.

Caching is a technique where the most popular files are stored at multiple locations so that when they are requested, the request doesn't have to go all the way back to the origin server. For example, let's say http://CNN.com has an extremely popular story on its home page. The origin servers are most likely in Atlanta, where CNN is based. The first time someone in Los Angeles requests the CNN home page, a copy, along with all the images, is sent from Atlanta to Los Angeles. Then a copy is stored in a cache somewhere on the west coast, so that the next time someone requests that page, it can be served directly from the local cache and not re-requested from the origin server in Atlanta.
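To make the idea concrete, here is a minimal sketch of a cache sitting in front of an origin server. The names (`EdgeCache`, `origin`) are made up for illustration, and a real CDN cache also deals with expiry and eviction, which this sketch ignores.

```python
class EdgeCache:
    """Toy cache: keeps a local copy of anything it has fetched once."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # function that contacts the origin
        self._store = {}                  # url -> cached copy
        self.origin_requests = 0          # how often we went back to the origin

    def get(self, url):
        if url not in self._store:
            # Cache miss: go all the way back to the origin, then keep a copy.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        # Cache hit (or freshly filled): serve the local copy.
        return self._store[url]


def origin(url):
    # Stand-in for the origin server in Atlanta (hypothetical).
    return f"<contents of {url}>"


west_coast = EdgeCache(origin)
first = west_coast.get("http://CNN.com")    # fetched from the origin
second = west_coast.get("http://CNN.com")   # served from the local cache
print(west_coast.origin_requests)
```

However many times the page is requested, the origin is contacted only for the first request.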

CDNs offer premium delivery services, so you won't find published price lists the way you can with Web hosting services. CDNs also like to deal in very large numbers, so if you're not expecting to spend hundreds of dollars a month, calling a CDN probably isn't worth your time. However, once your podcast has a large audience that demands quality service, a CDN is your best bet.

The CDN marketplace

The CDN market is often divided into the "tier 1" providers, who have the largest and fastest networks, and the "tier 2" providers, who may have slightly more aggressive pricing but may not offer the same service. CDNs are often graded in terms of their availability, which some brag about in terms of "five nines." This means that their network is available 99.999 percent of the time. Another metric used to grade CDNs is the response time, which is the average amount of time it takes for a CDN to respond to a request. A number of services grade CDNs from time to time on their performance. The CDN market has seen lots of consolidation in the last few years, and prices have dropped considerably. The performance of the tier 2 CDNs has come so close to the tier 1 providers that the tier 1 providers have had to drop their prices to remain competitive. Some even question whether there is enough distinction between providers to classify them into separate tiers anymore. Be that as it may, these are some of the better-known CDNs:

CDN pricing

CDNs use two basic models for pricing. Traditionally, they have billed using what is known as the 95th percentile model; recently many are moving to a per megabyte/gigabyte model. One of the frustrating things about CDNs is that it is almost impossible to translate between these two pricing models, making it difficult for CDN customers to figure out their cost of delivery.

Using the 95th percentile model, your price is quoted as a dollar amount per megabit per second (Mbps) of concurrent throughput. For example, a CDN may offer you a price of $50 per megabit. The tricky part is how your concurrent traffic figure is arrived at over the course of a month. The CDN measures how much throughput you are using many times over the course of each day. At the end of the month, the CDN sorts all the measurements, discards the top 5 percent, and uses the highest remaining measurement (the 95th percentile) as your billable amount.


Note

Billing at the 95th percentile is measured in megabits per second, not megabytes. Be careful when you're doing your calculations!

Here's another way of looking at it. There are 720 hours in a 30-day month, so 5 percent of a month translates to 36 hours. So when you're billed at the 95th percentile, the busiest 36 hours of the month are discarded, and you're billed for the throughput your site used during the 37th busiest hour. Still confused? Don't worry, you're not alone. This model is frustrating to customers, because it is hard to understand and hard to budget for. In some ways, it's good because you don't pay for momentary spikes in your traffic. On the other hand, it's very hard to calculate your cost of delivery on a per-file basis because that varies depending on your traffic patterns.
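If it helps, the discard-and-bill step can be sketched in a few lines of Python. This assumes one throughput sample per hour, matching the simplification above; an actual CDN may sample more often, and the function name and sample data are invented for illustration.

```python
def billable_mbps(samples_mbps, discard_percent=5):
    """Drop the busiest discard_percent of samples, bill on the next one."""
    ordered = sorted(samples_mbps, reverse=True)      # busiest first
    discard = len(ordered) * discard_percent // 100   # 36 of 720 hourly samples
    return ordered[discard]                           # the 37th busiest sample


# A 30-day month of hourly samples (hypothetical traffic).
# 30 hours of spikes at 10 Mbps: all of them fall inside the discarded 36.
short_spikes = [10.0] * 30 + [1.0] * 690
# 40 hours of spikes: four of them survive the discard and set the bill.
long_spikes = [10.0] * 40 + [1.0] * 680

print(billable_mbps(short_spikes))
print(billable_mbps(long_spikes))
```

The first month is billed at the 1 Mbps baseline and the second at the 10 Mbps peak, even though the two traffic patterns differ by only ten busy hours.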

For example, let's say you've got 1,000 subscribers, and you put up a new podcast each day. Over the course of that day, all 1,000 of them check the RSS feed and download the podcast. Assuming the same 5-minute, 5-MB file we talked about earlier in the chapter, and assuming folks are on an average broadband connection of about 300 Kbps, it's going to take the average listener about 2 minutes to download your file:

5 MB * 1024 KB/MB * 8 bits/byte = 40,960 Kbits
40,960 Kbits / 300 Kbps = roughly 136 seconds (fudging the
difference between K and k)
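The same arithmetic as a small Python helper, in case you want to plug in your own file size or connection speed. The function name is just for illustration, and it keeps the same K-versus-k fudge as the figures above.

```python
def download_seconds(file_mb, link_kbps):
    """Seconds to download a file of file_mb megabytes at link_kbps."""
    kbits = file_mb * 1024 * 8    # MB -> KB -> Kbits
    return kbits / link_kbps


seconds = download_seconds(5, 300)
print(round(seconds, 1))   # 136.5
```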

Given an ideal distribution, over the course of a day, roughly 630 people could download your file, one at a time (86,400 seconds in a day divided by about 136 seconds per download), and you'd never have more than one person at a time downloading. But given that most of your listeners probably will be from a limited number of time zones, and most folks will check their favorite feeds either at lunch time or early in the evening, you'll probably get the bulk of your downloads during three or four hours a day. That means you'll have around 5 to 10 people downloading at any given time. Let's say 10 for a liberal estimate. This means your concurrent throughput during these hours will be:

10 * 300 Kbps = 3 Mbps
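One rough way to sanity-check the "10 people at a time" figure is to spread all the download time evenly over the busy window. This is a back-of-the-envelope sketch, not a traffic model: it assumes every download takes the full 136.5 seconds and that downloads are spread evenly within the window.

```python
def average_concurrency(downloads, seconds_per_download, window_hours):
    """Average number of simultaneous downloads within the window."""
    total_download_seconds = downloads * seconds_per_download
    return total_download_seconds / (window_hours * 3600)


for hours in (3, 4, 24):
    people = average_concurrency(1000, 136.5, hours)
    print(f"{hours} hours: about {people:.1f} at a time, "
          f"{people * 300 / 1000:.1f} Mbps")
```

For a 3 to 4 hour window this lands in the same ballpark as the figure of 10 used above, and for a full 24 hours it drops to between one and two.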

This should end up being your 95th percentile number, because you're hitting this peak for 3 to 4 hours each day, which adds up to far more than the 36 busiest hours that get discarded. If you're paying $50 per megabit, you can then do some math to figure out what your cost per delivery is per file:

3 Mbps * $50 per Mbps = $150 per month
1,000 subscribers * 20 podcasts/month = 20,000 podcasts
$150 / 20,000 podcasts = $0.0075, or under a penny a podcast
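Here is the same calculation as a Python sketch, so you can try other prices or audience sizes. The function and parameter names are made up for illustration; the inputs are the ones from the example.

```python
def cost_per_podcast_95th(billable_mbps, price_per_mbps,
                          subscribers, podcasts_per_month):
    """Monthly 95th percentile bill divided by the number of deliveries."""
    monthly_bill = billable_mbps * price_per_mbps
    deliveries = subscribers * podcasts_per_month
    return monthly_bill / deliveries


cost = cost_per_podcast_95th(3, 50, 1000, 20)
print(cost)   # 0.0075
```

Notice that the number of subscribers only appears in the denominator. The bill itself is set entirely by the billable Mbps figure.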

So it's not impossible to get your cost of delivery, but it involves some calculation. And the calculations are highly dependent on your traffic patterns. Your bill depends on when people download the file, not how many. So it's very hard to figure out what your incremental cost per subscriber is, because it depends on when they download the file!

For this reason, some CDNs are now offering pricing on a per gigabyte basis. Using this model, customers are billed for the total amount of data they transfer. It doesn't matter when the data is transferred. This model is much simpler for customers to understand, because the math is straightforward. Using the previous example, let's calculate what the cost would be, assuming a cost of $1 a gigabyte:

20,000 podcasts * 5 MB/podcast = 100,000 MB = 100 GB
100 GB * $1 = $100
$100 / 20,000 podcasts = half a penny a podcast
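In Python the per-gigabyte math is short enough to fit in two small functions. As in the figures above, this sketch treats 1 GB as 1,000 MB; the function names are hypothetical.

```python
def monthly_gigabytes(deliveries, file_mb, mb_per_gb=1000):
    """Total data transferred in a month, in GB."""
    return deliveries * file_mb / mb_per_gb


def cost_per_podcast_per_gb(file_mb, price_per_gb, mb_per_gb=1000):
    """Delivery cost of one file; independent of when it is downloaded."""
    return file_mb / mb_per_gb * price_per_gb


total_gb = monthly_gigabytes(20_000, 5)
print(total_gb, total_gb * 1, cost_per_podcast_per_gb(5, 1))
```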

This model appears to be cheaper, and we know it's a hard cost that we can use in our calculations. Each podcast costs half a penny to deliver. Contrast this with the 95th percentile model, where we know the cost is under a penny, but that could change depending on traffic patterns. However, don't let these numbers fool you. It's not quite this simple.

Let's say your podcast audience doubles in size. Using the cost-per-gig model, we know our costs would double. However, with the 95th percentile model, they may not. An audience that's twice as large quite probably would come from a much more dispersed geographical area and may distribute the load over more hours in the day. It's quite possible that you could double your audience size and not pay any more at all! That's the tricky part. If you make efficient use of your bandwidth, the 95th percentile model can be substantially cheaper.
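A quick sketch of that comparison, using the prices from the earlier examples. The two "doubled" peak figures are assumptions chosen to illustrate the best case (the new listeners spread out, so the peak stays at 3 Mbps) and the worst case (they all pile into the same busy hours, so the peak doubles).

```python
PRICE_PER_MBPS = 50        # dollars per Mbps, 95th percentile model
PRICE_PER_GB = 1           # dollars per GB, per-gigabyte model
FILE_MB = 5
PODCASTS_PER_MONTH = 20


def per_gb_bill(subscribers):
    gigabytes = subscribers * PODCASTS_PER_MONTH * FILE_MB / 1000
    return gigabytes * PRICE_PER_GB


def percentile_bill(billable_mbps):
    return billable_mbps * PRICE_PER_MBPS


# (subscribers, assumed 95th percentile Mbps)
scenarios = {
    "today": (1000, 3),
    "doubled, load spread out": (2000, 3),
    "doubled, same busy hours": (2000, 6),
}

for name, (subscribers, peak_mbps) in scenarios.items():
    print(f"{name}: per-GB ${per_gb_bill(subscribers):.0f}, "
          f"95th percentile ${percentile_bill(peak_mbps):.0f}")
```

Under these assumptions the per-gigabyte bill always doubles, while the 95th percentile bill ranges from unchanged to doubled depending on when the new listeners download.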

Some companies put caps on bandwidth usage, so for example they're never using more than one megabit per second. This slows down file delivery for people if they're all trying to download at the same time, but keeps costs low. In fact, this is a tool that some hosting companies use to make sure they keep their costs low. (How else do you think they can offer so much bandwidth for free?) You may be able to do the same if you work with your CDN. This compromises performance for your listeners, but it can be a good cost-savings mechanism.
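To get a feel for the trade-off, here is a rough sketch of what a 1 Mbps cap does to the earlier example of 10 simultaneous listeners. It assumes the capped bandwidth is shared equally, and it ignores the fact that slower downloads would overlap more and push the number of simultaneous listeners up further.

```python
def capped_download(file_kbits, link_kbps, cap_kbps, concurrent):
    """Per-listener rate and download time when total output is capped."""
    fair_share = cap_kbps / concurrent          # equal split of the cap
    rate = min(link_kbps, fair_share)           # can't exceed the listener's link
    return rate, file_kbits / rate


rate, seconds = capped_download(40_960, 300, 1000, 10)
print(rate, seconds)       # 100.0 Kbps each, 409.6 seconds

# Billable throughput can never exceed the cap: 1 Mbps instead of 3 Mbps.
print(1 * 50, "dollars a month instead of", 3 * 50)
```

Under these assumptions each download takes about three times as long, and the monthly bill at $50 per megabit falls from $150 to $50.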
