Aug 28, 2008

Codecs Overview : How codecs work

We know that encoded files are much smaller than raw media files; the question is how do encoders achieve this file size reduction, and why does the quality suffer?

At the heart of all encoding software lies the codec. Codec is a contraction of coder-decoder (or compressor-decompressor), and is the software algorithm that determines how to shrink a file to a usable size. You're probably already familiar with a number of codecs, though you may not be aware of it. For example, most digital cameras take pictures that are compressed with the JPEG codec. If you've ever used a photo-editing program to reduce the size or quality of your photos before you put them online, you've been adjusting the parameters of the JPEG codec. StuffIt and WinZip use codecs to compress files before they're sent across the Internet or put on installation CDs.

There's a key difference, however, between the JPEG codec used to compress photos and the codecs used to compress documents. Codecs used to compress documents must be lossless: if someone sends you a compressed spreadsheet, the data you get back when you decompress it must be exactly the same as it was before compression. Codecs such as JPEG, on the other hand, are known as lossy codecs, because some of the original information is discarded during compression, so the original cannot be recreated from the compressed version of the file. Lossy codecs operate under the assumption that the lost quality either goes unnoticed by the end user or is an acceptable compromise for the situation.
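To make the distinction concrete, here is a toy sketch in Python. The lossless half uses the standard-library zlib module, which implements DEFLATE, the algorithm most ZIP archives use. The lossy half is a made-up stand-in: it simply rounds sample values to a coarser step, which is far cruder than what JPEG actually does but shows why the original can't be recovered.

```python
import zlib

def lossless_round_trip(data: bytes) -> bytes:
    # DEFLATE (via zlib) is lossless: decompressing returns every original byte.
    return zlib.decompress(zlib.compress(data))

def lossy_quantize(samples, step):
    # Toy "lossy codec": round each sample to the nearest multiple of `step`.
    # The rounding error is thrown away, so the exact input can't be rebuilt.
    return [step * round(s / step) for s in samples]

# A small, repetitive "spreadsheet" (CSV text) compresses well and comes back intact.
spreadsheet = b"Q1,1200\nQ2,1350\nQ3,1350\nQ4,1500\n" * 50
assert lossless_round_trip(spreadsheet) == spreadsheet

samples = [3, 17, 22, -9, 41]
print(lossy_quantize(samples, 8))  # [0, 16, 24, -8, 40]
```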

Web sites are a perfect example. Having lots of imagery on a Web site is great, but if the images were all 5 MB originals, each page would take forever to load. Because browsing the Internet should be a rapid, seamless experience, and because we sit so close to our monitors, a Web image needs much less detail than an image destined for a printed page. It can therefore be compressed heavily using the JPEG codec without overly compromising our experience.

The same holds true for podcasts. While it might be nice to have near-CD-quality podcasts at 256 kbps, the reality is that 128 kbps offers more than enough quality, and 64 kbps might be plenty, particularly if you're using a codec other than MP3. As you reduce the bit rate of your podcast, the quality drops too, because the codec must discard more and more of the original information.
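One way to see why lower bit rates cost quality is to compare each rate with uncompressed CD audio, which runs at 44,100 samples/sec × 16 bits × 2 channels = 1,411.2 kbps. This small Python sketch prints that ratio for a few common bit rates; the larger the ratio, the more of the original data the codec has to throw away.

```python
# Uncompressed CD audio: 44.1 kHz x 16 bits/sample x 2 channels = 1411.2 kbps.
CD_BIT_RATE_KBPS = 44.1 * 16 * 2

def compression_ratio(encoded_kbps: float) -> float:
    """How many times smaller the encoded stream is than uncompressed CD audio."""
    return CD_BIT_RATE_KBPS / encoded_kbps

for kbps in (256, 128, 64):
    print(f"{kbps:>3} kbps -> about {compression_ratio(kbps):.1f}:1")
```

Halving the bit rate doubles the ratio, which is why each step down is more noticeable than the last.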

Codecs try to maintain as much fidelity as possible during the encoding process, but at low bit rates something has to give. There simply isn't enough data to reproduce the original high fidelity. Given the complexity of the task, they actually do an amazing job. They're able to do as well as they do because they make use of perceptual models that help them determine what we perceive as opposed to what we hear. The difference is subtle, but key to modern codec efficiency. Before we talk about perceptual encoding techniques, let's talk a bit about basic codec technologies.

How codecs work
Codecs reduce file sizes by taking advantage of repeated information in digital files, and there is a lot of it. For example, a video that has been letterboxed (black bars at the top and bottom) has lots and lots of black pixels, which results in long runs of zeros, all in a row. Instead of storing a thousand zeros, you could store "1000×0," which is only six characters. That's a significant savings. Better still, you can reconstruct an exact copy of the original from the information you stored.
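The "1000×0" trick is known as run-length encoding (RLE). Here is a minimal Python sketch that treats a row of pixels as a plain list of numbers, which is a simplification; real video codecs do much more than this.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(group)), value) for value, group in groupby(data)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

# A toy scan line: 1000 black pixels (0) from the letterbox bar, then some picture.
line = [0] * 1000 + [17, 17, 42]
encoded = rle_encode(line)
print(encoded)                      # [(1000, 0), (2, 17), (1, 42)]
assert rle_decode(encoded) == line  # lossless: the exact original comes back
```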

Another way of encoding is to substitute short codes for commonly occurring combinations of characters. For example, you could make this book smaller by replacing every instance of the word "podcasting" with "p." This wouldn't save much space, though, and that's the problem with lossless encoding: you can achieve some file size reduction, but typically not enough for our needs. For that, you need perceptual encoding.
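A rough Python sketch of that substitution idea. One catch: a plain "p" would be confused with every other "p" in the text, so a lossless scheme needs a code that never occurs naturally. The sketch uses the unprintable character "\x01" as that code, an arbitrary choice for illustration.

```python
def substitute(text: str, phrase: str, token: str) -> str:
    """Replace every occurrence of `phrase` with a short `token`."""
    # The token must not already appear in the text; otherwise the decoder
    # could not tell a substituted token from an original character.
    if token in text:
        raise ValueError(f"token {token!r} already occurs in the text")
    return text.replace(phrase, token)

def restore(encoded: str, phrase: str, token: str) -> str:
    """Undo the substitution, recovering the exact original text."""
    return encoded.replace(token, phrase)

text = "podcasting is fun; podcasting is cheap"
packed = substitute(text, "podcasting", "\x01")
print(len(text), "->", len(packed))  # 38 -> 20
assert restore(packed, "podcasting", "\x01") == text
```

The saving depends entirely on how often the phrase repeats, which is why this kind of lossless trick rarely shrinks audio or video enough on its own.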

Aug 22, 2008

Encoding : Throughput & Quality equivalents

As mentioned in the previous section, throughput is the measure of the amount of bandwidth you use over time. You'll encounter throughput when you use a service to distribute your podcast, because most offer a certain amount of throughput for free and bill you for any used in excess of that. Obviously, you want to keep your monthly bill as low as possible, so you want to try to limit the amount of throughput you use.

When you encode your podcast, you want to balance the desire to provide the highest quality possible against the reality of your throughput bill at the end of the month. Many podcast distribution services offer generous amounts of free throughput each month, so this may not be an issue when you first start out. If your podcast becomes wildly popular, though, you may need to cut your operating costs (until that first sponsor or advertiser comes around, of course). If so, consider reducing the bit rate of your podcast. This lowers its quality, but the difference may not be noticeable to your audience. Remember, most people listen to podcasts while sitting in front of their computers, and multimedia speakers aren't renowned for their quality. What you want to deliver is podcast quality equivalent to other broadcast media, which in the case of AM and FM radio isn't that high to begin with.
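Because throughput scales directly with file size, and file size scales with bit rate, you can estimate the effect of a bit-rate change on your bill before you make it. A rough Python sketch; the episode length, publishing schedule, and download counts are hypothetical, and it uses 1 GB = 1024 MB to match the binary units used elsewhere in these posts. Check your own service's billing units.

```python
def monthly_throughput_gb(bit_rate_kbps, minutes_per_episode,
                          episodes_per_month, downloads_per_episode):
    """Estimated data served per month, in GB (1 GB = 1024 MB, 1 MB = 1024 KB)."""
    # kilobits -> kilobytes (/8) -> megabytes (/1024)
    episode_mb = bit_rate_kbps * 60 * minutes_per_episode / 8 / 1024
    return episode_mb * episodes_per_month * downloads_per_episode / 1024

# Hypothetical show: four 20-minute episodes a month, 5,000 downloads each.
for kbps in (128, 96, 64):
    print(f"{kbps} kbps: {monthly_throughput_gb(kbps, 20, 4, 5000):.1f} GB/month")
```

Halving the bit rate halves the throughput, so the trade-off is easy to weigh against your service's pricing.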

Quality equivalents
Many people take the term broadcast quality to mean really, really good. However, anyone who has listened to AM radio knows that it doesn't sound anywhere near as good as FM, and for that matter FM radio doesn't sound as good as CDs. Yet both are broadcast standards, and we still listen to radio, even AM. Some types of programming simply don't need as much fidelity as others.

The idea, then, is to figure out how much fidelity your programming requires and produce content to that standard. When recording the content, you should always record at a very high standard, because that gives you the most flexibility later on. But when it comes time to encode your content for Internet distribution, you may want to sacrifice a bit of quality for the cost savings it provides.

Table 1 lists some common bit rates offered by encoding software and brief descriptions of what quality you can expect using different encoding technologies.

Note: In Table 1, notice that MP3 audio quality is always slightly worse than that of Windows Media, Real, and QuickTime AAC, particularly at low bit rates. This is because the MP3 codec is older and wasn't designed for low-bit-rate encoding. At higher bit rates (128 kbps and above), the quality difference is less apparent.

Aug 12, 2008

Why Encoding Is Necessary - Bandwidth

You've spent countless hours and quite possibly a sizeable sum of money to produce a broadcast-quality podcast. Now you're being asked to take the polished result and convert it to a different format, which may compromise the quality of the original. Why?

The simple answer is because the raw audio and video files are too large to deliver practically via the Internet. There's no technical reason you can't deliver the original files — but it would take an incredibly long time for the files to download, and your monthly delivery bill would be sky high. To better understand the practical limitations involved, you must understand the concepts of bandwidth and throughput.

Bandwidth, in the networking sense of the word, is a measurement of the rate at which data is being transmitted at any given moment. Throughput is the aggregate amount of data that has been transferred over a given time period. Think about water coming out of a faucet: The water can come out slowly or quickly depending on how much you open the tap. A gallon jug fills slowly or quickly depending on how fast the water is coming out. The "bandwidth" of the faucet is the speed of the water coming out; the "throughput" is the total amount of water that comes out.

In podcasting, we come across bandwidth and throughput in a number of different areas. First, each of your potential audience members is connected to the Internet in some way, and that connection has an advertised bandwidth. If they're on DSL or cable modem, they may have a download bandwidth somewhere between 256 kilobits per second (kbps) and several megabits per second (Mbps). Similarly, when you upload your podcast to a server or distribution service, you're using bandwidth, but you're uploading, not downloading. The upload (upstream) speed of DSL and cable modems is usually far less than the download speed. Regardless of which direction the data is traveling, the available bandwidth determines the speed at which the transfer takes place.

Let's say you've recorded a 20-minute audio podcast. If you recorded at CD quality, you recorded in stereo at a sample rate of 44.1 kHz, using 16 bits per sample. We can determine how large this file is using some simple math:

44,100 samples/sec * 16 bits/sample * 2 channels = 1,411,200 bits/sec
1,411,200 bits/sec / 8 bits/byte = 176,400 bytes/second
176,400 bytes/second / 1024 = 172.3 kilobytes per second (KBps)
172.3 KB/sec * 60 secs/min * 20 min = 206,718.75 KB
206,718.75 / 1024 = Approximately 202 megabytes (MB)
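The same arithmetic as a small Python function, handy if you record at other settings (mono, 48 kHz, 24-bit, and so on). Like the calculation above, it treats a megabyte as 1,024 × 1,024 bytes.

```python
def raw_pcm_megabytes(sample_rate_hz, bits_per_sample, channels, minutes):
    """Size of uncompressed PCM audio in MB (1 MB = 1024 * 1024 bytes)."""
    bytes_per_second = sample_rate_hz * bits_per_sample * channels / 8
    return bytes_per_second * 60 * minutes / (1024 * 1024)

size = raw_pcm_megabytes(44_100, 16, 2, 20)
print(f"{size:.1f} MB")  # 201.9 MB
```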

So the raw file is over 200 megabytes. (In fact, you can do this math much more quickly: One minute of stereo CD audio is approximately 10 MB, so 20 * 10 = ∼200 MB.) Let's assume one of your audience members is on a fairly standard DSL line, with a download speed of approximately 500 kbps. You can calculate the download time with a bit of math. All you have to do is convert the file size from megabytes into kilobits, and then divide by the download speed:

200 MB * 8 bits/byte = 1,600 megabits
1,600 megabits * 1024 = 1,638,400 kilobits
1,638,400 kilobits / 500 kbps ≈ 3,277 seconds
3,277 seconds / 60 seconds/minute ≈ 54.6 minutes
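The download-time calculation as a Python sketch. It uses the same unit conventions as the lines above and assumes the listener gets the full advertised speed with nothing else competing for it, so real downloads will usually take longer.

```python
def download_minutes(file_mb, link_kbps):
    """Best-case download time in minutes.

    Units follow the calculation above: 1 MB = 8 megabits,
    1 megabit = 1024 kilobits.
    """
    kilobits = file_mb * 8 * 1024
    return kilobits / link_kbps / 60

print(f"{download_minutes(200, 500):.1f} minutes")  # 54.6 minutes
```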

So your podcast would take just under an hour to download. If the person is downloading in the background, this might not be too much of a problem, but chances are he's checking e-mail, surfing the Web, and doing other things on his computer that might further constrict the available bandwidth, which in turn makes the download take even longer. Additionally, he may not be getting the full bandwidth that he's paying for (see the "Why Does My Broadband Connection Seem Slow?" sidebar). Overall, this is not an optimal experience.

What we want to do is deliver a high-quality podcast that doesn't take hours to download. Encoding software enables us to do precisely this. For example, if we encode the file using an MP3 codec, we can achieve near-CD quality using only 128 kbps. In this case, our file would be:

128 kbits/sec * 60 seconds/minute * 20 minutes = 153,600 kilobits
153,600 kbits / 8 bits/byte = 19,200 kbytes
19,200 kbytes / 1024 = 18.75 MB

Our file size is less than ten percent of what it was before, and the download time is therefore reduced to about five minutes, which is much more like it. And because each of your listeners is downloading a smaller file, you use much less throughput.
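A Python sketch that reproduces the encoded-size figure and the resulting download time. It assumes a constant bit rate and ignores the small overhead of file headers and tags; the raw size is taken from the CD-audio calculation earlier in this post.

```python
def encoded_megabytes(bit_rate_kbps, minutes):
    """Size of a constant-bit-rate encode in MB (1 KB = 8 kilobits, 1 MB = 1024 KB)."""
    kilobits = bit_rate_kbps * 60 * minutes
    return kilobits / 8 / 1024

RAW_MB = 201.87  # 20 minutes of uncompressed CD audio, from the earlier calculation
mp3_mb = encoded_megabytes(128, 20)
minutes_at_500kbps = mp3_mb * 8 * 1024 / 500 / 60  # same units as the download math

print(f"{mp3_mb} MB, {mp3_mb / RAW_MB:.1%} of the raw file")  # 18.75 MB, 9.3% of the raw file
print(f"about {minutes_at_500kbps:.1f} minutes at 500 kbps")   # about 5.1 minutes at 500 kbps
```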

Aug 5, 2008

Advanced Video Production Techniques - Adding Titles

Most professional video programming has some sort of opening sequence that usually includes lots of candid footage mixed with shots of the star(s) and some sort of graphic rendition of the title of the program. You should take the same approach. If your show has a name, let folks know about it! If they download it to their iPod and forget about it until it magically appears on their screen one day when they're browsing through their clips library, you want them to know the name of the program and who you are. So you'll probably want to use titles.

The problem, however, is that text that looks good (and is legible) on a television screen often ends up far too small to read on a 320×240 screen. Titles at the bottom of the screen (called lower thirds) can be very hard to read if they're not set in large enough fonts. PowerPoint slides are particularly tough, because most people try to pack far too much information into a single slide, which makes it difficult to absorb, and the small fonts become very hard to read when reduced. To top it all off, video codecs have a tough time with text, because they don't treat it as distinct from the rest of the picture. So when your podcast is encoded, you're going to lose even more quality, as depicted in Figure 1.

Figure 1: PowerPoint slides are a good example of why text is tough: (a) Scaled to 320×240 and (b) after encoding at 300 Kbps.

The PowerPoint slide in Figure 1 isn't too bad to start off with; it has only five main points on the slide. By the time the slide is reduced to 320×240, the sub-points are too hard to read, and after the encoding process, even the main points are starting to look a little ragged.

If you're going to use text in your podcast, think big. Try not to have more than three or four points per slide if you're using PowerPoint, and if you're adding titles to your show and/or your guests, make sure to use a font large enough so that it is legible after the encoding process.
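When scaling down, every text height shrinks by the same factor as the frame, so you can work backward from the smallest size that stays readable on a 320×240 player. A Python sketch; the 16-pixel minimum is a hypothetical figure, not a standard, and since encoding softens text further, check the result on an encoded test clip.

```python
def min_source_text_height(source_height_px: int, target_height_px: int,
                           min_legible_px: int) -> int:
    """Smallest text height (pixels) to use at the source resolution so the
    text is still at least `min_legible_px` tall after scaling to the target."""
    # Scaling multiplies every height by target/source, so invert that ratio.
    # Integer ceiling division avoids floating-point round-off.
    return -(-min_legible_px * source_height_px // target_height_px)

# Hypothetical rule of thumb: text should be at least 16 px tall on a 240-line screen.
for source in (480, 720, 1080):
    print(source, "->", min_source_text_height(source, 240, 16), "px")
```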