Sep 13, 2008

How video codecs work

Video codecs have also improved dramatically. The challenge of encoding video, however, is orders of magnitude more difficult than encoding audio. We found that a minute of CD-quality audio is about 10 MB before it is encoded. That's nothing compared to video. If the video is digitized in the RGB color space, each pixel uses 24 bits (8 for red, 8 for green, and 8 for blue). So a single frame of standard-definition NTSC video (720 pixels wide by 486 lines tall) uses:

720 pixels * 486 lines * 24 bits/pixel = 8,398,080 bits
= 1,049,760 bytes
≈ 1 MB per frame of video
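If you'd like to check this arithmetic yourself, here is a minimal Python sketch. The function name `raw_frame_bytes` is just an illustrative helper, and it assumes an uncompressed frame with a fixed number of bits per pixel (24 for 8-bit RGB).

```python
def raw_frame_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size in bytes of one uncompressed frame (24 bpp = 8 bits each for R, G, B)."""
    return width * height * bits_per_pixel // 8

# 720 pixels per line x 486 lines, 24-bit RGB
frame = raw_frame_bytes(720, 486)
print(frame)                          # 1049760 bytes
print(f"{frame / 2**20:.2f} MiB")     # 1.00 MiB, i.e. roughly 1 MB per frame
```

Changing the arguments lets you see how quickly the numbers grow or shrink at other resolutions.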


To get the file size for a 20-minute podcast, recall that NTSC video runs at 30 frames per second (29.97, to be exact), so:

1 MB/frame * 30 frames/second * 60 seconds/minute * 20 minutes = 36,000 MB ≈ 35 GB
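The same calculation in code, as a rough sketch: it uses the exact 1,049,760-byte frame size instead of the rounded 1 MB, so the result comes out slightly larger, and it applies the 5:1 DV ratio mentioned below as a simple division. The helper name `raw_video_bytes` is hypothetical.

```python
def raw_video_bytes(frame_bytes: int, fps: int, minutes: float) -> float:
    """Total uncompressed size of a clip: bytes/frame * frames/s * seconds."""
    return frame_bytes * fps * 60 * minutes

FRAME_BYTES = 720 * 486 * 24 // 8   # 1,049,760 bytes per 24-bit RGB frame
total = raw_video_bytes(FRAME_BYTES, fps=30, minutes=20)
print(f"{total / 2**30:.1f} GiB uncompressed")           # 35.2 GiB
print(f"{total / 5 / 2**30:.1f} GiB at DV's 5:1 ratio")  # 7.0 GiB
```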

Yes, you read that right. A 20-minute podcast can chew up an entire hard drive, or at least a good chunk of one. Of course, the preceding calculations assume uncompressed RGB video, and most podcasts are shot with a DV camera. Because DV video is compressed at roughly a 5:1 ratio, you're looking at only around 7 GB for your 20-minute podcast. But imagine downloading a 7 GB file! That's not going to happen in a flash. It's going to take a good long time.
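To put "a good long time" into numbers, here is a small sketch that estimates download time. The connection speeds (1.5 and 6 Mbps) are hypothetical values picked for illustration, and the sketch assumes the link runs at full speed with no protocol overhead, so real downloads would be slower.

```python
def download_hours(size_bytes: float, link_mbps: float) -> float:
    """Hours to transfer size_bytes over a link_mbps (megabits/second) connection.
    Assumes the full link speed is available and ignores protocol overhead."""
    return size_bytes * 8 / (link_mbps * 1_000_000) / 3600

DV_PODCAST_BYTES = 7 * 2**30   # the ~7 GB DV podcast
for mbps in (1.5, 6.0):        # hypothetical connection speeds
    print(f"{mbps} Mbps: {download_hours(DV_PODCAST_BYTES, mbps):.1f} hours")
```

Even under these generous assumptions, that works out to roughly 11 hours at 1.5 Mbps and close to 3 hours at 6 Mbps.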

So the first thing to consider is reducing the resolution of the video so there are fewer pixels to encode in each frame. If you resize down to 320×240, you've cut the number of pixels per frame (and therefore the raw file size) by about 78 percent, from 349,920 pixels to 76,800. You can also cut the frame rate in half for a further 50 percent reduction. But it turns out that this is still nowhere near the amount of reduction required to deliver this video reliably and in an acceptable amount of time (and without breaking your bandwidth budget). To close that gap, video codecs rely on perceptual coding, combining intra-frame and inter-frame compression.
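A quick way to see how much resizing and halving the frame rate buy you is to compare the number of pixels that must be encoded each second. This sketch measures only raw pixel counts, before any codec is involved; `pixel_rate` is an illustrative name.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Number of pixels that must be encoded per second of video."""
    return width * height * fps

full = pixel_rate(720, 486, 30)
small = pixel_rate(320, 240, 30)
small_half_rate = pixel_rate(320, 240, 15)

print(f"320x240: {1 - small / full:.0%} fewer pixels")                   # 78%
print(f"320x240 @ 15 fps: {1 - small_half_rate / full:.0%} fewer pixels") # 89%
```

Even the combined 89 percent cut still leaves gigabytes of raw data for a 20-minute podcast, which is why compression has to do the rest.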

Intra-frame encoding compresses each frame individually, just as a JPEG codec compresses a still image. Inter-frame encoding is a more sophisticated approach: it looks at the differences between frames and encodes only what has changed from one frame to the next. This is illustrated in Figure 1.


Figure 1: Inter-frame compression encodes only the differences between frames.
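To make the idea of "encoding only what changed" concrete, here is a toy sketch. It treats a frame as a flat list of grayscale pixel values and stores a difference frame as a list of (position, new value) pairs. This is a simplification for illustration only: real codecs work on blocks of pixels and use motion estimation and transform coding rather than per-pixel lists.

```python
from typing import List, Tuple

Frame = List[int]  # toy frame: flat list of 8-bit grayscale pixel values

def encode_difference(prev: Frame, curr: Frame) -> List[Tuple[int, int]]:
    """Inter-frame step: keep only the pixels that changed, as (position, new value)."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_difference(prev: Frame, diff: List[Tuple[int, int]]) -> Frame:
    """Decoder side: rebuild the current frame from the previous frame plus the changes."""
    frame = list(prev)
    for i, value in diff:
        frame[i] = value
    return frame

prev = [10, 10, 10, 10, 200, 200, 10, 10]
curr = [10, 10, 10, 10, 10, 200, 200, 10]   # the bright object moved one pixel right
diff = encode_difference(prev, curr)
print(diff)                                 # [(4, 10), (6, 200)]
assert apply_difference(prev, diff) == curr
```

Only two of the eight pixels need to be stored for the second frame; the more static the scene, the bigger the savings.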


To encode the differences between frames, the codec starts by encoding a full frame of video, known as a key frame. After the key frame, a number of difference frames are encoded. Difference frames, unsurprisingly, encode only what has changed from the previous frame to the current frame. The codec keeps encoding difference frames until either a scene change occurs or the amount of change between frames crosses a predetermined threshold, at which point it encodes a new key frame. The sequence of key frames and difference frames is illustrated in Figure 2.


Figure 2: Inter-frame compression uses a sequence of key frames and difference frames.
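The key-frame decision described above can be sketched in a few lines. In this toy version, the first frame is always a key frame, and any later frame whose fraction of changed pixels exceeds a threshold also becomes one, which serves as a crude stand-in for scene-change detection. The 0.5 threshold and the function names are hypothetical; real encoders use more elaborate rules.

```python
from typing import List

def changed_fraction(prev: List[int], curr: List[int]) -> float:
    """Fraction of pixels that differ between two equal-sized frames."""
    changed = sum(1 for p, c in zip(prev, curr) if p != c)
    return changed / len(curr)

def frame_types(frames: List[List[int]], threshold: float = 0.5) -> List[str]:
    """Label each frame 'K' (key frame) or 'D' (difference frame)."""
    types = []
    for i, frame in enumerate(frames):
        # Start a new key frame at the beginning or when too much has changed.
        if i == 0 or changed_fraction(frames[i - 1], frame) > threshold:
            types.append("K")
        else:
            types.append("D")
    return types

scene_a = [[10, 10, 10, 10], [10, 10, 10, 20], [10, 10, 20, 20]]
scene_b = [[90, 90, 90, 90], [90, 90, 90, 80]]
print(frame_types(scene_a + scene_b))   # ['K', 'D', 'D', 'K', 'D']
```

The cut from `scene_a` to `scene_b` changes every pixel, so the encoder starts over with a fresh key frame, just as in the sequence shown in Figure 2.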


The combination of reduced screen resolution, reduced frame rate, intra-frame compression, and inter-frame compression is enough to create satisfactory video experiences at amazingly low bit rates. Although no one would want to pay to watch it, you can create video files at bit rates as low as 32 Kbps. Of course, we recommend using roughly ten times that much for your video podcast. At 300 Kbps and above, you can deliver an entirely satisfactory video experience. It won't be perfect, but it should be more than adequate.
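To see just how far these bit rates bring the file size down, here is a short sketch converting a constant bit rate into an approximate file size for a 20-minute podcast. It assumes the bit rate stays constant for the whole file and ignores container overhead (and any audio track not counted in the bit rate), so treat the numbers as ballpark figures.

```python
def encoded_megabytes(kbps: float, minutes: float) -> float:
    """Approximate file size in megabytes (10**6 bytes) at a constant bit rate."""
    bits = kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000

for kbps in (32, 300):
    print(f"{kbps} Kbps x 20 min = {encoded_megabytes(kbps, 20):.1f} MB")
# 32 Kbps x 20 min = 4.8 MB
# 300 Kbps x 20 min = 45.0 MB
```

Compare 45 MB with the roughly 35 GB of uncompressed RGB video we started with.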

Codec side effects
No codec is perfect. Even when a codec claims to be transparent, an expert somewhere can tell the difference. At higher bit rates, the differences between the original and the encoded version are minimal. As the bit rate decreases, however, the differences become easy to spot. Perceptual codecs attempt to remove things that we won't notice, but unfortunately they're not always successful.

Because so much information must be removed from files, you get less of everything in the encoded version of your file. Both the frequency range and the dynamic range are reduced. If you're encoding video, you have a smaller screen resolution and possibly a decreased frame rate. As if that weren't enough, you may also see or hear artifacts in your podcast.

Artifacts are things that weren't in the original file. In encoded audio files, artifacts can be heard as low rumbling noises, pops, clicks, and what is known as "pre-echo," which gives speech content a lisping quality. For video files, you may notice blocking artifacts, where the video is broken up into blocks that move around the screen. You also may see smearing, where the video image looks muddy and lacks detail.

If your podcast has audible or visible artifacts, you should check your encoding settings. Audio podcasts in particular should not have artifacts; you should be more than capable of producing a high-quality audio podcast. Video, however, is a different matter. If you're delivering a 320×240 video podcast encoded at 300 Kbps, chances are good that you'll encounter a few artifacts. They shouldn't interfere with your audience's ability to enjoy your podcast. If they do, you'll need to revisit your equipment or your shooting and editing style, or simply encode your video podcast at a higher bit rate.
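One way to see why a few artifacts are likely at 300 Kbps is to work out how many bits the codec actually has for each pixel. The sketch below assumes 30 frames per second; `bits_per_pixel` is an illustrative helper, not a standard tool.

```python
def bits_per_pixel(kbps: float, width: int, height: int, fps: float) -> float:
    """Average number of encoded bits available for each pixel of each frame."""
    return kbps * 1000 / (width * height * fps)

raw = 24  # bits per pixel for uncompressed 24-bit RGB
bpp = bits_per_pixel(300, 320, 240, 30)
print(f"{bpp:.3f} bits/pixel, about 1/{raw / bpp:.0f} of raw RGB")
# 0.130 bits/pixel, about 1/184 of raw RGB
```

With well under one bit per pixel to work with, the codec has to throw away a great deal of detail, especially in busy or fast-moving scenes. Raising the bit rate, or lowering the frame rate, gives each pixel more bits.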
