Nov 17, 2008

Basic Encoding Techniques

Whether you encode your podcast by exporting directly from your editing platform or by using a stand-alone encoder, you can specify a number of parameters. You may have only a few choices if you're using encoding presets, or you may have the opportunity to specify exactly how you want your podcast encoded.

In the early days of low bit-rate encoding, back when people were connected to the Internet via slow modems, encoding technology was limited and required lots of tweaking to extract the best quality. Now, ten years later, codec technology and Internet connection speeds have improved so much that encoding high-quality podcasts should be within everyone's reach.

This is particularly true of audio podcasts. Modern codecs such as RealAudio and Windows Media Audio are capable of attaining FM-mono quality at a mere 32 kbps. The MP3 codec lags behind in quality, but because you can safely encode your podcast at 128 kbps, you should not have any quality issues.

Video is a little trickier. Assuming the majority of your audience is on a broadband connection, your video quality is limited by available bandwidth. Although you can't expect DVD quality at these bit rates, there's no reason why you can't create a perfectly acceptable video experience. This chapter helps you choose settings that should do the job. Let's start off with the easy stuff — audio encoding.

Audio Encoding


Audio encoding is easy, for a number of reasons. Raw audio files are large, but nowhere near as huge as video files. Therefore, the amount of compression that is needed to reduce them to a size that is suitable for Internet distribution is not excessive. Audio codec technology has progressed to a point where low bit rate encoding produces very good results. Podcasting reaps the benefits of ten years of cutthroat competition between RealNetworks and Microsoft, and the progress made by the MPEG organization with AAC encoding.
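To put a rough number on "not excessive," the sketch below compares the bit rate of uncompressed audio with a typical podcast bit rate. It assumes CD-style source audio (44.1 kHz, 16-bit, stereo); the function names are illustrative and not part of any encoder.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(raw_kbps: float, encoded_kbps: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_kbps / encoded_kbps


# Assumption: the source is CD-style audio (44.1 kHz, 16-bit, stereo).
cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)

for target_kbps in (128, 64, 32):
    ratio = compression_ratio(cd_kbps, target_kbps)
    print(f"{target_kbps} kbps: about {ratio:.0f}:1 compression")
```

Under these assumptions, a 128 kbps encoding needs only about 11:1 compression, which is modest compared with what video requires.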

Because modern codecs sound so good, you don't need to do much tweaking when you're encoding audio. You have to decide only three things: whether to encode in stereo or mono, whether to use a speech or a music codec, and what bit rate to use.

Mono versus stereo
The first thing to decide is whether to encode your podcast in stereo or mono. If your program is predominantly interviews or spoken word, encode in mono. A mono encoding has higher fidelity than a stereo encoding at the same bit rate, because only a single channel is encoded instead of two. With mono, you can either lower the bit rate and keep the same quality as a stereo encoding, or keep the bit rate and get better quality.
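The arithmetic behind this can be sketched as follows. This is a simplified model that splits the bit rate evenly between channels; real encoders may share information between the left and right channels, so treat the numbers as a rough guide only. The function name is hypothetical.

```python
def per_channel_kbps(total_kbps: float, channels: int) -> float:
    """Naive per-channel bit budget: the total split evenly across channels."""
    return total_kbps / channels


# Same 64 kbps total, encoded as mono and as stereo.
for label, channels in (("mono", 1), ("stereo", 2)):
    budget = per_channel_kbps(64, channels)
    print(f"{label}: {budget:.0f} kbps per channel")
```

At the same total bit rate, the mono encoding has twice the per-channel budget in this model, which is why it can sound better or be encoded at a lower bit rate.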

If your content is predominantly music, you should encode in stereo, although it isn't strictly necessary. Even though music is recorded in stereo, most of the content is right in the center of the mix. The lead vocal, the snare drum, and the bass drum all sit right in the center, between the speakers. Speaker placement matters too: if you aren't sitting directly between the speakers, you aren't experiencing the full stereo effect anyway. However, one good reason to target stereo if you're playing music is that half your audience may be listening on headphones, which exaggerate the stereo effect.

Speech versus music
The next thing to decide is whether to use a speech codec or a music codec. If you're encoding an MP3 file, you don't have a choice. MP3 is a music codec. The good news is that MP3 is perfectly suitable as a speech codec as well, provided the bit rate is high enough.

Speech codecs can take special shortcuts during the encoding process due to the nature of speech content. With speech, the dynamic range tends to be very limited, as is the frequency range. After you start talking, the chances are good that you'll continue to speak at roughly the same volume and in the same register. Knowing this, a speech codec can make intelligent decisions about how to encode the audio.

Music content, on the other hand, has a wide dynamic and frequency range. There are bass drums and bass guitars, as well as crashing cymbals and violins. The shortcuts that a speech codec takes are completely unsuitable for encoding music content.

So the choice is fairly obvious: If you're encoding content that is speech only, you can encode at very low bit rates and still achieve high quality using a speech codec. However, for most applications, a music codec is perfectly appropriate.

Bit rates, sample rates, and quality equivalents
The most important decision to make about your audio podcast encoding is what bit rate to use. The bit rate determines the eventual file size of your podcast, which in turn determines how long it takes to download. The bit rate also determines the fidelity of your podcast: the higher the bit rate, the higher the fidelity.
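The relationship between bit rate, file size, and download time is simple arithmetic. The sketch below assumes a constant bit rate, ignores metadata and container overhead, and uses a 1,500 kbps connection purely as an illustrative figure; the function names are hypothetical.

```python
def file_size_mb(bit_rate_kbps: float, duration_s: float) -> float:
    """Approximate file size in megabytes (1 MB = 1,000,000 bytes).

    Assumes a constant bit rate and ignores metadata and container overhead.
    """
    bits = bit_rate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


def download_time_s(size_mb: float, connection_kbps: float) -> float:
    """Approximate download time in seconds at a steady connection speed."""
    return size_mb * 8 * 1000 / connection_kbps


# A 30-minute podcast at two bit rates.
# Assumption: listeners download over a 1,500 kbps connection.
for kbps in (64, 128):
    size = file_size_mb(kbps, 30 * 60)
    seconds = download_time_s(size, 1500)
    print(f"{kbps} kbps: {size:.1f} MB, about {seconds:.0f} s to download")
```

Doubling the bit rate doubles both the file size and the download time, so the choice is a direct trade between fidelity and convenience.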

Encoders list audio bit rates ranging from 20 kbps to 256 kbps. If you're producing audio-only podcasts, you should target somewhere between 64 kbps and 128 kbps. If you're encoding predominantly speech, you can safely stay at the low end of that range; if you're encoding music, you may want to stick to the higher end.
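The guidance so far can be collected into a small lookup. This is only a sketch of the starting points suggested in this chapter (mono at the low end for speech, stereo at the high end for music); the function name and the dictionary keys are hypothetical, and the values are meant to be adjusted by ear.

```python
def suggest_audio_settings(content: str) -> dict:
    """Return a starting point for audio podcast encoding.

    The values mirror the chapter's guidance: 64 kbps mono for speech,
    128 kbps stereo for music. They are starting points, not rules.
    """
    if content == "speech":
        return {"channels": 1, "bit_rate_kbps": 64}
    if content == "music":
        return {"channels": 2, "bit_rate_kbps": 128}
    raise ValueError(f"unknown content type: {content!r}")


print(suggest_audio_settings("speech"))
print(suggest_audio_settings("music"))
```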

Note At the end of the day, you know best how you want the podcast to sound. Try encoding at a couple of different bit rates, and see which one sounds best to you.

The other thing you may be able to set is the sampling rate. The sampling rate determines how much high-frequency information is encoded. For example, CD-quality audio uses a sample rate of 44.1 kHz to capture the full 20–20,000 Hz frequency range. The sampling rate has to be at least double the highest frequency you're trying to capture. Depending on what bit rate you're targeting, you may be offered a few different sampling rates.
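The "at least double" rule can be checked with a couple of one-line helpers. This is a minimal sketch with hypothetical function names; it treats the rule as an ideal limit and ignores the filtering headroom that real converters need.

```python
def max_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


def min_sample_rate_hz(highest_frequency_hz: float) -> float:
    """Lowest sampling rate that can capture the given frequency (double it)."""
    return 2 * highest_frequency_hz


print(max_frequency_hz(44_100))    # 22050.0
print(min_sample_rate_hz(20_000))  # 40000
```

The 44.1 kHz CD rate therefore covers frequencies up to 22,050 Hz, slightly more than the 20,000 Hz upper limit mentioned above.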

The interesting thing about sampling rates is that a higher sampling rate isn't necessarily better. The sampling rate determines how often the incoming audio signal is sampled, so it determines how much audio the encoder has to try to encode. If you set a higher sampling rate, you're telling the encoder to try to encode more high-frequency information, but the encoder may have to sacrifice the overall quality of the encoding. Essentially, the sampling rate determines the trade-off between the frequency range and the fidelity of the encoding. At a given bit rate, an encoder can offer higher fidelity with a reduced frequency range or reduced fidelity with a higher frequency range.
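One rough way to see this trade-off is to divide the bit rate by the number of samples the encoder must describe each second. The sketch below is a deliberate simplification: perceptual codecs do not spend bits evenly per sample, so the figure is only an indicator of how stretched the encoder is. The function name is hypothetical.

```python
def bits_per_sample(bit_rate_kbps: float, sample_rate_hz: float, channels: int = 1) -> float:
    """Average number of encoded bits available for each input sample."""
    return bit_rate_kbps * 1000 / (sample_rate_hz * channels)


# A 64 kbps mono encoding at three sampling rates.
for rate_hz in (44_100, 32_000, 22_050):
    budget = bits_per_sample(64, rate_hz)
    print(f"{rate_hz} Hz: {budget:.2f} bits per sample")
```

At a fixed bit rate, halving the sampling rate doubles the average bit budget per sample, which is the extra fidelity the encoder gains by giving up the highest frequencies.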

We suggest that you choose a lower sampling rate, thereby allowing the encoder to create a higher fidelity version of your podcast. There is very little information above 16 kHz in most audio programming, and most people don't have speakers that reproduce it faithfully anyway. Therefore, choosing a 32 kHz or 22 kHz sampling rate should provide more than enough high-frequency information.
