
Sep 5, 2008

How perceptual codecs & audio codecs work

Perceptual codecs take advantage of how we actually perceive audio and video, and use this information to make intelligent decisions about what information can safely be discarded. Perceptual codecs are by definition lossy because of this. The original cannot be recreated from the encoded file. Instead, an approximation that attempts to retain as much fidelity as possible is constructed. The idea is that we won't notice what has been discarded.

Our ears are extremely sensitive. We can hear from 20Hz to 20,000Hz and sounds over a wide dynamic range, from a whisper to a scream. We can pick out a conversation at the next table in a crowded restaurant if the topic happens to catch our ear. We can do this because our brains filter out the information that is not of interest and focus on the rest. Our brains effectively prioritize incoming sound information.

For example, even a quiet classroom has plenty of sounds, such as the hum of air conditioning, people shuffling papers, and the teacher lecturing at the front. If someone sneezes in the room, for that split second, everyone notices the sneeze and nothing else. The sneeze is the loudest thing in the room and takes precedence over everything else.

Similarly, our eyes can take in a wide range of visual information, the entire color spectrum from red all the way through purple, and from very dim environments to very bright environments. Our field of vision is approximately 180 degrees from left to right. What we actually pay attention to, though, is much more focused. In general, we pay more attention to things that are brightly colored and things that are moving.

Perceptual codecs use this information to make better decisions about what information in audio and video files can be discarded or encoded with less detail. Perceptual codecs prioritize the loudest frequencies in an audio file, knowing that's what our ears pay most attention to. When encoding video, perceptual codecs prioritize bright colors and any motion in the frame.

At higher bit rates, perceptual codecs are extremely effective. A 128 kbps MP3 file is widely considered to be near-CD quality, yet it is roughly one-tenth the size of the original, which is pretty incredible if you think about it. Some of the savings comes from encoding efficiency, but the majority comes from perceptual encoding. As the bit rate is lowered and the codec is forced to discard more and more of the original information, fidelity drops and the effects of perceptual encoding become more audible. Still, you should always balance the required fidelity of your podcast against the realities of bandwidth and throughput.
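The arithmetic behind that size claim is easy to check. A CD carries 16-bit samples at 44,100 samples per second across two channels, which works out to roughly eleven times the bit rate of a 128 kbps MP3. A quick sketch, assuming the standard CD format:

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits/sample x 2 channels.
cd_kbps = 44100 * 16 * 2 / 1000   # 1411.2 kbps
mp3_kbps = 128                    # a typical "near-CD" MP3 bit rate

print(f"CD bit rate: {cd_kbps} kbps")
print(f"Compression factor: {cd_kbps / mp3_kbps:.1f}x")
```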

How audio codecs work
Audio codec technology has made spectacular advances in the last few years. It's now possible for FM quality to be encoded in as little as 32 kbps (in mono, that is). Modern codecs such as Windows Media, Real, and QuickTime AAC can achieve CD quality at approximately 64 kbps. How do they do it?

The idea is to capture as much of the frequency and dynamic range as possible, along with the entire stereo image. In practice, though, the codec determines what a reasonable frequency range is given the target bit rate. Files encoded in mono are always slightly higher fidelity at a given bit rate, because the encoder has to worry about only one channel, not two.

Another economy can be made if the codec knows that it will be encoding speech. Speech tends to stay in a very limited frequency and dynamic range. If someone is talking, it's unlikely that her voice will suddenly drop down an octave, or that she'll start screaming for no reason. Knowing this, a codec can take shortcuts when encoding the frequency and dynamics information.

Caution: Don't try to encode music using a speech codec. The shortcuts a speech codec takes are totally unsuitable for music, because music covers a very wide frequency range and is generally very dynamic. Encode music with a speech codec and it sounds awful. So don't do it.


After the frequency range has been determined, the codec must somehow squeeze as much information as possible into the encoded file and decide what can be discarded. Perceptual audio codecs use the concept of masking to help make that decision. If one frequency is very loud, it masks other frequencies, so the codec can safely discard them because we wouldn't perceive them.
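As a toy illustration of masking (this is not any real codec's psychoacoustic model, and the 10 dB-per-octave roll-off is invented purely for the example), the decision can be sketched as: drop any component that falls below a louder neighbor's level minus a spread that decays with frequency distance:

```python
import math

def masked(components, spread_db_per_octave=10.0):
    """Return the components a simplistic perceptual model would keep.

    components: list of (frequency_hz, level_db) tuples.
    A component is dropped if a louder component nearby masks it, i.e.
    the masker's level, minus a roll-off per octave of distance, still
    exceeds the quieter component's level.
    """
    keep = []
    for freq, level in components:
        audible = True
        for other_freq, other_level in components:
            if other_level <= level:
                continue  # only louder components can mask this one
            octaves = abs(math.log2(other_freq / freq))
            if other_level - spread_db_per_octave * octaves > level:
                audible = False
                break
        if audible:
            keep.append((freq, level))
    return keep

# A quiet 1100 Hz tone sitting next to a loud 1000 Hz tone is masked
# and discarded; the distant 4000 Hz tone survives.
print(masked([(1000, -6.0), (1100, -30.0), (4000, -20.0)]))
```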

This is why all background noise must be minimized in your original recordings and your programming must be nice and loud. This ensures that the codec doesn't discard any of the programming information.

Apr 25, 2008

How compression works

Most compressors offer the same basic controls, which allow you to set the following:

Threshold: Where the compression effect kicks in

Ratio: The amount of compression applied above the threshold

Attack and release times: How quickly the effect is applied once the signal crosses the threshold, and how quickly it is removed afterward

Figure 1 illustrates what different compression curves look like. Looking at the curve, you'll see that signal levels below the threshold are unaffected, and signal levels above the threshold are attenuated. The higher the compression ratio, the greater the attenuation. When the compression ratio is very high, the effect is known as limiting, because it more or less prevents the audio from exceeding the threshold.
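The curve itself can be written down directly. This is a minimal sketch (the function name and defaults are mine, not taken from any particular compressor): below the threshold the level passes through unchanged, and above it the excess is divided by the ratio:

```python
def compress_db(input_db, threshold_db=-10.0, ratio=4.0):
    """Static compression curve: below the threshold the signal is
    unaffected; above it, every `ratio` dB of input yields only
    1 dB of output above the threshold."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

print(compress_db(-20.0))              # below threshold: unchanged, -20.0
print(compress_db(-2.0))               # 8 dB over at 4:1 -> -8.0
print(compress_db(-2.0, ratio=100.0))  # near-limiting: just over -10
```

Note how a very high ratio pins the output to just above the threshold, which is exactly the limiting behavior described above.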


Figure 1: Compression curves with different compression ratios


Setting a threshold
To illustrate how different threshold settings affect the output, let's assume that we're working with an audio file that has peaks as high as -2dB, but with the bulk of the content below the -10dB mark. If we want to compress this file lightly, we should set a threshold in the -6dB to -10dB range. Figure 2 illustrates compression applied to this file using two different thresholds, -6dB and -20dB.


Figure 2: The original audio file after compression using a threshold of a) -6dB and b) -20dB



Figure 3: The audio files from Figure 2 after applying compensating gain


What is immediately apparent is that the file in Figure 2b has been compressed far more heavily than Figure 2a. We need to apply some gain to restore these files to their former levels. Figure 2b has far more headroom, so we can apply much more gain. After applying gain, we end up with the files illustrated in Figure 3.

These files are both much louder than the original, but if you look closely at the file on the right (the one compressed with the -20dB threshold), the entire file is loud. It doesn't have any dynamics left, because the dynamic range has been compressed. To be honest, this file might be a little too compressed. Files that have been over-compressed are fatiguing to listen to, because EVERY SINGLE SYLLABLE IS LOUD. Think of drive-time radio programs; they're highly compressed, because the DJs are going absolutely nuts in the studio. The idea is to compete with all the noise of traffic and to keep you awake on your drive to and from work. But this is not the type of programming you really want to listen to all day long (see the "Compression: How Much Is Enough?" sidebar).

If your original audio file is well recorded, you should have peaks in the -3dB to -6dB range. Choosing a threshold in the -6dB to -10dB range is a safe starting point. This way, you're only compressing the loudest sections of your file, leaving most of your file untouched. If you find yourself dropping your threshold much below that, you may consider revisiting your signal chain to figure out why your recording is so quiet in the first place.

Setting a ratio
The ratio setting determines how much compression is applied above the threshold. For example, a 2:1 compression ratio means that for every 2dB by which the incoming signal exceeds the threshold, the output level rises by only 1dB. Ratios up to around 4:1 are mild and can be used safely, provided you set a sensible threshold. Ratios in the 4:1 to 10:1 range are fairly heavy and should be used with caution. Any ratio over 10:1 falls into a special category known as limiting.

Limiting can be useful as a preventative measure, but it isn't appropriate as your main form of compression. For most applications, start off with a ratio of 4:1 and experiment with slightly more or less until you achieve the effect you're after. Voices in particular are quite compression tolerant, so if you don't have any music in your podcast, you may be able to use more compression (see the "Compression: Voice versus Music" sidebar).

Setting attack and release times

The attack and release times control how quickly the compression effect is applied to signals that exceed the threshold you set, and how quickly the signal level is returned to the original input. For most content, you want a quick attack time, so signals that exceed the threshold are immediately attenuated. For the release time, you want something a bit longer, so the sound doesn't abruptly get returned to the original level.

These time scales are usually reflected in the controls themselves: the attack knob generally is marked in milliseconds, and the release knob in seconds. Start with a quick attack, say 10-20 milliseconds, and a gradual release of around 500 milliseconds (half a second). These settings should work for most podcasting content, but don't be afraid to play around with your compressor to see how these settings affect the sound.
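To see what those time constants actually do, here's a rough sketch of the smoothing a compressor's gain stage might apply. This is one common one-pole design; real compressors differ in the details, and the function name and defaults are mine:

```python
import math

def smooth_gain(target_gains, attack_ms=15.0, release_ms=500.0,
                sample_rate=44100):
    """Smooth a per-sample sequence of desired gain values using
    separate attack and release time constants (one-pole filters).

    The attack coefficient is used when the gain must drop (signal
    over threshold); the release coefficient when it recovers.
    """
    a_coef = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    r_coef = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    out, current = [], 1.0
    for target in target_gains:
        coef = a_coef if target < current else r_coef
        current = coef * current + (1.0 - coef) * target
        out.append(current)
    return out
```

With a shorter attack time, the gain clamps down on an over-threshold signal more quickly, which is why a fast attack catches transients while a slow release avoids an audible jump back to full level.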

Apr 14, 2008

EQ (Equalization)

Equalization, or EQ as it is commonly known, is the practice of adjusting the tonal quality of audio by turning certain frequencies up or down. Audio engineers use the terms boost for turning up and cut for turning down. Many of you are probably familiar with EQ via the bass and treble controls on your home or car stereos. In fact, you may have already fiddled with these knobs to adjust the sound; congratulations, you're an audio engineer. Using your ears as a guide, you adjusted the EQ until it sounded right to you. That's exactly what EQing is.

EQ is used for a number of reasons. Sometimes you may need to enhance the tone of your audio by boosting frequencies that make your audio sound more pleasant. You may need to boost these frequencies to make up for a deficiency in your mic or because your voice sounded different one day due to a cold or a late night. On the other hand, you may want to cut frequencies that aren't helping the sound of your audio. This can also be due to a deficiency in your signal chain or to get rid of something unpleasant, like an excessively nasal sound or too much bottom end.

How to use EQ
At the end of the day, it's all about making your audio sound better. You want your podcasts to sound bright and full, not dull and thin. You may notice that the terms used to describe the effects of EQ are very subjective. Audio engineers regularly use terms like "presence," "sparkle," "warmth," and "air." Believe it or not, these terms aren't as subjective as they seem; they're actually ways of referring to certain parts of the frequency spectrum. Exactly where in the spectrum is subject to debate, but in the next section we provide a table of frequency ranges, along with the common terms used to describe the frequencies in each range. The first thing to do, however, is to listen to your audio critically and ask yourself a few questions. The following questions start at the bottom of the frequency range and work up from there, but you can consider them in any order:

- Is it "warm" enough? Is there enough low frequency information? Be careful here, because even good studio monitors have a hard time reproducing the lowest audible frequencies.

- What about the midrange? Are the voices clear and understandable? Or is the sound too harsh?

- How about the high frequencies? Does the audio sound dull? Or do you have the opposite problem — too much high frequency information?
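The full table falls outside this excerpt, but to make the idea concrete, here's a rough sketch of the kind of mapping audio engineers have in mind. The boundaries below are loose conventions I've filled in for illustration, not standards, and, as noted above, the exact ranges are subject to debate:

```python
# Approximate frequency ranges (Hz) behind common EQ vocabulary.
# These boundaries are loose conventions, not standards.
EQ_TERMS = {
    "boom/rumble":  (20, 100),
    "warmth":       (100, 400),
    "mud/boxiness": (250, 600),
    "presence":     (2000, 5000),
    "sparkle":      (5000, 10000),
    "air":          (10000, 20000),
}

def terms_for(freq_hz):
    """Return the descriptive terms whose range covers a frequency."""
    return [term for term, (lo, hi) in EQ_TERMS.items() if lo <= freq_hz <= hi]

print(terms_for(3000))   # a midrange frequency in the "presence" band
print(terms_for(300))    # low frequencies where ranges overlap
```

Note that the ranges overlap: a boost at 300 Hz can add warmth to one voice and muddiness to another, which is why your ears, not the table, get the final vote.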

If you're unsure about the answer to any of these questions, do what other audio engineers do: use something else as a reference. For example, you could listen to a radio program that is similar to yours and do what we call an A/B comparison. Listen to the radio program for ten seconds, paying particular attention to the low frequencies. Then flip back to your podcast and compare. Does yours have less? More? Flip back and compare the middle frequencies. Finally, listen to the high frequencies. This should give you an indication of whether your podcast is up to scratch, and if it isn't, what you may need to boost (or cut) to make your podcast sound better. The trick is to find the right frequencies.