Dec 21, 2008

Video Encoding | Learn Podcasting

Video encoding is more involved than audio encoding. Raw video files are much larger than raw audio, so a much greater degree of compression is required. Modern video codecs also have benefited from fierce competition between the leading streaming media platforms, as well as the MPEG organization. Most encoding software packages include a number of presets that produce acceptable video quality. If you want to try to improve your quality by tweaking the encoding parameters, this section explains the basic options available to you and how they affect video quality.

Screen resolution
The most important decision you're going to make about your video podcast is what resolution (or screen size) you're targeting. This is largely determined by the bit rates you're targeting, which in turn are determined by your audience. The higher the bit rate, the larger your resolution can be.

Most encoding software programs let you specify any screen resolution you want. You can specify that you want the full 640×480 frame encoded at 100 kbps, and the encoder does the best job it can. What you end up with is a large screen full of blurry blocks moving around, because 100 kbps simply isn't enough to encode a resolution that large.
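One rough way to see why 100 kbps cannot carry a 640×480 frame is to work out how many encoded bits are available for each pixel. The sketch below is illustrative only. The function name `bits_per_pixel` is made up for this example, and it assumes 30 frames per second, 1 kbps = 1,000 bits per second, and that the whole bit rate goes to video.

```python
def bits_per_pixel(bit_rate_kbps, width, height, fps):
    """Average number of encoded bits available per pixel, per frame."""
    bits_per_second = bit_rate_kbps * 1000
    pixels_per_second = width * height * fps
    return bits_per_second / pixels_per_second


# Assumed figures: 100 kbps at 30 frames per second, two candidate frame sizes.
large = bits_per_pixel(100, 640, 480, 30)
small = bits_per_pixel(100, 320, 240, 30)

print(f"640x480 at 100 kbps: {large:.3f} bits per pixel")
print(f"320x240 at 100 kbps: {small:.3f} bits per pixel")
```

Halving both dimensions quarters the pixel count, so the same bit rate has four times as many bits to spend on each pixel.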

Table 1 lists some common video bit rates and suggested screen resolutions. Note that the resolution is largely dependent on the content of your video. If you have lots of motion in your podcast, you have to use either a higher bit rate or a smaller resolution to achieve acceptable video quality. If your podcast is relatively static and you filmed using a tripod, you may be able to try slightly larger screen sizes.

Frame rate
Another parameter you can adjust is the frame rate. NTSC video is shot at 30 frames (actually 60 fields) per second. However, for low action content, you may be able to get away with a lower frame rate. For example, interview footage often looks just fine at 15 frames per second. Higher action content requires the full frame rate for smooth motion.

Adjusting the frame rate affects the overall clarity of the video, because no matter how many frames per second you're encoding, you always have a fixed bit rate at which to encode. If you're encoding at 300 kbps, you can spread those 300 kilobits over 30 frames or 15 frames. Obviously, if you're only encoding 15 frames instead of 30, you can dedicate more bits per frame, and the result is a higher-quality frame. However, this may be a false economy for high action content.
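The arithmetic behind that trade-off can be sketched in a couple of lines. This is a simplified average: the helper name `kilobits_per_frame` is hypothetical, and a real encoder spends far more bits on key frames than on difference frames.

```python
def kilobits_per_frame(bit_rate_kbps, frames_per_second):
    """Average bit budget per frame when the bit rate is fixed."""
    return bit_rate_kbps / frames_per_second


full_rate = kilobits_per_frame(300, 30)  # full frame rate
half_rate = kilobits_per_frame(300, 15)  # reduced frame rate

print(f"30 fps: {full_rate:.1f} kilobits per frame")
print(f"15 fps: {half_rate:.1f} kilobits per frame")
```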

Remember how video encoding is done. First, a key frame is encoded, which is followed by a number of difference frames. High action content has lots of motion and, therefore, lots of difference from frame to frame. If you drop the frame rate to try to economize, the encoder drops frames and doesn't encode them. There are certainly fewer frames to encode, but the differences between them are greater! This is illustrated in Figure 1.

Figure 1: When encoding at a reduced frame rate, the increased differences between frames may negate the gains of encoding fewer frames.

If your programming has very little motion in it, such as talking head content, you may see an improvement in quality by dropping the frame rate. However, if you have lots of action in your podcast, leave the frame rate as is. To get higher video quality, you'll have to either encode at a higher bit rate or reduce your screen resolution.

Note The frame rate of NTSC video is actually 29.97 frames per second, although 30fps is often used as shorthand.

Bit rate
Along with the screen resolution, the other important choice you have to make is the bit rate of your podcast. The bit rate determines the quality and the file size, and it's the gating factor for the resolution. The bit rate you choose is determined to some extent by your audience, and the length of your podcast. The idea is that you don't want your audience to have to wait forever to watch your podcast. If the podcast is being downloaded in the background by an aggregator, then this isn't an issue. But many video podcasts are watched on Web pages. The user clicks a link and expects to see something in a reasonable amount of time.

Because most podcasters host their podcasts on a Web server, most video podcasts are progressively downloaded. Progressively downloaded videos have to preload a bit before they start playing back. The amount of preload is determined by the embedded player. The player knows how big the video file is and calculates how long the video will take to download. The player also knows how long the video is and tries to figure out how soon it can start playing the video so that by the time the playback reaches the end, all the video will have downloaded.

Let's say you've encoded your video at 300 kbps, which is a pretty standard bit rate. Most broadband connections can sustain this bit rate consistently, so as the video starts downloading, the player realizes in a few seconds that the data is being downloaded fast enough to begin playback. The total wait time for your audience is minimal.

Now let's take the same podcast and encode it at 500 kbps. Let's say it's a 1-minute podcast. The total file size is going to be approximately 30,000 kilobits (we'll stay in the world of bits because the math is easier). If the user's connection can sustain 300 kbps, it takes 100 seconds to download the entire file. If the file is 60 seconds long, that means the user has to wait 40 seconds before playback can begin. This is probably a little excessive, unless your audience is very dedicated. And this is if your podcast is only a minute long. For each additional minute, the audience is forced to wait an additional 40 seconds. This can get out of hand quickly.
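The wait-time arithmetic above can be written as a small function. This is a sketch of the simplified model described in the text, not of how any particular player computes its preload. The function name and parameters are assumptions, and it treats the connection speed as constant.

```python
def preload_wait_seconds(encoded_kbps, connection_kbps, duration_seconds):
    """Seconds a viewer waits so the download finishes as playback ends."""
    file_size_kilobits = encoded_kbps * duration_seconds
    download_seconds = file_size_kilobits / connection_kbps
    # If the file downloads faster than it plays, no extra wait is needed.
    return max(0.0, download_seconds - duration_seconds)


# The example from the text: 500 kbps video, 300 kbps connection, 1 minute.
print(preload_wait_seconds(500, 300, 60))
# The same video encoded at the connection's own speed.
print(preload_wait_seconds(300, 300, 60))
# A 2-minute podcast at 500 kbps doubles the wait.
print(preload_wait_seconds(500, 300, 120))
```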

If you're doing longer form content, longer than 5 minutes, then you should choose a bit rate that your users can receive in more or less real time. We can take a hint from streaming media sites here, which commonly target 300 to 450 kbps. If you're worried about bandwidth costs, you can choose a lower bit rate. Your video quality might suffer a bit, but if you can't afford your bandwidth bill, you won't be able to continue podcasting!

Audio bit rate

Of course, the bit rate we have been talking about up to this point has been the total bit rate of your podcast. After you choose your target bit rate, you have to decide how much of that bit rate you're going to allot to the audio stream. There are audio codecs at bit rates from 5 kbps up to hundreds of kilobits per second. How much should you give your audio stream?

One thing to remember is that audio tells the story. Audio is what draws the audience in and keeps them attentive. Think about it: When you're watching television at night with friends and a commercial comes on, what happens? If you're like most people these days, someone punches the mute button on the remote, the room immediately comes to life with conversation, and people take the opportunity to grab something from the refrigerator. When the commercial break is over, the audio is un-muted, and everyone pays attention to the program again.

The same holds true for your podcast. It's worth making sure that your audio quality is good, because people will watch low quality video if the audio sounds good. They won't watch good quality video if the audio sounds bad.

A fairly safe rule is to use about 10 to 20 percent of your total bit rate for audio. At higher bit rates, you can stay toward the bottom of that scale; at lower bit rates, stay toward the top. Another suggestion is to avoid the lowest audio bit rate settings. The difference between a 5 kbps audio feed and an 8 kbps audio feed is huge; those extra 3 kbps won't add much to your video quality.
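As a quick illustration of the 10 to 20 percent rule, the sketch below splits a total bit rate into audio and video shares. The function name `split_bit_rate` and the sample totals are assumptions chosen for this example. In practice you would round the audio figure to the nearest bit rate your audio codec actually offers.

```python
def split_bit_rate(total_kbps, audio_percent):
    """Return (audio_kbps, video_kbps) for a given audio share in percent."""
    audio_kbps = total_kbps * audio_percent / 100
    video_kbps = total_kbps - audio_kbps
    return audio_kbps, video_kbps


# Higher total bit rate: stay near the bottom of the 10-20 percent range.
print(split_bit_rate(300, 10))
# Lower total bit rate: stay near the top of the range.
print(split_bit_rate(100, 20))
```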

Dec 11, 2008

Multi-format encoding

In the beginning, podcasts were audio only and always encoded using the MP3 codec. But as people have started to realize the potential for video podcasting and portable media player displays have improved, the possibilities for podcasting have multiplied. The problem is that most of these enhanced opportunities come at a price, and that price is compatibility. Enhanced podcasts designed for the iPod do not play on other portable media players. Podcasts encoded using the Windows Media format for compatibility with the "Plays For Sure" portable players do not play on the iPod. And if you want to offer a video image larger than 320×240, it may or may not play back on portable media players.

So what can a podcaster who wants to push the envelope do? The best approach is to offer a number of formats and let your audience choose which version they'd like to subscribe to. Of course, if you're offering multiple formats, you're no longer encoding a single version of your podcast; you may be encoding three or four. For example, if you really want to offer every possible choice, you might offer the following:

  • An MP3 version for older media players

  • An enhanced iPod version with images

  • A 320×240 video version encoded in Windows Media

  • A 320×240 video version encoded in QuickTime H.264

  • A 640×480 video version encoded in Flash format for Web viewing

    Granted, this example may seem excessive, and the chances that someone would encode into so many different formats are pretty slim. However, it's not beyond the realm of possibility. Rocketboom, one of the most popular video podcasts, is encoded into four different formats. If you want the largest possible audience and want to stay at the cutting edge of podcasting technology, you're going to have to encode multiple versions. This is where a multi-format encoder comes in handy.

    Tip If you're offering more than one format, offer separate RSS feeds for each so people can subscribe to their favorite format.

    Multi-format encoders enable you to choose a single source file and output to multiple formats. These encoders usually allow you to set up encoding presets, so you don't have to re-enter the encoding settings every time you encode. Many multi-format encoders also allow you to preprocess your original master, so if you want to do color correction or resizing, it can be done at the encoding stage.

    Some multi-format encoders offer automatic batch processing, where files placed into a specific directory are automatically processed and encoded. You can streamline your production chain if you're using a multi-format encoder with batch processing. This allows you to concentrate on your programming and let the batch processing take care of the rest.
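To make the idea of presets plus batch processing concrete, here is a minimal sketch that only plans the work: for each source file it lists one output per preset. Everything in it is hypothetical, including the preset names, file extensions, bit rates, and the `plan_jobs` helper. It does not call any real encoder, and it is not how any of the products mentioned here work internally.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Preset:
    """A saved group of encoding settings (all values are illustrative)."""
    name: str
    extension: str
    width: int
    height: int
    total_kbps: int


PRESETS = [
    Preset("ipod-h264", ".m4v", 320, 240, 300),
    Preset("windows-media", ".wmv", 320, 240, 300),
    Preset("web-flash", ".flv", 640, 480, 450),
]


def plan_jobs(sources, output_dir, presets=PRESETS):
    """Pair every source file with every preset and name the output file."""
    jobs = []
    for source in sources:
        for preset in presets:
            target = output_dir / f"{source.stem}-{preset.name}{preset.extension}"
            jobs.append((source, preset, target))
    return jobs


# One master file dropped into the batch produces one job per preset.
jobs = plan_jobs([Path("episode12.mov")], Path("encoded"))
for source, preset, target in jobs:
    print(f"{source.name} -> {target.name} "
          f"({preset.width}x{preset.height} at {preset.total_kbps} kbps)")
```

A real batch encoder would watch a folder for new files and hand each planned job to the encoding engine. The planning step is kept separate here so it is easy to check on its own.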

    A number of multi-format encoding solutions are available, including these popular ones:

  • Sorenson Squeeze: The Sorenson Squeeze Compression Suite offers MP3, AAC, QuickTime, Windows Media, and Real formats (Mac users must have the Flip4Mac plugin installed to get Windows Media capabilities). You can add Flash encoding with an additional plug-in (see Figure 1), or by purchasing the Squeeze Power Pack.

    Figure 1: Sorenson Squeeze

  • Canopus Procoder: The Express version offers QuickTime, Windows Media, and Real support. The full 2.0 version also offers MP3 encoding, and Flash encoding if you have Flash MX installed.

  • Telestream FlipFactory: This offers MP3, QuickTime, Windows Media, Real, and Flash support. It also supports 3GPP (mobile phone format) with an additional plug-in.

  • Digital Rapids Stream: The basic version offers QuickTime, Windows Media, and Real support. The Pro version adds MP3 and Flash support. All Digital Rapids software requires Digital Rapids capture cards.

Nov 30, 2008

    Encoding Via Your Editing Platform

    If you've invested in a decent audio-editing or video-editing platform, chances are good that you'll use your editing platform to do your encoding. Most include a variety of export options. (You'll also want to export a broadcast-quality master for archival purposes, of course.) If your master includes lots of processing and complicated editing, you may want to render the broadcast-quality master and then encode using an encoding application or multi-format encoder, instead of doing all the processing twice. For most podcasts, exporting an encoded master directly from your timeline is probably easiest.

    Most audio-editing platforms offer MP3 encoding. Many also offer encoding in a number of other formats:

  • Audacity (Windows, Mac, Linux): Offers MP3 and Ogg Vorbis export

  • Peak (Mac): Offers MP3 and AAC export

  • GarageBand (Mac): Offers AAC export, which is fine for iPods, but does not support MP3 export

  • Sound Forge (Windows): Offers a number of export options, including MP3, Ogg Vorbis, Windows Media, and RealAudio (see Figure 1)

    Figure 1: Sound Forge offers a large number of export options.

  • Audition (Windows): Also offers a wide variety of support, including MP3, Windows Media, and RealAudio

    Video-editing platforms also offer fairly rich export options:

  • Final Cut Pro (Mac): Offers QuickTime H.264 support

  • iMovie (Mac): Offers QuickTime H.264 support, including a preset for iPods

  • Adobe Premiere (PC): Offers Flash, QuickTime, Windows Media, and RealVideo support

  • Sony Vegas (PC): Offers QuickTime, Windows Media, and RealVideo support

  • Ulead Video Studio (PC): Offers QuickTime, Windows Media, and RealVideo support, and includes output templates for iPods and SmartPhones

Nov 23, 2008

    Making Format Choices | Podcast

    Because this is a technical manual that includes business advice concerning podcasting, you might expect that we would tell you which format is best for your podcast. Unfortunately, that's not something we can do. Things were much simpler when a podcast meant an MP3 file that was automatically downloaded to a desktop and transferred to an iPod. Now that the term podcasting has expanded to include a variety of portable media players and video, the podcasting format wars have begun.

    The territory that is being fought over is very valuable. As podcasting continues to grow in popularity and people continue to time-shift their media consumption habits, the large media conglomerates are scrambling to catch up to the thousands of already-successful podcast brands that have been established. Similarly, the portable media player manufacturers are fighting tooth and nail for control of the player market. Control the player, and you control access to the millions of people who are discovering podcasting.

    To some extent, we can learn from streaming media. The industry that RealNetworks pioneered quickly became a three-horse race when QuickTime and Windows Media entered the field. Flash was a late entry to the field and is making a dent in everyone's market share numbers. Experts have talked about the imminent demise of MPEG4 or RealNetworks, but the reality is that there seems to be room for all the streaming formats, and none of them is going away anytime soon.

    The same probably holds true for podcasting. The iPod has a massive share of the portable media player market, but with Microsoft coming out with a portable media player as this book is being written, that is sure to change. As the term podcasting has broadened, so has the way people listen to and watch podcasts. Studies have shown that half of all podcasts are actually watched on a desktop or laptop computer, not a portable media player.

    Because the podcasting industry is still in its infancy, the situation is likely to continue to change. There is no easy answer to the format question, nor one likely in the short term. However, in the interest of helping you make a decision, we can point out a few things to help you cut through the media hype:

  • If you're producing an audio podcast, MP3 gets you the widest compatibility.

  • If you're producing a video podcast, QuickTime is a good choice because it's compatible with the iPod and anyone who has iTunes installed.

  • If you don't care about portable media players and are offering video playback via your site, Flash is a good option because it has good cross-platform support.

  • Windows Media has better video quality than QuickTime and Flash, and there are a heck of a lot of PCs out there.

  • RealNetworks is making huge inroads into the mobile market, particularly in Europe.

    The best way to figure out what format is best for your podcast is to start off simple, possibly offering only a single stream option. Monitor your e-mail and your blog comments. After you've developed a bit of an audience, ask them what they prefer. Podcasting is still a relatively intimate broadcast medium, and the way to make loyal audience members is to give them what they want.

Nov 17, 2008

    Basic Encoding Techniques

    Whether you encode your podcast by exporting directly from your editing platform or by using a stand-alone encoder, you can specify a number of parameters. You may have only a few choices if you're using encoding presets, or you may have the opportunity to specify exactly how you want your podcast encoded.

    In the early days of low bit-rate encoding, back when people were connected to the Internet via slow modems, encoding technology was limited and required lots of tweaking to extract the best quality. Now, ten years later, codec technology and Internet connection speeds have improved so much that encoding high-quality podcasts should be within everyone's reach.

    This is particularly true of audio podcasts. Modern codecs such as RealAudio and Windows Media Audio are capable of attaining FM-mono quality at a mere 32 kbps. The MP3 codec lags behind in quality, but because you can safely encode your podcast at 128 kbps, you should not have any quality issues.

    Video is a little trickier. Assuming the majority of your audience is on a broadband connection, your video quality is limited by available bandwidth. Although you can't expect DVD quality at these bit rates, there's no reason why you can't create a perfectly acceptable video experience. This chapter helps you choose settings that should do the job. Let's start off with the easy stuff — audio encoding.

    Audio Encoding

    Audio encoding is easy, for a number of reasons. Raw audio files are large, but nowhere near as huge as video files. Therefore, the amount of compression that is needed to reduce them to a size that is suitable for Internet distribution is not excessive. Audio codec technology has progressed to a point where low bit rate encoding produces very good results. Podcasting reaps the benefits of ten years of cutthroat competition between RealNetworks and Microsoft, and the progress made by the MPEG organization with AAC encoding.

    Because modern codecs sound so good, you really don't need to do much tweaking when you're encoding audio. You really have to decide only three things: whether to encode in stereo or mono, whether to use a speech or a music codec, and what bit rate to use.

    Mono versus stereo
    The first thing to decide is whether to encode your podcast in stereo or mono. If your program is predominantly interviews or spoken word, encode in mono. Mono encodings are always higher fidelity at a given bit rate, because only a single channel is encoded instead of two. If you're encoding in mono, you can use a lower bit rate and get the same quality or you can get better quality than a stereo encoding at the same bit rate.

    If your content is predominantly music, you should encode in stereo, although it isn't strictly necessary. Even though music is recorded in stereo, most of the content is right in the center of the mix. The lead vocal, the snare drum, the bass drum, all will be right in the center of the speakers. And watch where you place your speakers. If you aren't sitting directly between the speakers, you aren't experiencing the full stereo effect anyway. However, one good reason to target stereo if you're playing music is that half your audience may be listening on headphones, which exaggerates the stereo effect.

    Speech versus music
    The next thing to decide is whether to use a speech codec or a music codec. If you're encoding an MP3 file, you don't have a choice. MP3 is a music codec. The good news is that MP3 is perfectly suitable as a speech codec as well, provided the bit rate is high enough.

    Speech codecs can take special shortcuts during the encoding process due to the nature of speech content. With speech, the dynamic range tends to be very limited, as is the frequency range. After you start talking, the chances are good that you'll continue to speak at roughly the same volume and in the same register. Knowing this, a speech codec can make intelligent decisions about how to encode the audio.

    Music content, on the other hand, has a wide dynamic and frequency range. There are bass drums and bass guitars, as well as crashing cymbals and violins. The shortcuts that a speech codec takes are completely unsuitable for encoding music content.

    So the choice is fairly obvious: If you're encoding content that is speech only, you can encode at very low bit rates and still achieve high quality using a speech codec. However, for most applications, a music codec is perfectly appropriate.

    Bit rates, sample rates, and quality equivalents
    The most important decision to make about your audio podcast encoding is what bit rate to use. The bit rate determines the eventual file size of your podcast, which in turn determines how long it takes to download. The bit rate also determines the fidelity of your podcast. The higher the bit rate, the higher fidelity your podcast is.

    The listed audio bit rates range from 20 kbps to 256 kbps. If you're producing audio-only podcasts, you should target somewhere between 64 kbps and 128 kbps. If you're encoding predominantly speech, you can safely stay at the low end of that; if you're encoding music, you may want to stick to the higher end of the spectrum.
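Because the bit rate fixes the file size, you can estimate the download before you encode. The sketch below is an approximation under stated assumptions: the function name is made up, 1 kbps is taken as 1,000 bits per second, a megabyte as 1,000 kilobytes, and file-format overhead such as tags is ignored.

```python
def file_size_megabytes(bit_rate_kbps, duration_minutes):
    """Approximate size of a constant bit rate file."""
    kilobits = bit_rate_kbps * duration_minutes * 60
    kilobytes = kilobits / 8  # 8 bits per byte
    return kilobytes / 1000


# A 30-minute show at both ends of the suggested 64-128 kbps range.
for rate in (64, 128):
    size = file_size_megabytes(rate, 30)
    print(f"{rate} kbps for 30 minutes: about {size:.1f} MB")
```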

    Note At the end of the day, you know best how you want the podcast to sound. Try encoding at a couple of different bit rates, and see which one sounds best to you.

    The other thing you may be able to set is the sampling rate. The sampling rate determines how much high-frequency information is encoded. For example, CD-quality audio uses a sample rate of 44.1 kHz to capture the full 20–20,000 Hz frequency range. The sampling rate has to be at least double the highest frequency you're trying to capture. Depending on what bit rate you're targeting, you may be offered a few different sampling rates.

    The interesting thing about sampling rates is that a higher sampling rate isn't necessarily better. The sampling rate determines how often the incoming audio signal is sampled, so it determines how much audio the encoder has to try to encode. If you set a higher sampling rate, you're telling the encoder to try to encode more high-frequency information, but the encoder may have to sacrifice the overall quality of the encoding. Essentially, the sampling rate determines the trade-off between the frequency range and the fidelity of the encoding. At a given bit rate, an encoder can offer higher fidelity with a reduced frequency range or reduced fidelity with a higher frequency range.

    We suggest that you choose a lower sampling rate, thereby allowing the encoder to create a higher fidelity version of your podcast. There is very little information above 16 kHz in most audio programming, and most people don't have speakers that reproduce it faithfully anyway. Therefore, choosing a 32 kHz or 22 kHz sampling rate should provide more than enough high-frequency information.
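The "at least double" rule means the highest frequency an encoding can carry is half its sampling rate. The sketch below applies that to a few common rates. The helper name is hypothetical, and it assumes the "22 kHz" setting is the common 22.05 kHz rate (exactly half the CD rate).

```python
def highest_frequency_hz(sample_rate_hz):
    """Upper limit of the frequency range a sampling rate can capture."""
    return sample_rate_hz / 2


# CD rate, plus the two lower rates discussed above.
for rate in (44_100, 32_000, 22_050):
    limit = highest_frequency_hz(rate)
    print(f"{rate / 1000:g} kHz sampling captures up to {limit / 1000:g} kHz")
```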

    Nov 11, 2008

    Other Encoding Formats | Podcast

    MP3 is perfect for audio podcasts, but you may want to work in other formats for a number of reasons. Many portable media players now include color displays. Enhanced podcasts are appearing to take advantage of these color displays that include graphics along with the audio. Enhanced podcasts can also include links for people who are watching the podcast on a browser. Enhanced podcasts also feature chapters, so people can quickly skip to the next or previous section of your podcast.

    You can create enhanced podcasts using the QuickTime and Windows Media formats. Of course, enhanced QuickTime podcasts play back only on iPods or in iTunes, and enhanced Windows Media podcasts play back only in Windows Media Player and Windows Media compatible portable media players. Another enhanced podcast format is the Audible format, which was developed for audio books. The Audible format includes the chapters feature, as well as the ability to store a bookmark, so that if you stop listening in the middle of a podcast, the next time you listen the podcast starts where you left off. Because the Audible format has been around for so long, it is supported by almost every portable media player, as well as in iTunes, Windows Media Player, and RealPlayer.

    If you're creating a video podcast, a number of different formats are available, including QuickTime, Windows Media, Real, and Flash. Video podcasts have the same compatibility issues as enhanced podcasts, which means limited compatibility across portable players, and they require that the appropriate player software is installed on the audience's computer.

    Caution People are weird. Talk to one person and he'll tell you why he would never install media player A on his machine, while the next person swears by player A and is convinced media player B is the devil's spawn. To some extent, these people split across platform lines (Mac users swear by QuickTime, Windows users Windows Media, and Flash users hate everything else), but not always. Each media format has its strengths and weaknesses. If you're planning on a video podcast, you should support at least two formats. Regardless of which formats you choose, plan on getting disgruntled e-mails and blog comments from crazed audience members. You can't please everyone.

    Another reason to consider alternative formats is if you want to protect your podcast files using Digital Rights Management (DRM). DRM lets you place restrictions on your podcast, for example letting only paid subscribers listen to it. Not all formats support DRM. Because most podcasts are free and most podcasters want as many listeners as they can get, very few podcasts use DRM. This may change as people begin charging for their podcasts.

    If you're going to offer your podcast in an alternative format, you may need to download and install encoding software (see Figure 1). Many of these formats will be included in your audio or video editing platforms, but if not, the software is generally available for free from the manufacturers.

  • QuickTime: iTunes will encode in the AAC and MP3 formats and exports videos to an iPod compatible format, but if you want to tinker with the encoding settings, get QuickTime Pro. You can upgrade any copy of QuickTime to the Pro version for a mere $29.

  • Windows Media: The Windows Media Encoder is available as a free download from the Microsoft site.

    Note Microsoft has recently dropped support for Windows Media encoding on the Mac. However, Mac users can encode in Windows Media using products from Flip4Mac.

  • Helix RealProducer: If you're considering the Real format, you need the RealProducer.

    Unfortunately, if you're targeting mobile phones (where the Real format is strongest), you need Helix Producer Mobile, which is incredibly expensive. If it's any consolation, you can download a trial version that's good for 30 days.

  • Flash: To encode into Shockwave Flash (.swf) or streaming Flash (.flv), purchase the Flash authoring tool. Several multi-format encoders also offer Flash support.

  • Audible: Audible doesn't make their encoding software publicly available. Instead, you have to upload your original MP3 file to their Wordcast service, and they do the encoding for you.

    Figure 1: The Windows Media Encoder is available for free from Microsoft for the PC platform (Mac users must use Flip4Mac).

Nov 3, 2008

    MP3 Encoding Tools

    If you're producing an audio podcast, you're probably best producing it in the MP3 format. Although it isn't the best audio codec available, it is by far the most compatible and plays on virtually any computer or portable media device. It may not have all the bells and whistles of other formats, but your audience is far less likely to have technical issues, which means you'll get fewer negative comments on your blog.

    Virtually any editing platform you're working on should have built-in MP3 encoding capabilities, but on the off chance that it doesn't, a number of standalone MP3 encoding applications can get the job done; one such application is shown in Figure 1.

  • iTunes: iTunes isn't really an encoding application, but it converts audio files to MP3 on import if you choose to do so in your preferences.

  • LAME-based encoders: Despite the ironic origin of the name (LAME Ain't an MP3 Encoder), LAME is an open source MP3 encoding library that is used in almost all free MP3 encoding applications. There are probably hundreds of these available; google "MP3 encoder" and see for yourself.

    Figure 1: WinLAME is one of many free MP3 encoders available.

Oct 27, 2008

    Encoding Tools

    You should understand clearly by now that your podcast must be encoded in a format that is suitable for Web distribution. To do this, you must either use a standalone encoding application or export directly from your audio-editing or video-editing software. If you want to be a real podcasting hot shot, you may want to encode an enhanced podcast or offer video podcasts for various portable media players. If so, you'll probably want to invest in a multi-format encoding solution. In addition to enabling you to encode a single file into a number of different formats, these tools let you tweak all the encoding settings so you can get the highest possible quality encoding.

    Your decision about which encoding software to use will be based on a number of factors:

  • Are you encoding audio or video podcasts (or both)?

  • If you're producing audio podcasts, do you want maximum compatibility across media players, or would you rather produce a cutting edge podcast with features that may not work on all players?

  • If you're producing video podcasts, how many formats do you want to produce?

    Podcasting purists consider podcasts to be MP3 files. But video podcasts are becoming increasingly popular, and there are serious cross-compatibility issues with video podcasts. Portable media players support different codecs, and some people may not have the required player software installed to watch your video podcast on their computer.

    We'll talk about these issues a little later in the chapter. To begin with, let's start with the simplest case. We'll assume that you're producing an audio podcast, and for maximum compatibility you're using the granddaddy of all podcast formats, the MP3 file.

Oct 7, 2008

    File Formats : MP3, QuickTime, Windows Media, RealMedia, Audible

    Knowing about codecs is important, because the codec determines the final quality of the podcast. However, you also have the problem of file formats. The file format dictates how the audio and video information is packaged. Some codecs can fit into a number of different file formats. The problem is that most proprietary systems such as QuickTime, Windows Media, and Real use proprietary file formats to hold the encoded audio and/or video information. File formats are highly guarded trade secrets and the main reason why files are not interoperable between players. However, the file formats also enable the proprietary systems to offer additional functionality. These are the most common file formats you'll encounter:

  • MP3: MP3 is actually a codec, not a file format. The file format is actually MPEG. (MP3 stands for MPEG-1 Audio Layer 3.) MPEG files are almost universally playable, which is the reason most folks use MP3 to encode and distribute their podcasts.

  • QuickTime: The QuickTime file format is one of the earliest multimedia file formats and the basis of the MPEG 4 file format. QuickTime files also are almost universally playable, though the codecs inside may not be.

  • Windows Media: The Microsoft standard, Windows Media obviously plays back on any PC and a large number of portable media devices, but not the iPod.

  • RealMedia: RealNetworks' file format, this requires the RealPlayer. It is supported on many cell phones.

  • Audible: The Audible file format was designed specifically for audio books, and consequently supports saved playback position, chapter marks, book marks, and other desirable features. Because it has been around for quite some time, it has lots of support in proprietary players and portable media players.

    As mentioned earlier, your choice of codec and file format depends on the audience you're trying to reach and the features you want to use. For most folks, MP3 works well for audio because it plays on virtually every computer system and portable audio player. It doesn't support book marks or chapters, but most podcasts are short enough that they don't require that functionality.

    For video, most folks are using QuickTime, again because of the near-universal support and because of the iPod, of course. However, some people are starting to experiment with other formats to see what they can do with the advanced functionalities they offer. Always offer the MP3 and QuickTime versions. If you want to play around with the advanced formats, offer them in addition to your standard podcast versions.

    Sep 25, 2008

    Video Codecs

    Video codecs are much easier to distinguish from each other than audio codecs. Video quality is still on the rise, with each new codec release improving quality. However, most video codecs are proprietary, meaning they won't play back on other players. If you're embedding your podcast in a Web page, then you can do all sorts of checking via JavaScript to see whether people have a certain plug-in installed. If you're targeting portable media players, then you'll have to be careful about which video codec you choose.

    H.264: Part of the MPEG 4 standard, H.264 is quite an improvement over earlier MPEG video codecs and, more importantly, is the video standard supported by the iPod.

    Windows Media Video (WMV): Currently at version 11, the WMV codec provides outstanding quality, along with lots of advanced functionality. It is also supported by the "Plays for Sure" family of portable media players.

    RealVideo (RV): Recently voted the best video codec by author Jan Ozer. RV provides lots of advanced functionality, but is not supported on the iPod or the "Plays for Sure" devices. However, RV is supported on a number of cell phones.

    OGG Theora: The sister project to the Ogg Vorbis audio codec project. Theora videos play back in a number of open source players, as well as the RealPlayer and QuickTime, though they require the installation of an additional component.

    Sep 19, 2008

    Audio Codecs : Speech optimized codecs & Music optimized codecs

    Now that you know how codecs work, it's time to see what codecs are available to podcasters, how they differ, and why you might want to use them. The first thing to consider is whether you're planning on using music in your podcast. If you are, then you definitely want to use a codec that is suited for music. If your podcast is just speech, then you may want to consider using a speech codec, because you'll be able to get very good quality at ridiculously low bit rates, thereby saving you money on bandwidth.

    Choosing a codec is a tricky business. Newer codecs offer better quality, and some offer advanced functionality such as book marking and embedding images. However, many of the newer codecs play back only on a limited number of portable devices. If you want the latest and greatest features, but also want to cater to the widest possible audience, you may want to consider encoding to multiple formats.

    Music-optimized codecs
    As mentioned previously, if you're going to include any music at all in your podcast, you must use a music codec. Luckily, you're spoiled for choice. Here's a list of possible candidates:

  • MP3: The granddaddy of them all. MP3 wasn't initially designed as a low bit rate codec, so other codecs sound much better at low bit rates. It also does not support book marking. But just about every computer and portable media device in the world will play back an MP3 file.

  • Advanced Audio Coding (AAC): The new and improved MPEG audio codec, meant to replace MP3. The only problem is that it isn't supported on some portable players. AAC enables advanced features such as book marking and embedded images.

  • Windows Media Audio (WMA): The standard on Microsoft PCs. It has many advanced features such as markers, script commands, and embedded links. WMA is not supported on iPods, though it is supported on the "Plays for Sure" family of portable media devices.

  • RealAudio (RA): The default audio codec of the RealPlayer, which offers embedded links and script commands. It is supported on a number of cell phones.

  • OGG Vorbis: An open source audio codec offering excellent quality. Unfortunately, Vorbis isn't supported by many of the proprietary players, nor by the iPod.

    Speech-optimized codecs
    If your podcast doesn't include music, you should consider using a speech codec. They provide better quality at the same bit rate as a music codec, or the same quality at a reduced bit rate.

  • Audible Audio (AA): Developed for the first portable digital media player, which was released by Audible and designed to play back audio books. AA supports a number of advanced features such as book marking. Unfortunately, Audible doesn't make an AA encoder publicly available.

  • The granddaddy of voice codecs. In fact, the AA format is based on the codec. is supported by both the Windows Media and Real players.

  • OGG Speex: Another branch of the OGG open source project, specializing in low bit rate speech compression.

  • Windows Media Audio Voice Codec: During Windows Media encoding you can specify that you're encoding voice content, and the Windows Media encoder will use a voice-optimized codec.
    Sep 13, 2008

    How video codecs work

    Video codecs also have improved dramatically. The challenge of encoding video, however, is orders of magnitude more difficult than encoding audio. We found that a minute of CD-quality audio is about 10 MB before it is encoded. That's nothing compared to video. If the video is being digitized in the RGB color space, each pixel uses 24 bits (8 for red, 8 for green, and 8 for blue). So that means a frame of video uses:

    720 pixels * 486 lines * 24 bits/pixel = 8,398,080 bits
    8,398,080 bits / 8 bits/byte = 1,049,760 bytes
    = approximately 1 MB per frame of video

    To get the file size for a 20-minute podcast, we remember that there are 30 frames per second, so:

    1 MB/frame * 30 frames/sec * 60 secs/min * 20 min = 36,000 MB = approximately 35.15 GB

    Yes, you read that right. A 20-minute podcast can chew up an entire hard drive, or at least a good chunk of one. Of course, the preceding calculations assumed uncompressed RGB video, and most podcasts are done using a DV camera. Because DV video is compressed at a 5:1 ratio, you're only looking at around 7 GB for your 20-minute podcast. But imagine downloading a 7 GB file! That's not going to happen in a flash. It's going to take a good long time.
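
    The arithmetic above can be checked with a short Python sketch. The function name and parameters are illustrative, not part of any encoding tool. Because a frame is a shade over 1 MB, the exact total comes out slightly higher than the rounded figure in the text.

```python
def raw_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video with the given dimensions."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps * seconds


# One frame of 720x486 RGB video (24 bits per pixel).
frame_bytes = raw_video_bytes(720, 486, 24, fps=1, seconds=1)

# Twenty minutes at 30 frames per second.
total_bytes = raw_video_bytes(720, 486, 24, fps=30, seconds=20 * 60)

GB = 1024 ** 3  # the text counts 1 GB as 1024 MB
print(f"one frame:          {frame_bytes:,} bytes")
print(f"20 min uncompressed: {total_bytes / GB:.1f} GB")
print(f"20 min at 5:1 (DV):  {total_bytes / 5 / GB:.1f} GB")
```

    Changing the width, height, or frame rate arguments shows how quickly the raw size falls as you shrink the picture or drop frames.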

    So the first thing we have to consider is reducing the resolution of the video so there are fewer pixels to encode in each frame. If you resize down to 320×240, you've reduced the file size by 75 percent. You also can cut the frame rate in half for further data reduction. But it turns out that this is still nowhere near the amount of reduction required to be able to deliver this video reliably and in an acceptable amount of time (and without breaking your bandwidth budget). To do this, video codecs rely on perceptual coding, using inter-frame and intra-frame encoding.

    Intra-frame encoding is encoding each frame individually, just as you would when you shrink an image using a JPEG codec. Inter-frame encoding is a more sophisticated approach that looks at the difference between frames and encodes only what has changed from one frame to the next. This is illustrated in Figure 1.

    Figure 1: Inter-frame compression encodes only the differences between frames.

    To be able to encode the difference between frames, the codec starts off by encoding a full frame of video. This full frame is known as a key frame. After the key frame, a number of difference frames are encoded. Difference frames, unsurprisingly, encode only what has changed from the previous frame to the current frame. The codec encodes a number of difference frames either until a scene change or when the amount of change in the frame crosses a predetermined threshold. The sequence of key frames and difference frames is illustrated in Figure 2.

    Figure 2: Inter-frame compression uses a sequence of key frames and difference frames.
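
    The key frame and difference frame idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: frames are flat lists of pixel values, a difference frame stores only the pixels that changed, and a new key frame is forced when more than half the pixels change. Real video codecs are lossy and use motion estimation; the names `encode`, `decode`, and `threshold` are made up for this illustration.

```python
def encode(frames, threshold=0.5):
    """Turn a list of frames into key frames and difference frames.

    Each frame is a list of pixel values of the same length. The result is
    a list of ("key", full_frame) or ("diff", {pixel_index: new_value}).
    """
    encoded = []
    previous = None
    for frame in frames:
        if previous is None:
            encoded.append(("key", list(frame)))  # always start with a key frame
        else:
            changes = {
                i: new
                for i, (new, old) in enumerate(zip(frame, previous))
                if new != old
            }
            if len(changes) > threshold * len(frame):
                encoded.append(("key", list(frame)))  # too much changed
            else:
                encoded.append(("diff", changes))
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild the full frames from key frames and difference frames."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for i, value in payload.items():
                current[i] = value
        frames.append(current)
    return frames


frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],  # one pixel changed
    [0, 0, 9, 9, 0, 0],  # one more pixel changed
    [5, 5, 5, 5, 5, 5],  # scene change: everything is different
]
encoded = encode(frames)
for kind, payload in encoded:
    print(kind, payload)
```

    The two middle frames are stored as a single changed pixel each, while the scene change triggers a fresh key frame.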

    The combination of reduced screen resolutions, reduced frame rates, intra-frame compression, and inter-frame compression is sufficient to create satisfactory video experiences at amazingly low bit rates. Although no one would want to pay to watch it, you can create video files at bit rates as low as 32 Kbps. Of course, we recommend using ten times that much for your video podcast. At 300 Kbps and above, you can deliver an entirely satisfactory video experience. It won't be perfect, but it should be more than adequate.

    Codec side effects
    No codecs are perfect. Even when codecs claim to be transparent, an expert somewhere can tell the difference. At higher bit rates, the differences between the original and the encoded version are minimal. As the bit rate decreases, however, the differences become easy to spot. Perceptual codecs attempt to remove things that we won't notice, but unfortunately they're not always successful.

    Because so much information must be removed from files, you get less of everything in the encoded version of your file. The frequency range is reduced, as well as the dynamic range. If you're encoding video, you have a smaller screen resolution and possibly a decreased frame rate. If that's not enough, you also see or hear artifacts in your podcast.

    Artifacts are things that weren't in the original file. In encoded audio files, artifacts can be heard as low rumbling noises, pops, clicks, and what is known as "pre-echo," which gives speech content a lisping quality. For video files, you may notice blocking artifacts, where the video is broken up into blocks that move around the screen. You also may see smearing, where the video image looks muddy and lacks detail.

    If your podcast has audible or visible artifacts, you should check your encoding settings. Audio podcasts in particular should not have artifacts; you should be more than capable of producing a high quality audio podcast. Video, however, is a different matter. If you're delivering a 320×240 video podcast encoded at 300 Kbps, chances are good that you'll encounter a few artifacts. They shouldn't interfere with the ability to enjoy your podcast. If they do, you'll need to revisit your equipment or your shooting and editing style, or simply encode your video podcast at a higher bit rate.

    Sep 5, 2008

    How perceptual codecs & audio codecs work

    Perceptual codecs take advantage of how we actually perceive audio and video, and use this information to make intelligent decisions about what information can safely be discarded. Perceptual codecs are by definition lossy because of this. The original cannot be recreated from the encoded file. Instead, an approximation that attempts to retain as much fidelity as possible is constructed. The idea is that we won't notice what has been discarded.

    Our ears are extremely sensitive. We can hear from 20Hz to 20,000Hz and sounds over a wide dynamic range, from a whisper to a scream. We can pick out a conversation at the next table in a crowded restaurant if the topic happens to catch our ear. We can do this because our brains filter out the information that is not of interest and focus on the rest. Our brains effectively prioritize incoming sound information.

    For example, even a quiet classroom has plenty of sounds, such as the hum of air conditioning, people shuffling papers, and the teacher lecturing at the front. If someone sneezes in the room, for that split second, everyone notices the sneeze and nothing else. The sneeze is the loudest thing in the room and takes precedence over everything else.

    Similarly, our eyes can take in a wide range of visual information, the entire color spectrum from red all the way through purple, and from very dim environments to very bright environments. Our field of vision is approximately 180 degrees from left to right. What we actually pay attention to, though, is much more focused. In general, we pay more attention to things that are brightly colored and things that are moving.

    Perceptual codecs use this information to make better decisions about what information in audio and video files can be discarded or encoded with less detail. Perceptual codecs prioritize the loudest frequencies in an audio file, knowing that's what our ears pay most attention to. When encoding video, perceptual codecs prioritize bright colors and any motion in the frame.

    At higher bit rates, perceptual codecs are extremely effective. A 128 kbps MP3 file is considered to be the same apparent quality as a CD and is only one-tenth the size of the original, which is pretty incredible if you think about it. Some of the savings is encoding efficiency, but the majority of it is perceptual encoding. As the bit rate is lowered and the codec is forced to discard more and more of the original information, the fidelity is reduced and the effects of perceptual encoding are more audible. Still, you should always balance the required fidelity of your podcast with the realities of bandwidth and throughput.

    How audio codecs work
    Audio codec technology has made spectacular advances in the last few years. It's now possible for FM quality to be encoded in as little as 32 kbps (in mono, that is). Modern codecs such as Windows Media, Real, and QuickTime AAC can achieve CD quality in approximately 64 Kbps. How do they do it?

    The idea is to capture as much of the frequency and dynamic range as possible and to capture the entire stereo image. However, given the target bit rate, the codec usually determines what a reasonable frequency range is. Files that are encoded in mono are always slightly higher fidelity, because the encoder worries about only one channel, not two.

    Another economy can be made if the codec knows that it will be encoding speech. Speech tends to stay in a very limited frequency and dynamic range. If someone is talking, it's unlikely that her voice will suddenly drop down an octave, or that she'll start screaming for no reason. Knowing this, a codec can take shortcuts when encoding the frequency and dynamics information.

    Caution Don't try to encode music using a speech codec. The shortcuts a speech codec uses are totally unsuitable for music, because music uses a very wide frequency range and is generally very dynamic. If you encode using a speech codec, it sounds awful. So don't do it.

    After the frequency range has been determined, the codec must somehow squeeze as much information as possible into the encoded file and decide what can be discarded. Perceptual audio codecs use the concept of masking to help make that decision. If one frequency is very loud, it masks other frequencies, so the codec can safely discard them because we wouldn't perceive them.

    This is why all background noise must be minimized in your original recordings and your programming must be nice and loud. This ensures that the codec doesn't discard any of the programming information.

    Aug 28, 2008

    Codecs Overview : How codecs work

    We know that encoded files are much smaller than raw media files; the question is how do encoders achieve this file size reduction, and why does the quality suffer?

    At the heart of all encoding software lies the codec. Codec is a contraction of coder-decoder (or compressor-decompressor), and is the software algorithm that determines how to shrink a file to a usable size. You're probably already familiar with a number of codecs, though you may not be aware of it. For example, most digital cameras take pictures that are compressed with the JPEG codec. If you've ever used a photo-editing program to reduce the size or quality of your photos before you put them online, you've been adjusting the parameters of the JPEG codec. StuffIt and WinZip use codecs to compress files before they're sent across the Internet or put on installation CDs.

    There's a key difference, however, between the JPEG codec used to compress photos and the codecs used to compress documents. Codecs used to compress documents must be lossless. If someone sends you a spreadsheet that has been compressed, when it de-compresses the data must be exactly the same as it was before the compression. Codecs such as JPEG, however, are known as lossy codecs, because some of the original information is lost during the compression. The original cannot be recreated from the compressed version of the file. Lossy codecs operate under the assumption that the quality lost either is not noticed by the end user or is an acceptable compromise required for the situation.
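
    The difference between lossless and lossy compression is easy to demonstrate in Python. In this sketch, `zlib` from the standard library stands in for a document compressor, and a crude quantization step (keeping only the top four bits of each byte) stands in for a lossy codec. JPEG works very differently; the quantizer is only an illustration of information being thrown away.

```python
import zlib

# Some repetitive sample data standing in for a document.
data = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 100)

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# Lossy (toy example): keep only the top 4 bits of every byte.
# The discarded low bits cannot be recovered afterwards.
quantized = bytes(b & 0xF0 for b in data)

print("original size:   ", len(data))
print("compressed size: ", len(packed))
print("lossless round trip identical:", restored == data)
print("lossy version identical:      ", quantized == data)
```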

    Web sites are a perfect example. Having lots of imagery on a Web site is great, but if the images were all 5 MB originals, each page would take forever to load. Because browsing the Internet should be a rapid, seamless experience, and because we sit so close to our monitors, the amount of detail required in a Web site image is much less than what is required for a printed page, so the image can be compressed heavily using the JPEG codec, and our experience isn't overly compromised.

    The same holds true for podcasts. While it might be nice to have 256 kbps CD-quality podcasts, the reality is 128 kbps offers more than enough quality, and in fact 64 kbps might be plenty, particularly if you're not using the MP3 codec. As you reduce the bit rate of your podcast, the quality is also reduced, because the codec must delete lots more information.

    Codecs try to maintain as much fidelity as possible during the encoding process, but at low bit rates something has to give. There simply isn't enough data to reproduce the original high fidelity. Given the complexity of the task, they actually do an amazing job. They're able to do as well as they do because they make use of perceptual models that help them determine what we perceive as opposed to what we hear. The difference is subtle, but key to modern codec efficiency. Before we talk about perceptual encoding techniques, let's talk a bit about basic codec technologies.

    How codecs work
    Codecs reduce file sizes by taking advantage of the repeated information in digital files. Lots of information is repeated. For example, a video that has been letterboxed (black stripes on the top and bottom) has lots and lots of black pixels. This results in lots and lots of zeros, all in a row. Instead of storing thousands of zeros, you could store "1000 × 0," which is only six characters. That's a significant savings. Also, you can reconstruct an exact copy of the original based on the information that you have stored.
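
    Storing a run of identical values as a count and a value is known as run-length encoding. A minimal Python sketch, with illustrative function names, looks like this:

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of identical values into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out


# A row of letterboxed video: 1000 black pixels, then a little picture data.
row = [0] * 1000 + [255, 255, 17]
runs = rle_encode(row)
print(runs)
print("round trip identical:", rle_decode(runs) == row)
```

    The thousand zeros collapse into a single pair, and decoding gives back an exact copy of the row.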

    Another way of encoding is to substitute for commonly occurring combinations of characters. For example, you could make this book smaller by replacing every instance of the word "podcasting" with "p." This wouldn't save that much space, though, and that's the problem with lossless encoding. You can achieve some file size reduction, but typically not enough for our needs. For this, you need perceptual encoding.
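
    The substitution approach can be sketched the same way. One assumption differs from the example in the text: the replacement token here is a control character that never appears in the original, because a plain "p" could not be told apart from the ordinary letter when decoding. The function names are illustrative.

```python
def substitute(text, table):
    """Replace each phrase in the table with its short token."""
    for phrase, token in table.items():
        text = text.replace(phrase, token)
    return text


def restore(text, table):
    """Reverse the substitution to recover the original text."""
    for phrase, token in table.items():
        text = text.replace(token, phrase)
    return text


# The token is assumed not to occur anywhere in the original text.
table = {"podcasting": "\x01"}
original = "podcasting is fun; podcasting takes practice; podcasting is growing"

packed = substitute(original, table)
print(len(original), "characters before,", len(packed), "after")
print("round trip identical:", restore(packed, table) == original)
```

    The saving is real but modest, which is the limitation of lossless encoding that the paragraph above describes.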

    Aug 22, 2008

    Encoding : Throughput & Quality equivalents

    As mentioned in the previous section, throughput is the measure of the amount of bandwidth you use over time. You'll encounter throughput when you use a service to distribute your podcast, because most offer a certain amount of throughput for free and bill you for any used in excess of that. Obviously, you want to keep your monthly bill as low as possible, so you want to try to limit the amount of throughput you use.

    When you encode your podcast, you want to balance the desire to provide the highest quality possible with the reality of your throughput bill at the end of the month. Many podcast distribution services offer generous amounts of free throughput each month, so this may not be an issue when you first start out. If your podcast becomes wildly popular, though, you may be faced with a need to cut your operating costs (until that first sponsor or advertiser comes around, of course). If so, you may want to consider reducing the bit rate of your podcast, which reduces the quality of your podcast, but that may not be noticeable to your audience. Remember, most people listen to podcasts while sitting in front of their computers, and multimedia speakers aren't renowned for their quality. What you want to deliver is a podcast quality that is equivalent to other broadcast media, which in the case of AM and FM radio isn't that high to begin with.
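
    A rough throughput estimate takes only a few lines of Python. Every number in this sketch is hypothetical (the episode size, the download counts, and the 50 GB free allowance), so substitute the figures from your own distribution service.

```python
def monthly_throughput_gb(file_mb, downloads_per_episode, episodes_per_month):
    """Total data delivered in a month, in GB (1 GB = 1024 MB)."""
    return file_mb * downloads_per_episode * episodes_per_month / 1024


def overage_gb(used_gb, free_gb):
    """Throughput used beyond the free allowance."""
    return max(0.0, used_gb - free_gb)


# Hypothetical show: 18.75 MB episodes, 1,000 downloads each, 4 per month.
used = monthly_throughput_gb(18.75, 1000, 4)
print(f"throughput used: {used:.1f} GB")
print(f"billable beyond a 50 GB allowance: {overage_gb(used, 50):.1f} GB")

# Halving the bit rate halves the file size, and with it the throughput.
halved = monthly_throughput_gb(18.75 / 2, 1000, 4)
print(f"at half the bit rate: {halved:.1f} GB")
```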

    Quality equivalents
    The term broadcast quality is often taken to mean really, really good. However, anyone who has listened to AM radio knows that it doesn't sound anywhere near as good as FM, and for that matter FM radio doesn't sound as good as CDs. Yet they're both broadcast standards, and we still listen to radio, even AM. Some types of programming simply do not need as much fidelity as others.

    The idea, then, is to figure out how much fidelity your programming requires and produce content to that standard. When recording the content, you should always record at a very high standard, because that gives you the most flexibility later on. But when it comes time to encode your content for Internet distribution, you may want to sacrifice a bit of quality for the cost savings it provides.

    Table 1 lists some common bit rates offered by encoding software and brief descriptions of what quality you can expect using different encoding technologies.

    Note In Table 1, you should notice that MP3 audio quality is always slightly worse than Windows Media, Real, and QuickTime AAC, particularly at low bit rates. This is because the MP3 codec is older and wasn't really designed for low bit rate encoding. At higher bit rates (128 Kbps and above), the quality differential is less apparent.

    Aug 12, 2008

    Why Encoding Is Necessary - Bandwidth

    You've spent countless hours and quite possibly a sizeable sum of money to produce a broadcast-quality podcast. Now you're being asked to take the polished result and convert it to a different format, which may compromise the quality of the original. Why?

    The simple answer is because the raw audio and video files are too large to deliver practically via the Internet. There's no technical reason you can't deliver the original files — but it would take an incredibly long time for the files to download, and your monthly delivery bill would be sky high. To better understand the practical limitations involved, you must understand the concepts of bandwidth and throughput.

    Bandwidth, in the networking sense of the word, is a measurement of the amount of data that is being transmitted at any given point. Throughput is the aggregate amount of bandwidth that has been used over a given time period. Think about water coming out of a faucet: The water can come out slowly or quickly depending on how much you open the tap. A gallon jug fills slowly or quickly depending on how fast the water is coming out. The "bandwidth" of the faucet is the speed of the water coming out; the "throughput" is the total amount of water that comes out.

    In podcasting, we come across bandwidth and throughput in a number of different areas. First, each of your potential audience members is connected to the Internet in some way, and that connection has an advertised bandwidth. If they're on DSL or cable modem, they may have a download bandwidth somewhere between 256 kilobits per second (kbps) and several megabits per second (Mbps). Similarly, when you upload your podcast to a server or distribution service, you're using bandwidth, but you're uploading, not downloading. The upload or upstream speed of DSL and cable modems is usually far less than the download speed. Regardless of which direction the data is traveling, the bandwidth available determines the speed at which the transfer takes place.

    Let's say you've recorded a 20-minute audio podcast. If you've recorded at CD quality, you recorded in stereo, sampling at 44.1kHz, using 16 bits per sample. We can determine how large this file is using some simple math:

    44,100 samples/sec * 16 bits/sample * 2 channels = 1,411,200 bits/sec
    1,411,200 bits/sec / 8 bits/byte = 176,400 bytes/second
    176,400 bytes/second / 1024 = 172.3 kilobytes per second (KBps)
    172.3 KB/sec * 60 secs/min * 20 min = 206,718.75 KB
    206,718.75 / 1024 = Approximately 202 megabytes (MB)

    So the raw file is over 200 megabytes. (In fact, you can do this math much more quickly: One minute of stereo CD audio is approximately 10 MB, so 20 * 10 = ∼200 MB.) Let's assume one of your audience members is on a fairly standard DSL line, with a download speed of approximately 500 Kbps. You can calculate the download time with a bit of math. All you have to do is convert the file size from megabytes into kilobits, and then divide by the download speed:

    200MB * 8 bits/byte = 1600 megabits
    1,600 megabits * 1024 = 1,638,400 kilobits
    1,638,400 kilobits / 500 kbps = 3,277 seconds
    3,277 seconds / 60 seconds/minute = 54.6 minutes

    So your podcast would take just under an hour to download. If the person is downloading in the background, this might not be too much of a problem, but chances are he's checking e-mail, surfing the Web, and doing other things on his computer that might further constrict the available bandwidth, which in turn makes the download take even longer. Additionally, he may not be getting the full bandwidth that he's paying for (see the "Why Does My Broadband Connection Seem Slow?" sidebar). Overall, this is not an optimal experience.
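
    The file size and download time calculations can be reproduced in Python. This sketch follows the same convention as the text (1 kilobit = 1024 bits, 1 MB = 1024 KB) and uses illustrative function names. It works from the exact file size rather than the rounded 200 MB, so the download time comes out a little longer than the hand calculation.

```python
def pcm_bytes_per_second(sample_rate, bits_per_sample, channels):
    """Data rate of uncompressed (PCM) audio in bytes per second."""
    return sample_rate * bits_per_sample * channels // 8


def download_seconds(size_bytes, link_kbps):
    """Time to transfer size_bytes over a link of link_kbps kilobits/sec."""
    kilobits = size_bytes * 8 / 1024
    return kilobits / link_kbps


rate = pcm_bytes_per_second(44_100, 16, 2)  # CD quality stereo
size_bytes = rate * 20 * 60                 # a 20-minute podcast
size_mb = size_bytes / 1024 / 1024
seconds = download_seconds(size_bytes, 500)

print(f"data rate:     {rate:,} bytes/second")
print(f"file size:     {size_mb:.1f} MB")
print(f"download time: {seconds / 60:.1f} minutes at 500 kbps")
```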

    What we want to do is deliver a high-quality podcast that doesn't take hours to download. Encoding software enables us to do precisely this. For example, if we encode the file using an MP3 codec, we can achieve CD quality using only 128 kbps. In this case, our file would be:

    128 kbits/sec * 60 seconds/minute * 20 minutes = 153,600 kilobits
    153,600 kbits / 8 bits/byte = 19,200 kbytes
    19,200 kbytes / 1024 = 18.75 MB

    Our file size is less than ten percent of what it was before, and the download time is therefore reduced to about five minutes, which is much more like it. And because each of your listeners is downloading a smaller file, you use much less throughput.
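
    The same calculation works for any bit rate. The following Python sketch, with illustrative function names and the same 1024-based units as the text, prints the file size and download time for a few common audio bit rates.

```python
def encoded_size_mb(bit_rate_kbps, minutes):
    """File size in MB for a stream encoded at a constant bit rate."""
    kilobits = bit_rate_kbps * 60 * minutes
    return kilobits / 8 / 1024


def download_minutes(size_mb, link_kbps):
    """Download time in minutes over a link of link_kbps kilobits/sec."""
    kilobits = size_mb * 1024 * 8
    return kilobits / link_kbps / 60


for kbps in (32, 64, 128):
    size = encoded_size_mb(kbps, minutes=20)
    wait = download_minutes(size, link_kbps=500)
    print(f"{kbps:>3} kbps: {size:5.2f} MB, {wait:.1f} minutes at 500 kbps")
```

    At 128 kbps this reproduces the 18.75 MB figure above and a download time of roughly five minutes.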

    Aug 5, 2008

    Advanced Video Production Techniques - Adding Titles

    Most professional video programming has some sort of opening sequence that usually includes lots of candid footage mixed with shots of the star(s) and some sort of graphic rendition of the title of the program. You should take the same approach. If your show has a name, let folks know about it! If they download it to their iPod and forget about it until it magically appears on their screen one day when they're browsing through their clips library, you want them to know the name of the program and who you are. So you'll probably want to use titles.

    However, the problem is that what looks good (and is legible) on a television screen in general ends up way too small to be read on a small 320×240 screen. Titles at the bottom of the screen (called lower thirds) can be very hard to read if they're not done with large enough fonts. PowerPoint slides are particularly tough, because most people try to pack far too much information into a single slide, which makes it difficult for people to absorb, and the small fonts become very hard to read when reduced. To top it all off, video codecs have a tough time with text, because they don't treat it as being distinct from the video. So when your podcast is encoded, you're going to lose even more quality, as depicted in Figure 1.

    Figure 1: PowerPoint slides are a good example of why text is tough: (a) Scaled to 320×240 and (b) after encoding at 300 Kbps.

    The PowerPoint slide in Figure 1 isn't too bad to start off with; it has only five main points on the slide. By the time the slide is reduced to 320×240, the sub-points are too hard to read, and after the encoding process, even the main points are starting to look a little ragged.

    If you're going to use text in your podcast, think big. Try not to have more than three or four points per slide if you're using PowerPoint, and if you're adding titles to your show and/or your guests, make sure to use a font large enough so that it is legible after the encoding process.

    Jul 30, 2008

    Advanced Video Production Techniques : Inserting Virtual Backgrounds Using Chroma Key

    If you've ever wondered how your local weatherman manages to stand in front of huge swirling weather maps, the answer is using a technology known as chroma key (also called green screen or blue screen). The weatherman isn't actually standing in front of those pictures. He's standing in front of a blank wall painted a very bright, unnatural green. Then, using what until recently was incredibly expensive technology, the green background is removed from the video image and replaced with the graphics that you see on television. When the weatherman is looking off to the side, he's actually looking at a small television monitor to figure out where to point.

    Nowadays, chroma key is built into many video-editing platforms. Some require an additional plugin, but others include it as part of the basic functionality. To use this feature, however, you have to film yourself (or whoever your subject is) in front of a green (or bright blue) wall. The trick is to make sure the wall color is very uniform and is lit in such a way that there are no shadows on the wall. You can buy custom paint that the professionals use to paint their chroma key walls, or if you're budget constrained, you can buy a roll of bright green butcher paper at your local art supply store.

    You need a large area to film against, because you have to stand far enough away from the green screen so that you don't cast any shadows on it. Lighting for a green screen shoot is an art form in itself. This is a good example of where calling in a professional to help you out is a great idea. After you've got a lighting setup established, you can reuse it for future shoots.

    After you've shot your video against the green screen, the process for substituting the background depends on your video-editing software. Figure 1 illustrates the chroma key effect from Vegas. After you've specified what color to use as the key for the chroma key effect, that color is removed from the frame and another image or video is substituted where the key was.

    Figure 1: Vegas chroma key effect

    Tip One good reason to use chroma key is that the backgrounds are generally static, and as we'll find out later, static backgrounds encode best. Conversely, don't use backgrounds with motion in them if they can be avoided.

    Most video-editing programs deal with video in terms of tracks, so when the chroma key effect is used, the video track beneath the main track is revealed. This is how the weatherman appears to be standing in front of the weather maps. In actuality, the weather maps are just showing through where the original chroma key color was.
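
    The basic idea behind the effect can be sketched in Python. This is a deliberately simple model, not how Vegas or any other editor implements it: images are lists of rows of (red, green, blue) tuples, and any foreground pixel close enough to the key color is replaced by the background pixel at the same position. The function name, the `tolerance` value, and the sample colors are all assumptions made for the illustration.

```python
def chroma_key(foreground, background, key=(0, 255, 0), tolerance=60):
    """Replace foreground pixels close to the key color with the background.

    Both images are lists of rows of (r, g, b) tuples with equal dimensions.
    """
    def is_key(pixel):
        # Squared distance between the pixel and the key color.
        distance_sq = sum((a - b) ** 2 for a, b in zip(pixel, key))
        return distance_sq <= tolerance ** 2

    return [
        [bg if is_key(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


GREEN = (0, 255, 0)        # the painted wall
SHADED = (10, 240, 12)     # slightly uneven green, still within tolerance
PERSON = (224, 172, 105)   # the subject
MAP = (30, 60, 200)        # the weather map on the lower track

foreground = [
    [GREEN, PERSON, GREEN],
    [SHADED, PERSON, GREEN],
]
background = [[MAP] * 3, [MAP] * 3]

result = chroma_key(foreground, background)
for row in result:
    print(row)
```

    The tolerance is what makes uniform lighting matter: the more the wall color varies, the wider the tolerance has to be, and the greater the risk of keying out part of the subject.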

    Jul 18, 2008

    Cropping and Resizing

    At some point, you may want to cut out part of your video image. For example, the original video may not have been framed well, and you may want a tighter shot. Or there may be something objectionable in the shot that you want to remove. Cutting out this unwanted video is called cropping. If you're targeting video iPods, you also have to resize your video to 320×240, which is the resolution of the iPod video screen. Video-editing platforms allow you to do this, but in order to do it correctly without introducing any visual distortion, you must understand what an aspect ratio is.

    Aspect ratios
    The aspect ratio is the ratio of the width to the height of a video image. Standard definition television has an aspect ratio of 4:3 (or 1.33:1). High definition TV has an aspect ratio of 16:9 (1.78:1). You've no doubt noticed that all the new HDTV-compatible screens are wider than standard TVs. When you're cropping and resizing video, it's critical to maintain your aspect ratio; otherwise, you'll stretch the video in one direction or another.

    To better understand this, let's look at how NTSC video is digitized. The original signal is displayed on 486 lines. Each one of these lines, or rasters, is a "stripe" of continuous video information. When a line is digitized, it is divided into 720 discrete slices; each slice is assigned a value, and the values are stored digitally.

    However, when you display the digitized 720×486 video on a computer monitor, the video appears slightly wider than it does on a television, or looked at another way, the video seems a bit squished. People look a little shorter and stockier than usual, which in general is not a good thing. Why is this?

    If you do the math, 720×486 is not a 4:3 aspect ratio. If you could zoom in and look really closely at the tiny slices of NTSC video that were digitized, they would be slightly taller than they are wide. But computer monitor pixels are square. So when 720×486 video is displayed on a computer monitor, it appears stretched horizontally. To make the video look right, you must resize the video to a 4:3 aspect ratio such as 640×480 or 320×240. This restores the original aspect ratio, and the image looks right.
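
    The arithmetic behind this can be checked with a few lines of Python. This is only a sketch; the function names are made up for illustration. It compares the ratio of the stored 720×486 frame with 4:3 and works out the square-pixel width that matches a given number of lines.

```python
from fractions import Fraction


def aspect_ratio(width, height):
    """Width-to-height ratio as an exact fraction."""
    return Fraction(width, height)


def square_pixel_width(height, display_aspect=Fraction(4, 3)):
    """Width, in square pixels, needed to show `height` lines at the given ratio."""
    return round(height * display_aspect)


stored = aspect_ratio(720, 486)   # ratio of the digitized frame (40/27)
target = Fraction(4, 3)           # ratio the picture is meant to have
stretch = stored / target         # horizontal stretch on square pixels (10/9)

print(float(stored), float(target), float(stretch))
print(square_pixel_width(480), square_pixel_width(240))
```

    The stored frame works out to roughly 1.48:1 rather than 1.33:1, so on square pixels the picture is about 11 percent too wide. Resizing to 640×480 or 320×240 removes that stretch.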

    Note Those of you paying attention may be wondering about standard definition television displayed on the new widescreen models. The simple answer is that most widescreen TVs stretch standard television out to fill the entire 16:9 screen, introducing ridiculous amounts of distortion. Why that is considered an improvement is anyone's guess.

    With the availability of HDV cameras, some of you may be fortunate enough to be working in HDV, which offers a native widescreen format. If so, you'll be working with a 16:9 aspect ratio such as 1280×720 or 1920×1080. Regardless of the format you're working in, the key is to maintain your aspect ratio.

    If you decide you need to do some cropping, the key is to crop a little off each side to maintain your aspect ratio (see Figure 1). Some video-editing platforms offer to maintain the aspect ratio automatically when you're performing a crop, which is very handy. However, many of the encoding tools require that you manually specify the number of pixels you want shaved from the top, bottom, right, and left of your screen. If that's the case, then you have to do the math yourself and be sure to crop the right amounts from each side.

    Figure 1: Be careful with your aspect ratio when you crop.

    As an example, let's say that you needed to shave off the bottom edge of your video. You could estimate that you wanted to crop off the bottom 5 percent of your screen, which would mean 24 lines of video. Assuming you were working with broadcast video, to maintain your aspect ratio, you'd need to crop a total of:

    24 * 720 / 486 = 35.6, or about 36 pixels

    So you'd need to cut 36 pixels off the width to maintain your aspect ratio. You could do this by taking 18 pixels off each side, or 36 pixels off one side. It doesn't matter; where you crop depends on what is in your video frame. Of course, this assumes that you're working with NTSC video. The math varies slightly if you've already resized the video to a 4:3 aspect ratio such as 640×480.
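
    If your encoding tool makes you enter crop values by hand, the calculation above is easy to script. The sketch below uses a hypothetical helper, matching_width_crop, that takes the number of lines you want to remove and returns how many pixels to remove from the width, both as the exact value and rounded to a whole pixel.

```python
def matching_width_crop(lines_cropped, width=720, height=486):
    """Pixels to remove from the width so the frame keeps its width:height ratio.

    Returns (exact value, value rounded to a whole pixel).
    """
    exact = lines_cropped * width / height
    return exact, round(exact)


# Cropping 24 lines from a 720x486 frame, as in the example above.
exact, pixels = matching_width_crop(24)

# Split the crop between the left and right edges (or take it all from one).
left = pixels // 2
right = pixels - left

# The same crop on video already resized to 640x480.
exact_sq, pixels_sq = matching_width_crop(24, width=640, height=480)
```

    For the 720×486 case this gives 36 pixels, or 18 per side. For a 640×480 frame the same 24-line crop works out to exactly 32 pixels.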

    One thing to bear in mind is that some codecs have limitations on the dimensions they can encode. Codecs divide the video frame into small boxes known as macroblocks. In some cases, the smallest macroblock allowed is 16×16 pixels, which means that your video dimensions must be divisible by 16. Most modern codecs allow macroblocks to be 8×8 pixels, or even 4×4 pixels. The great thing about 320×240 is that it works for even the largest macroblocks.
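
    A quick way to check a candidate resolution against a macroblock size is a divisibility test. The helper names below are hypothetical, and the sketch assumes the codec simply requires both dimensions to be a multiple of the block size; real codecs vary in how strictly they enforce this.

```python
def fits_macroblocks(width, height, block=16):
    """True if both dimensions are an exact multiple of the block size."""
    return width % block == 0 and height % block == 0


def round_down_to_block(value, block=16):
    """Largest multiple of the block size that does not exceed value."""
    return (value // block) * block


for width, height in [(320, 240), (400, 300), (480, 360), (640, 480)]:
    print(width, height,
          fits_macroblocks(width, height, 16),
          fits_macroblocks(width, height, 8),
          fits_macroblocks(width, height, 4))
```

    Of these sizes, 320×240 and 640×480 pass at every block size, 480×360 needs 8×8 or smaller blocks, and 400×300 needs 4×4 blocks. Rounding a dimension down with round_down_to_block changes the aspect ratio slightly, so it is worth rechecking your crop afterward.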

    Resizing is pretty easy; just make sure you're resizing to the correct aspect ratio.

    Jul 8, 2008

    Video Signal Processing : Using deinterlacing filters

    You should have an understanding about why you'd want to do some video signal processing. Even if you've done a great job producing and capturing your video, there are still fundamental differences between television and computer monitor displays that should be compensated for. To do this, you need to de-interlace your video and adjust your color for RGB monitors.

    Using de-interlacing filters

    Most editing platforms have de-interlacing filters built into them. As we saw in Figure 10.1, the problem is dealing with the artifacts that arise when two fields of interlaced video are combined to make a single frame of progressive video. Three methods are commonly used to deal with interlaced video:

  • Blending: This approach combines the two fields, but it's vulnerable to interlacing artifacts, as shown in Figure 10.1.

  • Interpolation: This approach attempts to shift parts of one field left or right to compensate for the artifacts. This is very computationally complex, because only parts of the field should be interpolated. For example, in Figure 10.1, we want to interpolate the parts of the frame that include the moving minivan, but not those that contain static elements such as the trees in the background.

  • Discarding: This approach discards one field and uses a single field twice in a single frame of progressive video. The resulting frame therefore has half the vertical resolution of the original frame, but without the interlacing artifacts.

    Editing and encoding platforms distinguish themselves by how they deal with interlacing artifacts. De-interlacing video on two different platforms generally yields different quality results. Where you choose to do your de-interlacing depends on where you can get the best quality. If you're staying in the broadcast world for your editing phase, it makes more sense to de-interlace during the encoding phase. This is demonstrated for you in the next section.
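
    Two of the three methods listed above are simple enough to sketch in a few lines of Python. This is an illustration only, not the filter any particular product uses. A frame is a list of rows of brightness values, the even rows belong to one field and the odd rows to the other, and the frame is assumed to have an even number of lines. Blending is interpreted here as averaging each line with its neighbor from the other field, which is one common reading of the term. Interpolation is left out because deciding which parts of the frame are moving is the hard part.

```python
def discard_field(frame, keep=0):
    """Keep one field (even rows if keep=0, odd rows if keep=1)
    and show each of its lines twice."""
    doubled = []
    for row in frame[keep::2]:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled[:len(frame)]


def blend_fields(frame):
    """Average each line with the adjacent line from the other field."""
    blended = []
    for i, row in enumerate(frame):
        neighbour = frame[i + 1] if i + 1 < len(frame) else frame[i - 1]
        blended.append([(a + b) // 2 for a, b in zip(row, neighbour)])
    return blended


# A bright object that moved two columns to the right between the fields.
frame = [
    [255, 255, 0, 0],   # first field
    [0, 0, 255, 255],   # second field
    [255, 255, 0, 0],   # first field
    [0, 0, 255, 255],   # second field
]

discarded = discard_field(frame)
blended = blend_fields(frame)
```

    Discarding gives a clean picture of the object at its first position, at half the vertical detail. Blending smears the object across both positions at half brightness, which is the kind of artifact the list above warns about.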

    However, we have to come clean about de-interlacing. For the most part, it isn't necessary for most podcasts. If you're encoding your podcasts for viewing on a video iPod (or other portable media device), chances are good that you're targeting a resolution of 320×240. At this resolution, most encoding software drops the second field by default! If you've got only 240 lines of resolution, it doesn't make sense to process the second field, so you don't have any interlacing artifacts to deal with. This is a very good reason to target 320×240 for your podcasts: The de-interlacing problem goes away.

    If, however, you're targeting browser-based playback for your podcast and decide to use a resolution larger than 320×240 — such as 400×300, 480×360, or 640×480 — you need to de-interlace your video during the encoding phase. So, for you mavericks, the next section shows where to find the de-interlacing filter in a number of software applications.

    Where to find de-interlacing filters
    If you're hoping to de-interlace your video (assuming that your final video podcast resolution is larger than 320×240), you need to make sure your encoding application has de-interlacing filters. Most, but not all, do. If you're targeting the QuickTime format, use an encoding application such as Sorenson Squeeze, because QuickTime Pro doesn't include a de-interlacing filter.

    Sorenson Squeeze includes a de-interlacing filter in the filter settings window, shown in Figure 1. Double-click any filter to open the filter settings window. The de-interlacing filter is on by default in the preset filters.

    Figure 1: Sorenson Squeeze offers de-interlacing in the filter settings window.

    If you're targeting the Windows Media Format, you can use the de-interlacing filter included in the Windows Media Encoder. The de-interlacing filter is on the Processing tab of the Session Properties window, shown in Figure 2.

    Figure 2: The Windows Media Encoder offers a de-interlacing filter in the processing settings.

    If your encoding application doesn't have a de-interlacing filter, chances are good that your editing platform will. Vegas includes the de-interlace setting in the Project Properties window, shown in Figure 3. Select Project Properties from the File menu or press Alt+Enter, and then select the de-interlacing method you want from the drop-down menu.

    Figure 3: Vegas offers a de-interlacing filter in the project properties window.

    Jul 5, 2008

    Display Technology Differences

    Television screens display images using a completely different technology than computer monitors. This is unfortunate because it leads to problems when trying to display video on a computer screen. However, it also can be a blessing, because television technology is nearly 100 years old, and much better technology is now available. The problem is that for the foreseeable future, we're caught between the two, shooting with cameras that are designed to record video in the NTSC/PAL (television) standard, and distributing our video on the Internet to be viewed on modern displays.

    Interlaced versus progressive displays
    Televisions are interlaced displays. Each frame of video is divided into two fields, one consisting of the odd lines of the image and the other the even lines. These two fields are scanned and displayed in series. So NTSC television actually displays 60 fields per second, which we see as continuous motion.

    Computer monitors, whether they're cathode ray tube (CRT) or liquid crystal display (LCD), are progressive monitors. Each frame of video is drawn from left to right, top to bottom. There are no fields. The problems appear when we try to create a single frame of progressive video from two fields of interlaced video (see Figure 1).

    Figure 1: Converting two fields of interlaced video with significant horizontal motion to a single frame of progressive video can be problematic.

    In Figure 1, a minivan is driving past the camera. During the split second between the first and second field scans, the minivan has moved across the frame. When this video is displayed on an interlaced display, it appears normal, because the second field is displayed a split second after the first. However, if we try to combine these two fields of interlaced video into a single progressive frame, interlacing artifacts appear because of the horizontal motion. The front edge of the minivan is "feathered," and both tires are a blur. At either the editing or the encoding phase, something must be done to deal with this problem.
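
    The feathering described above is easy to reproduce with a toy example. In the sketch below, which uses made-up helper names and a tiny text "picture," an object is drawn at one position for the first field and a little further right for the second. Interleaving the two fields line by line gives the combined progressive frame.

```python
def render_field(object_left, width=8, lines=2, object_width=3):
    """Draw one field: every line shows an object ('#') starting at object_left."""
    row = "".join(
        "#" if object_left <= x < object_left + object_width else "."
        for x in range(width)
    )
    return [row] * lines


def weave(first_field, second_field):
    """Interleave two fields line by line into one progressive frame."""
    frame = []
    for a, b in zip(first_field, second_field):
        frame.append(a)
        frame.append(b)
    return frame


first = render_field(object_left=1)    # object position at the first scan
second = render_field(object_left=3)   # it has moved by the second scan
frame = weave(first, second)

for line in frame:
    print(line)
```

    Alternate lines show the object in two different places, so its left and right edges become a comb of offset lines. With no motion between the fields, the two positions match and the combined frame looks normal.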

    Color spaces
    Television and computer monitors encode color information differently. Television signals are encoded in terms of luminance and chrominance (YUV encoding); computer monitor signals are encoded in terms of the amount of red, green, and blue in each pixel (RGB encoding). We also watch them in different environments. Televisions are generally viewed in somewhat dim surroundings, whereas computer monitors are generally in bright areas. The combination of these factors means that content created for one environment doesn't look right when displayed on the other.
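
    To give a feel for what the two encodings look like, here is a small sketch of a conversion between RGB and YUV. The text above doesn't specify coefficients, so the commonly cited standard-definition (BT.601) luma weights and analog YUV scale factors are assumed here, with all values in the range 0 to 1. Your editing or encoding software may use a different variant.

```python
# Assumed coefficients: BT.601 luma weights and the usual analog YUV scaling.
KR, KG, KB = 0.299, 0.587, 0.114
U_SCALE, V_SCALE = 0.492, 0.877


def rgb_to_yuv(r, g, b):
    """Split a color into luminance (Y) and two chrominance values (U, V)."""
    y = KR * r + KG * g + KB * b
    u = U_SCALE * (b - y)
    v = V_SCALE * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the same equations for R, G, and B."""
    r = y + v / V_SCALE
    b = y + u / U_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b


white = rgb_to_yuv(1.0, 1.0, 1.0)      # full luminance, no chrominance
orange = rgb_to_yuv(0.8, 0.4, 0.2)
round_trip = yuv_to_rgb(*orange)
```

    Grays and whites carry all their information in Y, with U and V at zero, while colored pixels carry nonzero chrominance. The conversion itself is reversible; the mismatch described in this section comes from how the signals are scaled and viewed, not from the math alone.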

    Digitized NTSC video looks dull and washed out when displayed on a computer monitor. Video that has been processed to look right on a computer monitor looks too bright and saturated (colorful) when displayed on an NTSC monitor. The non-compatibility between the two display technologies makes it problematic to create high-quality video, particularly if you want to display your content on both. If you're producing content for both broadcast and the Internet, at some point your project must split into two separate projects. After you start processing a video signal for display on a computer monitor, you won't be able to display it on a TV monitor.

    Tip The best way to manage this issue is to work exclusively in the broadcast space during your digitizing and editing phases. Archive your masters in a broadcast format. Don't do your post-processing for Internet viewing until the encoding phase, or at least after all your editing has been done and you have a broadcast-quality master. That way, you always have a version of your video that can be broadcast or burned to DVD. Create a special version that is intended for Internet-only consumption. As new formats evolve, you can always re-encode from your broadcast-quality master.

    Jul 2, 2008

    Advanced Video Production Techniques

    So you've figured out how to shoot some video and managed to load it into your computer. It looks good, but something's not quite right. The video just isn't quite as bright and colorful as you remember. That's because there are fundamental differences between televisions and computer monitors.

    Before we dive into the technical minutiae of display technologies, let's talk briefly about some simple tools you can use to improve your video image before it hits tape. Lens filters can be a very cost-effective way of improving your video quality.

    Understanding Lens Filters

    Many of you may have at one time or another played around with photography. If you ever progressed beyond "point-and-shoot" cameras, one of the first accessories you probably purchased was an ultraviolet (UV) filter for your lens. UV filters are useful because they prevent UV light from entering your lens, which can make your pictures look slightly blurry, and because they protect your lens. Mistakenly scratching a $25 filter is far preferable to scratching a fancy zoom lens that cost you hundreds of dollars.

    The same applies for your DV camera. Protecting your investment by buying a cheap and replaceable filter is a good idea, and as with photography, filtering out UV light gives you a cleaner video image. If you're wondering how a UV filter works, chances are good that you've experienced it many times. Every time you put on a pair of sunglasses, you're filtering out UV light (among other things). The immediate effect is a clearer, crisper image. Even though we can't see UV light, it interferes with our ability to perceive visible light. DV cameras have the same problem, so a UV filter is always a good idea.

    A number of other lens filters can be used to improve the quality of your podcast. The next few sections discuss them generally. To learn more about exactly which filter you should use with your model of camera, you should consult online discussion boards and digital video camera review sites.

    Diffusion filters soften your video image. We've all seen diffusion at work in the movies, particularly in the film noir genre. The camera cuts to a shot of the gorgeous female actress, and she's practically luminous. This is achieved using a fairly heavy diffusion filter. Although this would be overkill for most podcasting applications, using a light diffusion filter gives your podcast a distinctive look. It also can help your encoding quality.

    Many DV cameras default to shooting a very high contrast image, and some even use special processing to exaggerate the edges between objects. This can be okay in situations where there isn't much contrast to begin with, but if your scene is lit properly, you should have plenty of contrast. Video that has too much contrast looks amateurish. Using a diffusion filter can mitigate this by softening the entire image ever so slightly. Diffusion filters can make your podcast look more "filmlike," which is generally desirable.

    Video with too much contrast also is more difficult to encode, because it has lots of extra detail in the frame. This makes the encoder's job harder, because it tries to maintain as much detail as possible. Using a diffusion filter helps soften the image slightly, which reduces the amount of detail, thereby making the image easier to encode.

    Color correction
    The UV filter described at the beginning of this section is essentially a color filter, designed to filter out colors beyond our range of vision. There are many more color filters that you can buy for other situations. One of the most useful for many podcasters is the fluorescent light filter. Fluorescent lights emit a very particular type of light, with a lot of extra green in it. Because of the large amount of green content, fluorescent lights tend to make people look slightly ill. Using a fluorescent filter when filming in offices or other fluorescent lighting situations can make your podcasts look warmer and more natural.

    You also can buy filters that are designed to enhance certain parts of the color spectrum. These are fairly specialized and not for the average podcast producer. If you're looking for a special effect, you're probably better off trying things out in your video-editing platform, where you can safely undo those mistakes.

    Polarization filters are used to filter out reflected light. For example, filming through a window can be very difficult because of the reflections. Using a polarization filter removes this reflected light and allows you to film what's on the other side of the glass. Similarly, if you're trying to film something beneath the surface of water, for example fish in a pond, you need a polarization filter to cut the reflections off the surface. Polarization is often used in sunglasses for this reason.

    Jun 28, 2008

    Archiving Your Podcast

    As you probably are beginning to realize, quite a bit of work goes into creating a video podcast. If you've got a FireWire setup, it can be pretty simple, but if you're using a video capture card and an analog camera, you may have to fiddle with your settings. Depending on how much editing you do (and how many cutaways you have to use), your final master may be quite a bit different from what you originally started off with. It's very important, therefore, that you archive your work so that you don't have to start from scratch if you decide to re-edit your podcast, perhaps for a "best-of" end-of-year show.

    For that matter, your podcast may not be the only outlet for your programming. You may decide you want to put out a DVD or license your programming to a cable channel. The possibilities are all out there, but if all you keep lying around are the low-bit-rate podcast versions, you'd have to do lots of work to recreate your masters.

    Save your work in as high a quality as you can. If you're working with a FireWire system, you can usually print your master right back to a DV tape. You can obviously keep a DV version on your hard drive if you've got space, but video files can fill up a hard drive quickly. DV tapes are compact and a fairly reliable backup method.
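
    To get a rough idea of how quickly DV masters fill a drive, you can estimate the storage from the running time. The figure used below, about 3.6 MB per second for DV video plus audio, is an approximation assumed for this sketch, and the function names are hypothetical. Check your own captured files for the real number.

```python
DV_BYTES_PER_SECOND = 3.6e6   # assumed approximate DV rate (video plus audio)


def dv_storage_gb(minutes, bytes_per_second=DV_BYTES_PER_SECOND):
    """Approximate size in gigabytes of a DV file of the given length."""
    return minutes * 60 * bytes_per_second / 1e9


def episodes_per_drive(drive_gb, minutes_per_episode):
    """How many whole episodes of the given length fit on a drive."""
    return int(drive_gb // dv_storage_gb(minutes_per_episode))


hour_gb = dv_storage_gb(60)
fits = episodes_per_drive(500, 30)
print(round(hour_gb, 2), fits)
```

    At that rate, an hour of DV comes to roughly 13 GB, so a 500 GB drive holds on the order of 77 half-hour masters. That is why printing back to tape or keeping a dedicated archive drive is worth planning for.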

    If you're not working with a FireWire system, or if you just want to keep pure digital copies lying around, consider buying an external hard drive (or two). You can use one to do all your capturing and editing and keep the other for archival purposes. Without the luxury of FireWire, you won't be able to save to DV, because video capture cards don't work in reverse; you can't print your edited master back to tape. You have to rely on digital storage.

    One thing that hasn't been thoroughly established is how long hard drives will last. It's fairly common knowledge that hard drives in servers that work 24 hours a day have an average life expectancy of about three years. However, they're usually higher quality drives than most people have in their laptops or home desktop systems. Much like light bulbs, it's the turning on and turning off that are hardest on the drive.

    If you're using external drives, you may not be using them every day, which in theory extends their life cycle, but if you put them on a shelf and forget about them, hard drives have been known to "freeze." The data on the disk platters is intact, but the hard drive is unable to spin the platters to access the data. You can send drives in this condition to companies that specialize in data rescue, but the process is very expensive.

    Unfortunately, we have no good answer as to how long hard drives are going to last. Institutions such as banks that rely on data use tape backup systems to maintain their data integrity. A number of prosumer tape backup formats are available nowadays. They're not cheap, but if you want a guarantee that your programs will be available 5, 10, or 25 years from now, you should consider investing in a good tape backup system, or opening an account with a backup company.

    Jun 24, 2008

    Tips : Editing Your Podcast

    After you've transferred your video to your computer, you need to tidy up the rough edges of your video production and turn your podcast into a masterpiece. Well, we can hope, can't we? You should edit with an eye on three things: content, quality, and convenience.

    First and foremost, you want your podcast to have good content from start to finish. If you are interviewing a guest for your podcast, you probably had a long list of questions to ask. When you're reviewing your footage, try to keep your distance from the material, and only keep what works best. Of course, some guests may be fantastic, and you'll want to keep every syllable they utter. Often, however, you'll find that a few questions just didn't go anywhere or didn't reveal anything new (see the "Ask the Right Questions" sidebar). If so, edit them out. With a few nimble edits, the pacing of a show can change dramatically, turning a mediocre show into a great show.

    Keep edits short and sharp
    When you're editing, most edit platforms offer a number of transition options for you to choose from. In general, anything other than a quick cross fade (also known as a dissolve) should be avoided, for a couple of key reasons. First, if you watch closely, crazy transitions are almost never used on television or in film. Over-the-top transitions detract from the story line and call way too much attention to themselves. For this reason, they're a dead giveaway that an amateur is at the controls. Second, there's a technological reason why you shouldn't use complicated transitions. They are incredibly difficult to encode. If you're encoding for a broadband audience, the bit rates you use simply aren't capable of encoding that much motion efficiently. You'll either end up with a transition that looks like mud, or you'll be forced to encode your podcast at a higher bit rate, which means a larger file, a longer download for your audience, and a bigger bandwidth bill at the end of the month.

    Cutaways are small pieces of film that you can use when editing video, often used to cover up an edit. Editing video can be tricky because people can see when and where you edit your video. You can't just cut out the middle of an interview without some clever editing, or people will notice that there's something missing. This is where cutaways can really help.

    Imagine an interview on a conference floor, where someone rudely interrupts your guest while she's answering a question. Unless the interruption was by someone important (or it was really funny), you probably want to edit it out of the podcast. If you just cut it out, there will be a sudden jump in the video (known as a "jump cut" in the industry). You have to disguise your cut using a cutaway.

    Here's how it works: When the interruption occurs, cut to some b-roll, like a shot of you nodding in agreement or a shot of the conference room floor. Let the audio of your guest's response continue to play underneath the b-roll. Then, you can cut from the b-roll with the guest audio underneath it to an appropriate location after the interruption occurred. The jump cut will be hidden by the b-roll, and your secret will be safe. This editing approach is illustrated in Figure 1.

    Figure 1: Use cutaways to disguise your edits.

    Tip Provided you have plenty of cutaway material, it's often easiest to edit your story together by editing to your audio and then covering any awkward transitions with cutaway or b-roll material.