Chapter 5. Introduction to Streaming Media
Internet streaming media changed the Web as we knew it -- changed it from a static text- and graphics-based medium into a multimedia experience populated by sound and moving pictures. Now streaming media is poised to become the de facto global media broadcasting and distribution standard, incorporating all other media, including television, radio, and film. The low cost, convenience, worldwide reach, and technical simplicity of using one global communications standard make web broadcasting irresistible to media publishers, broadcasters, corporations, and individuals. Businesses and individuals once denied access to such powerful means of communication are now using the Web to connect with people all over the world.
The remarkable technology that allows a web site visitor to click on
a button and seconds later listen to a sporting event, tradeshow
keynote, or CD-quality music is the result of a rather simple but
powerful technical innovation -- streaming
media. Streaming works by first compressing a digital
audio file and then breaking it into small packets, which are sent,
one after another, over the Internet. When the packets reach their
destination (the requesting user), they are decompressed and
reassembled into a form that can be played by the user's
system. To maintain the illusion of seamless play, the packets are
"buffered" so a number of them are downloaded to the
user's machine before playback. As those buffered or preloaded
packets play, more packets are being downloaded and queued up for
playback. However, when the stream of
packets gets too
slow (due to network congestion), the client audio player has nothing
to play, and you get the all-too-familiar drop-out that every user
has encountered.
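The buffering behavior described above can be sketched as a small simulation. This is a minimal model, not how any particular player is implemented: time is divided into ticks, the player consumes one packet per tick once it has started, and the names `simulate_playback`, `arrivals_per_tick`, and `prebuffer` are hypothetical.

```python
from collections import deque


def simulate_playback(arrivals_per_tick, prebuffer):
    """Return the ticks at which playback drops out.

    arrivals_per_tick[i] is the number of packets that arrive during tick i.
    Playback starts once `prebuffer` packets are queued, then consumes one
    packet per tick. A tick with an empty queue is a drop-out.
    """
    queue = deque()
    playing = False
    dropouts = []
    next_seq = 0
    for tick, arrivals in enumerate(arrivals_per_tick):
        # Newly downloaded packets join the back of the queue.
        for _ in range(arrivals):
            queue.append(next_seq)
            next_seq += 1
        # Start playing only after enough packets have been preloaded.
        if not playing and len(queue) >= prebuffer:
            playing = True
        if playing:
            if queue:
                queue.popleft()        # play one buffered packet
            else:
                dropouts.append(tick)  # nothing to play: audible drop-out
    return dropouts


# A steady stream with a three-tick stall caused by congestion.
arrivals = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
no_buffer = simulate_playback(arrivals, prebuffer=1)
with_buffer = simulate_playback(arrivals, prebuffer=4)
```

In this model, a player that starts immediately drops out for the whole stall, while one that preloads four packets plays through it at the cost of a later start.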
5.1. Streaming protocols
The big breakthrough that enabled the streaming revolution was the
adoption of the User Datagram Protocol (UDP) for media delivery, along
with new encoding techniques
that compressed audio files into extremely small packets of data. UDP
made streaming media feasible by transmitting data more efficiently
than the protocols previously used on the Web from the host server
over the Internet to the
client player or end listener. More recent protocols such as the
Real Time Streaming Protocol (RTSP) are making the transmission of
data even more efficient.
UDP and RTSP are ideal for audio broadcasting since they place a higher
priority on continuous streaming than on guaranteed delivery of every
packet. When a UDP audio packet drops out, the server simply keeps
sending information, causing only a brief glitch instead of a huge gap
of silence. TCP, on the other hand, retransmits the lost packet and
holds back everything behind it until the retransmission arrives,
causing greater delays and breakups in the audio broadcast.
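The difference between the two delivery styles can be illustrated with a toy model. The sketch below does not use real sockets or implement either protocol. It assumes that packet `i` is sent at tick `i`, arrives instantly unless lost, and must be played at tick `i`. The function names and the `retransmit_delay` parameter are hypothetical.

```python
def arrival_ticks(num_packets, lost, retransmit_delay, in_order):
    """Return the tick at which each packet becomes available to the player.

    in_order=False (UDP-like): a lost packet never arrives (None).
    in_order=True (TCP-like): a lost packet arrives `retransmit_delay`
    ticks late, and every later packet waits behind it.
    """
    ticks = []
    ready = 0  # earliest tick at which the next in-order packet can be delivered
    for seq in range(num_packets):
        if not in_order:
            ticks.append(None if seq in lost else seq)
            continue
        arrival = seq + retransmit_delay if seq in lost else seq
        ready = max(ready, arrival)
        ticks.append(ready)
    return ticks


def missed_deadlines(ticks):
    """Count packets that are missing or late for playback at tick == seq."""
    return sum(1 for seq, tick in enumerate(ticks) if tick is None or tick > seq)


udp_like = arrival_ticks(10, lost={3}, retransmit_delay=4, in_order=False)
tcp_like = arrival_ticks(10, lost={3}, retransmit_delay=4, in_order=True)
```

With one lost packet, the UDP-like stream misses a single playback slot, while the TCP-like stream misses four because the packets queued behind the retransmission are also late. A larger client buffer would hide some of that delay, at the cost of a longer wait before playback starts.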
Prior to UDP and RTSP transmission, data was sent over the Web
primarily via TCP and HTTP. TCP transmission, in contrast to UDP and
RTSP transmission, is designed to transfer text documents,
email, and HTML web pages over the Internet, favoring
reliability and data integrity over timeliness. Since HTTP
transmission is based on TCP, it is also not well-suited for
transmitting multimedia presentations that rely on time-based
operation or for large-scale broadcasting.
Later in the chapter, you will learn why protocols are important.
Some streaming technologies such as
RealAudio and
Windows Media
utilize dedicated servers that support superior UDP and RTSP
transmission. Other formats such as
Shockwave, Flash,
MIDI, QuickTime, and
Beatnik are primarily designed to
stream from a standard HTTP web server. While these formats are
cheaper and often easier to use since they do not require the
installation of a new server, they are typically not used in
professional broadcasting situations that require the delivery of
hundreds or thousands of simultaneous streams.
HTTP streaming is thus often referred to as
pseudo-streaming:
it is technically possible to stream via HTTP, but doing so is much
more likely to cause major packet drop-outs, and it cannot deliver
nearly as many simultaneous streams as UDP and RTSP transmission.
Herein lies the difference between most low-end solutions and more
professional broadcasting solutions that require dedicated servers
and extra bandwidth and server capacity.
5.1.1. Lossy compression
Regardless of the advances in UDP and RTSP transmission protocols,
streaming media would not be possible without the rapid innovation in
encoding algorithms or codecs that compress and decompress audio and
video data. Uncompressed
audio
files are huge. One minute of playback of a CD-quality stereo audio
file requires 10 MB of data, approximately enough disk space to
capture a small library of books or a 200-page web site.
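The 10 MB figure follows from the standard CD audio parameters (44.1 kHz sampling rate, 16 bits per sample, two channels). A short calculation, with variable names chosen here purely for illustration:

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
bytes_per_minute = bytes_per_second * 60
bits_per_second = bytes_per_second * 8

print(f"{bytes_per_minute:,} bytes per minute")   # roughly 10 MB
print(f"{bits_per_second / 1000:,.1f} kbps")      # sustained bit rate needed
```

One minute therefore takes about 10.6 million bytes, and streaming it uncompressed would require a sustained connection of roughly 1.4 Mbps.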
Standard modem speed connections -- including
cable modems
and xDSL
systems -- do not have the capacity to deliver pure, uncompressed
CD-quality 16-bit, 44.1 kHz audio. In order to stream across the
limited bandwidth of the Web, audio has to be compressed and
optimized with codecs, which are
compression-decompression
encoding algorithms. In general,
compression
schemes can be classified as "lossy" and
"lossless."
Lossy compression schemes reduce file size by
discarding some amount of data during the encoding process before it
is sent over the Internet. Once the data is received on the client side, the
codec attempts to reconstruct the information that was lost or
discarded. The benefit of this sort of compression lies in the
smaller file size that results from discarding the "lost"
information. The JPEG
image format uses lossy compression to sample an image and discard
unnecessary color information. Similarly, lossy audio compression
discards frequencies on the high and low end of the spectrum and
attempts to locate and remove unnecessary audio data. The technique
is often referred to as "perceptual encoding" since the
user is unlikely to notice the absence of this information. Lossy
compression offers file savings on the order of 10:1.
Since small file size is so important on the Internet, practically
all of the formats we're interested in employ lossy
compression. Here's how it works. First, the client player
decompresses the audio file as it downloads to your computer. Then it
fills in the missing information according to the instructions set by
the codec. To illustrate why lossy compression is so crucial,
consider the phrase, "Now is the time for all good men to come
to the aid of their country". One way to compress this would
simply be to remove all the vowels and spaces:
"Nwsthtmfrllgdmntcmtthdfthrcntry".
That cuts the message from 68 characters to 31, a 54% file savings,
but of course our compressed message is unintelligible. Imagine that
our codec, however, has appropriate rules for decompressing this
message with minimal distortion. The conversion likely wouldn't
be perfect, but it would be good enough to understand the message,
something like, "Now's tha ti'm for oll gudm en to
com to the aad of their country".
This is exactly what happens with lossy audio compression. The
compressed file is unintelligible to the listener; the decompressed
file is intelligible but of a lower quality than the original.
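The "compression" half of the text analogy is easy to reproduce. The sketch below only illustrates the analogy and is not an audio codec. The function name is hypothetical, and the phrase is counted without its quotation marks or closing period, which gives 68 characters before and 31 after.

```python
VOWELS = set("aeiouAEIOU")


def strip_vowels_and_spaces(text):
    """Lossy 'encoder' from the analogy: drop every vowel and every space."""
    return "".join(ch for ch in text if ch not in VOWELS and ch != " ")


phrase = "Now is the time for all good men to come to the aid of their country"
compressed = strip_vowels_and_spaces(phrase)
savings = 1 - len(compressed) / len(phrase)

print(compressed)
print(f"{len(phrase)} -> {len(compressed)} characters ({savings:.0%} smaller)")
```

Nothing in `compressed` records which vowels were removed or where the spaces were, which is why a decoder can only guess at the original.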
For example, a RealAudio speech file encoded from a standard
AIFF or WAV file is
generally one-tenth the size of the original file after encoding. To
reduce that file's size, the encoder first preserves the integrity of
the 1,000 Hz to 4,000 Hz frequency spectrum of the human voice and
then discards the frequencies above and below that range. By
eliminating the unnecessary low- and high-end frequencies, the
encoder is able to reduce the file size while maintaining speech
intelligibility. It should be noted that speech tends to have aural
characteristics (sound) that extend into the 7,000 Hz range. When the
area between 4,000 Hz and 7,000 Hz is reduced or removed entirely,
encoded speech will sound intelligible, but it may lose clarity and
sound unnatural. Furthermore, since some voices and sounds
reach into even higher frequency ranges, lossy compression and
encoding can result in dull, muted, or abrasive sounds.
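The idea of keeping the 1,000 Hz to 4,000 Hz band and discarding everything else can be sketched with a plain discrete Fourier transform. This is a deliberately simplified illustration, not how RealAudio or any production codec works: real perceptual encoders use far more elaborate models. The sample rate, the three test tones, and all function names below are assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz (assumed); high enough to represent a 6,000 Hz tone
N = 160               # 10 ms of audio, so each DFT bin is 100 Hz wide


def dft_bin(samples, k):
    """Value of DFT bin k for a list of real samples."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n)
               for t, x in enumerate(samples))


def band_limit(samples, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz], then resynthesize."""
    n = len(samples)
    bins = [dft_bin(samples, k) for k in range(n)]
    for k in range(n):
        # Bins above n/2 mirror the lower ones (negative frequencies).
        freq = min(k, n - k) * SAMPLE_RATE / n
        if not (low_hz <= freq <= high_hz):
            bins[k] = 0
    # Inverse DFT; the imaginary part is only rounding noise for real input.
    return [
        sum(b * cmath.exp(2j * math.pi * k * t / n)
            for k, b in enumerate(bins)).real / n
        for t in range(n)
    ]


def amplitude(samples, freq_hz):
    """Amplitude of the sine component at freq_hz (must fall on a bin)."""
    k = round(freq_hz * len(samples) / SAMPLE_RATE)
    return 2 * abs(dft_bin(samples, k)) / len(samples)


def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


# A low rumble, a mid-band "voice" tone, and a high-frequency component.
signal = [a + b + c for a, b, c in zip(tone(200), tone(2_000), tone(6_000))]
speech_band = band_limit(signal, 1_000, 4_000)
```

After filtering, the 2,000 Hz component is untouched while the 200 Hz and 6,000 Hz components are gone. The 6,000 Hz tone falls in the 4,000 Hz to 7,000 Hz region whose loss the text associates with reduced clarity.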
5.1.2. Lossless compression
In contrast, lossless compression squeezes data
into smaller packets of information without permanently discarding
any of the data. Instead, it sets redundant information aside
temporarily and provides a
"map" with which the codec can reconstruct the original
file exactly. Lossless compression results in superior audio quality but
lower compression ratios.
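The defining property of lossless compression, an exact round trip, can be demonstrated with Python's standard `zlib` module. Note that `zlib` is a general-purpose DEFLATE compressor used here only as a stand-in. It is not an audio codec, and the repeated test phrase is an assumption chosen so that there is redundancy to remove.

```python
import zlib

phrase = b"Now is the time for all good men to come to the aid of their country. "
original = phrase * 20             # repetition gives the compressor redundancy

packed = zlib.compress(original)   # smaller representation plus the "map"
restored = zlib.decompress(packed)

print(len(original), "bytes before,", len(packed), "bytes after")
print("identical after round trip:", restored == original)
```

Unlike the vowel-stripping example, every byte comes back. The trade-off is that the achievable savings depend entirely on how much redundancy the input contains.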
In the lossy example, our codec had some general rules for
reconstructing the message -- basically to add vowels and spaces
in order to form English words. It wasn't perfect because it
didn't know which English words to choose, and it wasn't
always sure where one word ended and the next began.
Lossless codecs, on the other hand, are perfect. To reconstruct our message perfectly,
however, would mean having a much more sophisticated set of rules. A
lossless text codec would have to reproduce not only words but
sensible phrases. It would have to be able to break words correctly.
And it would have to have a mastery of the English language's
inconsistent spelling patterns. It would in fact be, as the computer
scientists say, a nontrivial endeavor.
The same goes for lossless audio codecs. They are difficult to
develop (and thus expensive to license), they require substantial
computing power on the user's machine, and the file savings are
not as great as with lossy compression. Sadly enough, it appears that
for the time being, lossy compression is necessary for knocking
large audio files down to Internet-appropriate size. The good news is
that lossy compression schemes are becoming more advanced, and over
time the differences will become less and less noticeable to the
human ear.
Now that we have discussed lossy and lossless compression and the
types of protocols that enable the efficient delivery of compact
audio files across the Internet, let's review the audio formats
available on the market. Most of these formats will be discussed in
greater detail in the rest of the book.
Copyright © 2002 O'Reilly & Associates. All rights reserved.