Sound Data

Audible sound consists of variations in air pressure that cover the frequency range of 20 Hz to 20 kHz. The analogue signal produced by a microphone is converted into digital data by using an analogue to digital converter (ADC), the output of which can be manipulated by your computer. After this, the sound is converted back into analogue form by a digital to analogue converter (DAC), which is usually connected to a pair of headphones or an amplifier.

Some computers have an analogue input and an internal ADC, as well as an analogue output connected to the machine’s DAC. However, some models don’t have an input, requiring you to connect an audio adaptor to a spare USB or FireWire port. You can also use such adaptors to provide extra inputs or outputs, which is useful when the built-in sound quality isn’t up to your requirements.

  Sampling

The process of converting analogue audio into digital code, as accomplished by an ADC, involves a technique known as sampling. This ‘chops up’ the signal at a speed known as the sample rate. Generally speaking, this rate should be at least twice that of the highest frequency found in the incoming signal. So, if we required a frequency response of up to 20 kHz we would use a sampling rate of 40 kHz or more. In practice, higher rates are desirable, since inaudible frequencies above half the sample rate can be folded back into the audible range (an effect known as aliasing), creating spurious tones that you can hear.
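The ‘at least twice the highest frequency’ rule, and the way that tones above half the sample rate reappear lower down, can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and the folding calculation assumes an ideal sampler with no input filtering.

```python
def minimum_sample_rate(highest_frequency_hz: float) -> float:
    """Lowest sample rate that satisfies the 'twice the highest frequency' rule."""
    return 2.0 * highest_frequency_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled tone appears after folding.

    Anything above half the sample rate is reflected back into the
    range 0 .. sample_rate / 2 (assumes an ideal, unfiltered sampler).
    """
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(minimum_sample_rate(20_000))      # 40000.0
print(alias_frequency(30_000, 44_100))  # 14100: an inaudible tone becomes audible
print(alias_frequency(10_000, 44_100))  # 10000: tones below half the rate are unchanged
```

In this sketch a 30 kHz tone sampled at 44.1 kHz turns up at 14.1 kHz, which is why the signal is normally filtered before it reaches the ADC.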

Choosing a Sample Rate

The choice of sample rate is a compromise between sound quality and the amount of memory or disk space that’s available. In some instances, compression can be used to dramatically reduce the space required (see below), although this can degrade the sound quality.

The following table shows Apple’s preferred rates for older Mac OS computers and the expected upper frequency response that they provide:-

Rate (Hz) | Rate (kHz) | Response (kHz) | Application
5,563.6363 | 5.5 | 2 | Telephone-quality speech
7,418.1818 | 7.4 | 3 | Telephone-quality speech
11,127.27272 | 11.1 | 5 | Medium-quality speech
22,254.54546 | 22.2 | 10 | Low-quality music
44,100.0000 | 44.1 | 20 | CD-quality music

Of these, 11.1, 22.2 and 44.1 kHz are often used. Although most Mac OS computers are actually capable of using rates up to 65,535 Hz (65.5 kHz), this particular frequency is rarely used.

The next table shows the rates used in professional systems and other equipment:-

Rate (Hz) | Rate (kHz) | Max Response (kHz) | Application
4,000.0000 | 4 | 1.5 | Telephone-quality speech
8,000.0000 | 8 | 3 | Telephone-quality speech
11,025.0000 | 11 | 5 | Multimedia CD-ROM speech
22,050.0000 | 22 | 10 | Multimedia CD-ROM music
24,000.0000 | 24 | 11 | Voice recognition systems
32,000.0000 | 32 | 15 | Broadcast links, domestic digital VCRs
44,100.0000 | 44.1 | 20 | CD, professional systems, DAT
48,000.0000 | 48 | 22 | Professional systems, DAT
64,000.0000 | 64 | 30 | Special
88,200.0000 | 88.2 | 40 | Professional systems
96,000.0000 | 96 | 44 | Professional systems, DVD (high-quality)
176,400.0000 | 176.4 | 80 | Professional systems
192,000.0000 | 192 | 88 | Professional systems

In practice, most people accept CD-quality sound, sampled at 44.1 kHz with 16-bit resolution (see below). In mono, this requires about 5 MB of hard disk space for every minute of recorded sound, equivalent to about 86 KB per second. Of course, for stereo you must multiply this by two and for a multi-track operation you must multiply the mono figure by the number of tracks.
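The storage figures follow directly from the sample rate, the sample size and the number of channels. The following Python sketch shows the arithmetic; the function name is invented for this example and the calculation ignores any file header or filing system overhead.

```python
def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


mono = bytes_per_second(44_100, 16)
stereo = bytes_per_second(44_100, 16, channels=2)

print(mono)         # 88200 bytes per second (about 86 KB at 1,024 bytes per KB)
print(mono * 60)    # 5292000 bytes per minute, roughly 5 MB
print(stereo * 60)  # 10584000 bytes per minute, roughly 10 MB
```

An eight-track recording at the same settings would simply be eight times the mono figure.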

Resolution

Having sliced up the incoming signal, the ADC converts the measured signal voltage into a digital code. The accuracy or resolution of this code is set by the number of bits used in the process. Once again, a compromise has to be made between quality and the amount of space used by the data.
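As a rough illustration of how the number of bits limits accuracy, the Python sketch below converts a voltage into a signed integer code. It assumes that the input has been scaled to the range -1.0 to +1.0 and that full scale maps onto the largest positive code; real converters differ in detail, and the function name is invented for this example.

```python
def quantise(value: float, bits: int) -> int:
    """Map a voltage in the range -1.0 .. +1.0 to a signed integer code.

    Assumes full scale maps to the largest positive code; out-of-range
    inputs are clipped rather than wrapped.
    """
    levels = 2 ** bits              # 256 levels for 8 bits, 65,536 for 16 bits
    max_code = levels // 2 - 1
    min_code = -(levels // 2)
    code = round(value * max_code)
    return max(min_code, min(max_code, code))


print(2 ** 8, 2 ** 16)    # 256 65536
print(quantise(0.3, 8))   # 38
print(quantise(0.3, 16))  # 9830
```

Converting the codes back gives 38/127 (about 0.2992) for 8 bits and 9830/32767 (about 0.3000) for 16 bits, so the 8-bit version carries a much larger rounding error.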

Although 8-bit samples only occupy half the disk space of 16-bit samples, the sound quality is often horribly granular. However, some multimedia CD-ROMs use such 8-bit material to conserve space. Very high quality systems, such as ProTools 24 (Digidesign), use 24-bit sampling running at 96 kHz or higher. This minimises noise and distortion but is very demanding on disk space.

Each sound sample is represented within a sound frame, often one of the following types:-

Type | Sample (bytes) | Arrangement
8-bit mono | 1 | Single byte sample
8-bit stereo | 2 | Byte 1 = Left, Byte 2 = Right
16-bit mono | 2 | Two-byte sample
16-bit stereo | 4 | Two bytes for each channel
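The frame layouts in the table can be reproduced with Python’s standard struct module. This is a sketch under stated assumptions: the byte order (little-endian here, whereas a big-endian format such as AIFF would use '>') and the use of unsigned values for 8-bit samples are choices made for the example, not something the table specifies, and the function names are invented.

```python
import struct


def pack_16bit_stereo(left: int, right: int) -> bytes:
    """One 4-byte frame: two bytes for the left channel, then two for the right."""
    # '<' = little-endian (an assumption), 'h' = signed 16-bit integer
    return struct.pack("<hh", left, right)


def pack_8bit_stereo(left: int, right: int) -> bytes:
    """One 2-byte frame: byte 1 = left, byte 2 = right."""
    # 'B' = unsigned byte (an assumption about how 8-bit samples are stored)
    return struct.pack("BB", left, right)


frame = pack_16bit_stereo(1000, -1000)
print(len(frame))                       # 4
print(struct.unpack("<hh", frame))      # (1000, -1000)
print(len(pack_8bit_stereo(128, 130)))  # 2
```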

Audio Codecs

Digitised audio material can be compressed to create smaller sound files, so saving disk space. The software used to compress or decompress a file is known as a coder-decoder or codec. Many are built into Apple’s QuickTime package, while others can be provided as separate files.

The following table shows a number of common coding systems, used with or without compression. Except where indicated, these formats are supported by modern versions of QuickTime.

Format | Notes
24-bit Integer | Linear coding, no compression
32-bit Integer | Linear coding, no compression
32-bit Floating Point | Linear coding, no compression
64-bit Floating Point | Linear coding, no compression
Adaptive Multi-Rate (AMR) ◊ | ACELP-based predictive speech coding for GSM/3GPP phones
Advanced Audio Coding (AAC) | 16-bit psychoacoustic music coding (see below)
ALaw 2:1 | 2:1 lossy compression
Code Excited Linear Predictive (CELP) | Predictive speech coding
IMA 4:1 * | 4:1 lossy compression, audible effects
MACE 3:1 • | 3:1 lossy compression, audible effects
MACE 6:1 • | 6:1 lossy compression, highly audible effects
MetaSound AC8 † | 8-bit acoustic coding
MetaSound AC11 † | 11-bit acoustic coding
MetaSound AC16 † | 16-bit acoustic coding
MetaSound AC24 † | 24-bit acoustic coding
MetaVoice RT24 † | Speech coding
MP3 | 16-bit psychoacoustic music coding (see below)
MS ADPCM + | Microsoft standard
QDesign Music 2 ‡ | Real-time music coding
Qualcomm Code Excited Linear Prediction (QCELP) | Predictive speech coding
Qualcomm PureVoice | 9:1 or 19:1 real-time speech coding
VivoActive G273 † | Special
VivoActive SIREN † | Special
µLaw 2:1 | 2:1 lossy compression

◊ Eight bit rates can be used, from 4.75 to 12.2 kbit/s

* Interactive Multimedia Association, 16-bit data only

• Macintosh Audio Compression and Expansion, 8-bit or 16-bit data

+ Coding not available via QuickTime

† Coding via QuickTime may be possible using extra codec file

‡ Bit rate of 8, 10, 12, 16, 20, 24, 32, 40 or 48 kbit/s

Not all file formats can support every kind of codec. For example, an AVI file can only use ALaw or µLaw coding, a Mac System 7 sound file only uses ALaw, IMA, MACE or µLaw, a Wave file only uses ADPCM and a µLaw file only accommodates Floating Point, ALaw or µLaw coding. However, any codec can be used inside an AIFF sound file or QuickTime movie file.
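The 2:1 figure quoted for µLaw in the table above comes from storing each 16-bit sample as a single byte after passing it through a logarithmic curve, so that quiet sounds keep more detail than they would with plain 8-bit coding. The Python sketch below uses the standard continuous µ-law formula with µ = 255; it’s an illustration only, since practical telephone codecs use a segmented approximation of this curve, and the function names are invented for this example.

```python
import math

MU = 255  # value used by the mu-law telephone standard


def mu_law_compress(x: float) -> float:
    """Compress a sample in -1.0 .. +1.0 using the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x: float) -> int:
    """Compress, then scale to a signed 8-bit code (scaling is an assumption)."""
    return round(mu_law_compress(x) * 127)


print(encode_8bit(1.0))   # 127
print(encode_8bit(0.01))  # 29, whereas plain 8-bit coding would give round(1.27) = 1
```

The process is lossy because the rounding to 8 bits can’t be undone, although the expansion function recovers the original curve.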

Lossy and Psychoacoustic Codecs

Earlier forms of compression, such as IMA or MACE, employ lossy algorithms that remove some of the audio data, resulting in audible distortion and other effects. They’re usually best avoided.

Later codecs, such as AAC, CELP (both part of the MPEG-4 (MP4) standard), MP3, QDesign Music 2 and the Qualcomm codecs, use a lossy form of coding known as psychoacoustic coding or perceptual coding. This exploits the fact that loud sounds at one frequency mask quieter tones at other frequencies. MP3 and AAC are used for music on the Internet.

  MP3, AAC and WMA

Psychoacoustic coding, as described above, has been developed as part of several different file formats, the most common of which are MPEG-1 Layer 3 (MP3), Advanced Audio Coding (AAC) and Windows Media Audio (WMA). All accommodate a quality that approaches that provided by an audio CD, but with a file size suitable for downloading over the Internet.

MPEG-1 Layer 3 (MP3)

This format, devised by the Moving Picture Experts Group (MPEG), the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), gives better compression than the older MPEG-1 Layer 1 or Layer 2 systems, reducing audio data to an eleventh of its original size whilst conveying mono or stereo sound sampled at 32, 44.1 or 48 kHz with 16-bit resolution. Although lossy, the perceptual coding provides high compression with only a small loss of quality, whilst Huffman coding is used to further compress the data.
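The ‘eleventh of its original size’ figure can be checked with a little arithmetic. The Python sketch below assumes CD-quality stereo as the starting point and an encoded rate of 128 kbit/s; both are assumptions chosen for this example, and the function name is invented.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Uncompressed data rate in kbit/s."""
    return sample_rate_hz * bits * channels / 1000


cd_rate = pcm_bit_rate_kbps(44_100, 16, 2)
ratio = cd_rate / 128  # assumed MP3 data rate of 128 kbit/s

print(cd_rate)          # 1411.2
print(round(ratio, 1))  # roughly 11, so the file is about an eleventh of the original
```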

MP3 files can be played on a computer using an MP3 application such as Apple’s iTunes or can be downloaded via a FireWire or USB port to a portable MP3 player, such as an iPod.

Data Rates

The amount of data used is set by the chosen data rate. MP3 allows rates of up to 320 kbit/s, although most stereo MP3s are encoded at 128 or 96 kbit/s.

The following table shows the reductions in file size obtained by using different data rates:-

Content | Sample Rate (kHz) | Data Rate (kbit/s) | % of original size
Speech | 22 | 48 or 96 | 6 to 7
Classical Music | 44.1 | 64 or 128 | 10
Popular Music | 48 | 128 or 256 | 20

Advanced Audio Coding (AAC)

This kind of file, originally developed by AT&T, Dolby Laboratories Inc, Fraunhofer IIS and Sony Corporation, is supported by iTunes 4 or later and is used at some websites, including Apple’s iTunes Music Store and the UK’s O2 Music Service. AAC produces smaller files than MP3 and gives better sound quality.

At a rate of 128 kbit/s an AAC recording can sound almost as good as uncompressed audio. This is reflected in the choice of data rates for the High Quality setting in iTunes. In iTunes 3 and earlier versions of the application, which only support MP3, it’s necessary to use 160 kbit/s. However in iTunes 4, where AAC is employed, the default rate has been reduced to 128 kbit/s.
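The effect of the lower default rate on file size is easy to estimate, since the size of an encoded file depends only on the data rate and the playing time. The Python sketch below assumes a four-minute track and ignores tags and container overhead; the function name is invented for this example.

```python
def encoded_size_mb(bit_rate_kbps: float, seconds: float) -> float:
    """Approximate size of an encoded file in MB (1 MB = 1,000,000 bytes here)."""
    return bit_rate_kbps * 1000 * seconds / 8 / 1_000_000


four_minutes = 4 * 60  # assumed track length

print(encoded_size_mb(160, four_minutes))  # 4.8  (MP3 at 160 kbit/s)
print(encoded_size_mb(128, four_minutes))  # 3.84 (AAC at 128 kbit/s)
```

On these assumptions the AAC file is a fifth smaller than the MP3 version.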

Windows Media Audio (WMA)

This kind of file, also known as Windows Media 9 (WM9), is tied to the Windows operating system and is employed on several non-Apple music websites, including the OD2 service, as provided by EMI in the UK. The content is protected by means of a Digital Rights Management (DRM) system. At the time of writing, this format isn’t supported by iTunes.

  Real Time Audio Streaming (RTAS)

Audio streaming lets you listen to sound material over the Internet in real time, rather than having to download it and play it later. This kind of mechanism is essential for sending radio broadcasts over the Internet. Unfortunately for those of us with a dial-up modem connection, the available bandwidth on the Internet is very limited, requiring the use of advanced streaming software.

QuickTime accommodates both Real-time Transport Protocol (RTP) and Real-Time Streaming Protocol (RTSP), which are used together for RTSP streaming. However, unless you have QuickTime 6, there’s no easy way to handle MPEG-4 files, the ideal streaming format.

Proprietary Streaming Formats

Until MPEG-4 is fully adopted, the many proprietary streaming formats will continue to be used. Shoutcast, for example, is a real time variant of MP3, as supported by recent versions of iTunes (Apple) and Audion (Panic Inc). Icecast, which is a variation of the Ogg Vorbis music format, claims to give better results than MP3, supporting data rates of 64 to 500 kbit/s in stereo and 32 to 256 kbit/s in mono. Unfortunately, although in the public domain, this format isn’t widely supported.

The popular RealMedia format, also known as RealG2, requires RealPlayer (RealNetworks), whilst Dolby AAC, an enhancement of the AAC format supporting rates of 14 kbit/s (mono) to 128 kbit/s (stereo) or higher, offers CD quality when used with a broadband connection.

QuickTime also supports two special codecs, although these have been largely superseded by the formats described above:-

QDesign Music

This software compresses stereo sound, sampled at 44.1 kHz, into a form that’s suitable for streaming. It’s particularly effective on instrumental music, can reduce files to just 3% of their original size and gives good results, even with a 28.8 kbit/s modem.

The data rate can be between 8 and 48 kbit/s, and should be set to one kbit/s for every kHz of the sample frequency, which means that material sampled at 22.05 kHz should use a rate of 24 kbit/s. However, good quality is also possible at 12 or 8 kbit/s, depending on the sound content.
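The ‘one kbit/s for every kHz’ guideline can be combined with the list of bit rates that the codec offers (8, 10, 12, 16, 20, 24, 32, 40 or 48 kbit/s). The Python sketch below picks the lowest available rate that isn’t below the sample rate in kHz; rounding upwards is an assumption made to match the 22.05 kHz example, and the names are invented for this illustration.

```python
QDESIGN_RATES_KBPS = (8, 10, 12, 16, 20, 24, 32, 40, 48)


def suggested_rate_kbps(sample_rate_khz: float) -> int:
    """Lowest available bit rate that is at least 1 kbit/s per kHz of sample rate."""
    for rate in QDESIGN_RATES_KBPS:
        if rate >= sample_rate_khz:
            return rate
    return QDESIGN_RATES_KBPS[-1]  # fall back to the highest rate on offer


print(suggested_rate_kbps(22.05))   # 24
print(suggested_rate_kbps(44.1))    # 48
print(suggested_rate_kbps(11.025))  # 12
```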

QDesign Music can be very demanding on the recipient’s processor. In fact, it may cause dropped frames and loss of picture synchronisation when used with a high-quality movie.

Qualcomm PureVoice

This system accommodates speech at low bit rates, giving reasonable results even with a 14.4 kbit/s modem. PureVoice is derived from the speech coding used in Code Division Multiple Access (CDMA) telephony, offering 9:1 or 19:1 compression. It usually uses an 8 kHz sample rate, although higher rates can be used.
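A quick calculation shows why these compression ratios suit a slow modem. The Python sketch below assumes 16-bit mono speech sampled at 8 kHz as the uncompressed source and ignores any network or protocol overhead; the function name is invented for this example.

```python
def compressed_rate_kbps(sample_rate_hz: int, bits: int, ratio: float) -> float:
    """Data rate in kbit/s after applying a given compression ratio to mono audio."""
    return sample_rate_hz * bits / ratio / 1000


MODEM_KBPS = 14.4

for ratio in (9, 19):
    rate = compressed_rate_kbps(8_000, 16, ratio)  # 16-bit mono is an assumption
    print(f"{ratio}:1 gives {rate:.1f} kbit/s, fits modem: {rate <= MODEM_KBPS}")
```

On these assumptions the 9:1 setting needs a little over 14 kbit/s, which only just fits, whilst the 19:1 setting needs less than half the modem’s capacity.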

©Ray White 2004.