Audible sound consists of variations in air pressure that cover the frequency range of 20 Hz to 20 kHz. The analogue signal produced by a microphone is converted into digital data by using an analogue to digital converter (ADC), the output of which can be manipulated by your computer. After this, the sound is converted back into analogue form by a digital to analogue converter (DAC), which is usually connected to a pair of headphones or an amplifier.
Some computers have an analogue input and an internal ADC, as well as an analogue output connected to the machine’s DAC. However, some models don’t have an input, requiring you to connect an audio adaptor to a spare USB or FireWire port. You can also use such adaptors to provide extra inputs or outputs, which is useful when the built-in sound quality isn’t up to your requirements.
The process of converting analogue audio into digital code, as accomplished by an ADC, involves a technique known as sampling. This ‘chops up’ the signal at a speed known as the sample rate. Generally speaking, this rate should be at least twice that of the highest frequency found in the incoming signal. So, if we required a frequency response of up to 20 kHz we would use a sampling rate of 40 kHz or more. In practice, higher rates are desirable, since it’s possible for inaudible upper frequencies to interact, creating subharmonics that you can hear.
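The twice-the-highest-frequency rule, and the way a tone above half the sample rate reappears at a lower frequency, can be sketched in a few lines of Python. This is an illustration only: the function names are made up for this example, and the folding calculation assumes an ideal sampler with no input filtering.

```python
def min_sample_rate(highest_hz: float) -> float:
    """Lowest sample rate that can represent content up to highest_hz."""
    return 2.0 * highest_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled tone appears after folding.

    Tones below half the sample rate come back unchanged; tones above
    it are mirrored down into the range 0 to sample_rate_hz / 2.
    """
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(min_sample_rate(20_000))          # 40000.0
print(alias_frequency(25_000, 44_100))  # 19100
```

In this sketch an inaudible 25 kHz tone sampled at 44.1 kHz lands at 19.1 kHz, inside the audible range, which is one reason for keeping the sample rate comfortably above the minimum.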
The choice of sample rate is a compromise between sound quality and the amount of memory or disk space that’s available. In some instances, compression can be used to dramatically reduce the space required (see below), although this can degrade the sound quality.
The following table shows Apple’s preferred rates for older Mac OS computers and the expected upper frequency response that they provide:-
|Rate (Hz)||Rate (kHz)||Response (kHz)||Application|
Of these, 11.1, 22.2 and 44.1 kHz are often used. Although most Mac OS computers are actually capable of using rates up to 65,535 Hz (65.5 kHz), this particular frequency is rarely used.
The nominal 22.2 kHz rate, which is 56EE.8BA3 in hexadecimal notation, isn’t really twice 11,127.27272 Hz, which is 2B77.45D1 in hex. Some applications simply ignore this anomaly and produce sounds at the wrong pitch.
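The anomaly can be checked with a little arithmetic. The sketch below assumes that the two hexadecimal values are 16.16 fixed-point numbers, with four hex digits for the whole part and four for the fraction; that reading is an assumption of this example, and the helper names are hypothetical.

```python
def fixed_16_16_to_hz(hex_text: str) -> float:
    """Decode 'IIII.FFFF' hex text as a 16.16 fixed-point value (assumed format)."""
    whole, fraction = hex_text.split(".")
    return int(whole, 16) + int(fraction, 16) / 65536


def raw_value(hex_text: str) -> int:
    """The same number as a plain 32-bit integer, for exact comparison."""
    return int(hex_text.replace(".", ""), 16)


low, high = "2B77.45D1", "56EE.8BA3"
print(fixed_16_16_to_hz(low))    # close to 11,127.27 Hz
print(fixed_16_16_to_hz(high))   # close to 22,254.55 Hz
print(raw_value(high) - 2 * raw_value(low))  # 1
```

Under that assumption the higher rate is one fixed-point step above exactly double the lower one, which is consistent with the mismatch described above.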
The next table shows the rates used in professional systems and other equipment:-
|Rate (Hz)||Rate (kHz)||Max Response (kHz)||Application|
|11,025.0000||11||5||Multimedia CD-ROM speech|
|22,050.0000||22||10||Multimedia CD-ROM music|
|24,000.0000||24||11||Voice recognition systems|
|32,000.0000||32||15||Broadcast links, domestic digital VCRs|
|44,100.0000||44.1||20||CD, professional systems, DAT|
|48,000.0000||48||22||Professional systems, DAT|
|96,000.0000||96||44||Professional systems, DVD (high-quality)|
In practice, most people accept CD-quality sound, sampled at 44.1 kHz with 16-bit resolution (see below). This requires about 5 MB of hard disk space for every minute of recorded mono sound, equivalent to about 86 KB per second. Of course, for stereo you must multiply this by two and for a multi-track operation you must multiply this figure by the number of tracks.
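The storage figures follow directly from the sample rate, the resolution and the number of channels. A minimal sketch, with a hypothetical helper name, assuming uncompressed linear samples and no file header overhead:

```python
def bytes_per_second(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Raw data rate of uncompressed linear audio."""
    return sample_rate_hz * (bits // 8) * channels


mono = bytes_per_second(44_100, 16)
stereo = bytes_per_second(44_100, 16, channels=2)

print(mono)                   # 88200 bytes per second
print(mono * 60 / 1_000_000)  # about 5.3 million bytes per minute
print(stereo)                 # 176400 bytes per second
```

The same function covers multi-track work: pass the number of tracks as `channels`.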
When a recording is played back at a rate lower than the one at which it was sampled, the sound becomes deeper and the playing time increases, while higher playback rates increase the pitch (making human voices sound like chipmunks) and decrease the playing time.
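The link between playback rate, pitch and playing time is a simple ratio. The following sketch uses a hypothetical function name and assumes the samples are replayed unchanged, with no resampling or pitch correction:

```python
def playback_effect(recorded_hz: float, playback_hz: float, duration_s: float):
    """Return (pitch multiplier, new playing time in seconds)."""
    ratio = playback_hz / recorded_hz
    return ratio, duration_s / ratio


# One minute recorded at 44.1 kHz, replayed at 22.05 kHz:
print(playback_effect(44_100, 22_050, 60))  # (0.5, 120.0)
```

A pitch multiplier of 0.5 corresponds to a drop of one octave, and the playing time doubles.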
Having sliced up the incoming signal, the ADC converts the measured signal voltage into a digital code. The accuracy or resolution of this code is set by the number of bits used in the process. Once again, a compromise has to be made between quality and the amount of space used by the data.
Although 8-bit samples only occupy half the disk space of 16-bit samples, the sound quality is often horribly granular. However, some multimedia CD-ROMs use such 8-bit material to conserve space. Very high quality systems, such as ProTools 24 (Digidesign), use 24-bit sampling running at 96 kHz or higher. This minimises noise and distortion but is very demanding on disk space.
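To put rough numbers on the resolution trade-off, the sketch below counts the available levels for a given number of bits, estimates the theoretical dynamic range of an ideal linear converter (about 6 dB per bit) and quantises a single value. The function names are hypothetical, and real converters fall short of the ideal figures.

```python
import math


def levels(bits: int) -> int:
    """Number of distinct codes available at a given resolution."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one step, in decibels (ideal converter)."""
    return 20 * math.log10(2 ** bits)


def quantise(value: float, bits: int) -> int:
    """Map a value in the range -1.0 to 1.0 onto a signed integer code."""
    full_scale = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * full_scale)


for bits in (8, 16, 24):
    print(bits, levels(bits), round(dynamic_range_db(bits), 1))
```

The coarse steps of 8-bit coding, only 256 levels, are what produce the granular quality mentioned above.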
Each sound sample is represented within a sound frame, often one of the following types:-
|8-bit mono||1||Single byte sample|
|8-bit stereo||2||Byte 1 = Left Byte 2 = Right|
|16-bit mono||2||Two-byte sample|
|16-bit stereo||4||Two bytes for each channel|
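The frame sizes in the table are simply the bytes per sample multiplied by the number of channels. The sketch below also packs one 16-bit stereo frame using Python's standard `struct` module; the big-endian byte order and left-then-right channel order are assumptions made for this example, and the function names are hypothetical.

```python
import struct


def frame_size(bits: int, channels: int) -> int:
    """Bytes occupied by one sound frame."""
    return (bits // 8) * channels


def pack_16bit_stereo(left: int, right: int) -> bytes:
    """Pack one frame as two signed 16-bit values (big-endian assumed)."""
    return struct.pack(">hh", left, right)


print(frame_size(8, 1), frame_size(8, 2), frame_size(16, 1), frame_size(16, 2))
print(pack_16bit_stereo(1, -1))  # b'\x00\x01\xff\xff'
```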
Digitised audio material can be compressed to create smaller sound files, so saving disk space. The software used to compress or decompress a file is known as a coder-decoder or codec. Many are built into Apple’s QuickTime package, while others can be provided as separate files.
The following table shows a number of common coding systems, used with or without compression. Except where indicated, these formats are supported by modern versions of QuickTime.
|24-bit Integer||Linear coding, no compression|
|32-bit Integer||Linear coding, no compression|
|32-bit Floating Point||Linear coding, no compression|
|64-bit Floating Point||Linear coding, no compression|
|Adaptive Multi-Rate (AMR) ◊||ACELP-based predictive speech coding for GSM/3GPP phones|
|Advanced Audio Coding (AAC)||16-bit psychoacoustic music coding (see below)|
|ALaw 2:1||2:1 lossy compression|
|Code Excited Linear Predictive (CELP)||Predictive speech coding|
|IMA 4:1 *||4:1 lossy compression, audible effects|
|MACE 3:1 •||3:1 lossy compression, audible effects|
|MACE 6:1 •||6:1 lossy compression, highly audible effects|
|MetaSound AC8 †||8-bit acoustic coding|
|MetaSound AC11 †||11-bit acoustic coding|
|MetaSound AC16 †||16-bit acoustic coding|
|MetaSound AC24 †||24-bit acoustic coding|
|MetaVoice RT24 †||Speech coding|
|MP3||16-bit psychoacoustic music coding (see below)|
|MS ADPCM +||Microsoft standard|
|QDesign Music 2 ‡||Real-time music coding|
|Qualcomm Code Excited Linear (QCEL)||Predictive speech coding|
|Qualcomm PureVoice||9:1 or 19:1 real-time speech coding|
|VivoActive G273 †||Special|
|VivoActive SIREN †||Special|
|µLaw 2:1||2:1 lossy compression|
◊ Eight bit rates can be used, from 4.75 to 12.2 kbit/s
* Interactive Multimedia Association, 16-bit data only
• Macintosh Apple Compression and Expansion, 8-bit or 16-bit data
+ Coding not available via QuickTime
† Coding via QuickTime may be possible using extra codec file
‡ Bit rate of 8, 10, 12, 16, 20, 24, 32, 40 or 48 kbit/s
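The 2:1 figure quoted for µLaw comes from storing each 16-bit sample as an 8-bit code on a logarithmic scale, so that quiet sounds keep more detail than loud ones. The sketch below uses the textbook continuous µ-law curve with µ = 255; it is not the bit-exact segment encoding used by telephone-standard codecs, and all names are hypothetical.

```python
import math

MU = 255  # companding constant of the textbook mu-law curve


def mulaw_compress(x: float) -> float:
    """Map a linear value in -1.0..1.0 onto the logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_sample(sample16: int) -> int:
    """16-bit signed sample to 8-bit signed code (lossy)."""
    return round(mulaw_compress(sample16 / 32768) * 127)


def decode_sample(code8: int) -> int:
    """8-bit signed code back to an approximate 16-bit sample."""
    return round(mulaw_expand(code8 / 127) * 32768)


original = 1000
restored = decode_sample(encode_sample(original))
print(original, restored)  # close, but not identical: the coding is lossy
```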
Not all file formats can support every kind of codec. For example, an AVI file can only use ALaw or µLaw coding, a Mac System 7 sound file only uses ALaw, IMA, MACE or µLaw, a Wave file only uses ADPCM and a µLaw file only accommodates Floating Point, ALaw or µLaw coding. However, any codec can be used inside an AIFF sound file or QuickTime movie file.
Earlier forms of compression, such as IMA or MACE, employ lossy algorithms that remove some of the audio data, resulting in audible distortion and other effects. They’re usually best avoided.
Later codecs, such as AAC, CELP (both part of the MPEG-4 (MP4) standard), MP3, QDesign Music 2 and the Qualcomm codecs, use a lossy form of coding known as psychoacoustic coding or perceptual coding. This exploits the fact that loud sounds at one frequency mask quieter tones at other frequencies. MP3 and AAC are used for music on the Internet.
Psychoacoustic coding, as described above, has been developed as part of several different file formats, the most common of which are MPEG I Layer-3 (MP3), Advanced Audio Coding (AAC) and Windows Media Audio (WMA). All accommodate a quality that approaches that provided by an audio CD, but with a file size suitable for downloading over the Internet.
MP3, devised by the Moving Picture Experts Group (MPEG), the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), gives better compression than the older MPEG I Layer-1 or Layer-2 systems, reducing audio data to about an eleventh of its original size whilst conveying mono or stereo sound sampled at 32, 44.1 or 48 kHz with 16-bit resolution. Although lossy, the perceptual coding provides high compression with only a small loss of quality, whilst Huffman coding is used to further compress the data.
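Huffman coding is lossless: it gives short bit patterns to values that occur often and longer ones to rare values. The sketch below is a generic illustration that builds code lengths from the data itself; MP3 relies on tables defined in advance by the standard, so this shows the principle rather than the actual MP3 procedure, and the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols) -> dict:
    """Return the code length, in bits, assigned to each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Each heap entry: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(count, index, {symbol: 0})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        count_a, _, lengths_a = heapq.heappop(heap)
        count_b, _, lengths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**lengths_a, **lengths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


data = "aaaabbc"
lengths = huffman_code_lengths(data)
total_bits = sum(lengths[s] for s in data)
print(lengths, total_bits)  # 10 bits, against 14 for a fixed 2-bit code
```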
MP3 files can be played on a computer using an MP3 application such as Apple’s iTunes or can be downloaded via a FireWire or USB port to a portable MP3 player, such as an iPod.
The amount of data used is set by the chosen data rate. This can be as high as 128 kbit/s for mono or 384 kbit/s for stereo, although most stereo MP3s are encoded at 128 or 96 kbit/s.
The following table shows the reductions in file size obtained by using different data rates:-
|Content||Sample Rate (kHz)||Data Rate (kbit/s)||% of original size|
|Speech||22||48 or 96||6 to 7|
|Classical Music||44.1||64 or 128||10|
|Popular Music||48||128 or 256||20|
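The percentage of the original size can be estimated by dividing the chosen data rate by the rate of the uncompressed audio. A minimal sketch with hypothetical function names, assuming a 16-bit stereo source unless stated otherwise:

```python
def uncompressed_kbits(sample_rate_hz: int, bits: int = 16, channels: int = 2) -> float:
    """Data rate of linear, uncompressed audio in kbit/s."""
    return sample_rate_hz * bits * channels / 1000


def percent_of_original(data_rate_kbits: float, sample_rate_hz: int,
                        bits: int = 16, channels: int = 2) -> float:
    """Size of the compressed audio as a percentage of the original."""
    return 100 * data_rate_kbits / uncompressed_kbits(sample_rate_hz, bits, channels)


print(uncompressed_kbits(44_100))                      # 1411.2 kbit/s for CD-quality stereo
print(round(percent_of_original(128, 44_100), 1))      # 9.1
```

On these assumptions, 128 kbit/s is roughly an eleventh of the CD-quality stereo rate.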
AAC, originally developed by AT&T, Dolby Laboratories Inc, Fraunhofer IIS and Sony Corporation, is supported by iTunes 4 or later and is used at some websites, including Apple’s iTunes Music Store and the UK’s O2 Music Service. AAC produces smaller files than MP3 and gives better sound quality.
At a rate of 128 kbit/s an AAC recording can sound almost as good as uncompressed audio. This is reflected in the choice of data rates for the High Quality setting in iTunes. In iTunes 3 and earlier versions of the application, which only support MP3, it’s necessary to use 160 kbit/s. However, in iTunes 4, where AAC is employed, the default rate has been reduced to 128 kbit/s.
WMA, also known as Windows Media 9 (WM9), is tied to the Windows operating system and is employed on several non-Apple music websites, including the OD2 service, as provided by EMI in the UK. The content is protected by means of the Digital Rights Management (DRM) system. At the time of writing, this format isn’t supported by iTunes.
Audio streaming lets you listen to sound material over the Internet in real time, rather than having to download it and play it later. This kind of mechanism is essential for sending radio broadcasts over the Internet. Unfortunately for those of us with a dial-up modem connection, the available bandwidth on the Internet is very limited, requiring the use of advanced streaming software.
QuickTime accommodates both Real-time Transport Protocol (RTP) and Real-Time Streaming Protocol (RTSP), which are used together for RTSP streaming. However, unless you have QuickTime 6, there’s no easy way to handle MPEG-4 files, the ideal streaming format.
Until MPEG-4 is fully adopted, the many proprietary streaming formats will continue to be used. Shoutcast, for example, is a real time variant of MP3, as supported by recent versions of iTunes (Apple) and Audion (Panic Inc). Icecast, which is a variation of the Ogg Vorbis music format, claims to give better results than MP3, supporting data rates of 64 to 500 kbit/s in stereo and 32 to 256 kbit/s in mono. Unfortunately, although in the public domain, this format isn’t widely supported.
The popular RealMedia format, also known as RealG2, requires RealPlayer (RealNetworks), whilst Dolby AAC, an enhancement of the AAC format supporting rates of 14 kbit/s (mono) to 128 kbit/s (stereo) or higher, offers CD quality when used with a broadband connection.
QuickTime also supports two special codecs, although these have been largely superseded by the formats described above:-
QDesign Music compresses stereo sound, sampled at 44.1 kHz, into a form that’s suitable for streaming. It’s particularly effective on instrumental music, can reduce files to just 3% of their original size and gives good results, even with a 28.8 kbit/s modem.
The data rate can be between 8 and 48 kbit/s, and should be set to roughly one kbit/s for every kHz of the sample frequency, which means that material sampled at 22.05 kHz should use the nearest available rate of 24 kbit/s. However, good quality is also possible at 12 or 8 kbit/s, depending on the sound content.
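The one-kbit/s-per-kHz guideline can be turned into a small lookup against the bit rates listed for QDesign Music 2 in the codec table. Rounding up to the next listed rate is an assumption of this sketch, as are the names used.

```python
# Bit rates listed for QDesign Music 2, in kbit/s.
QDESIGN_RATES_KBITS = (8, 10, 12, 16, 20, 24, 32, 40, 48)


def suggested_rate(sample_rate_khz: float) -> int:
    """Smallest listed rate that gives at least 1 kbit/s per kHz."""
    for rate in QDESIGN_RATES_KBITS:
        if rate >= sample_rate_khz:
            return rate
    return QDESIGN_RATES_KBITS[-1]


print(suggested_rate(22.05))  # 24
print(suggested_rate(44.1))   # 48
```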
QDesign Music can be very demanding on the recipient’s processor. In fact, it may cause dropped frames and loss of picture synchronisation when used with a high-quality movie.
Qualcomm PureVoice accommodates speech at low bit rates, giving reasonable results even with a 14.4 kbit/s modem. It is based on Code Division Multiple Access (CDMA), offering 9:1 or 19:1 compression. It usually uses an 8 kHz sample rate, although higher rates can be used.
©Ray White 2004.