Basics of Sound Recording

An analogue audio signal is conveyed as a varying electrical voltage whose magnitude changes in proportion to the intensity of the original sound pressure. Ideally, the range of voltages that can be accommodated should match the dynamic range of real sound. In practice, distortion often occurs at high levels and background noise or hiss appears at low voltages.

The dynamic range accommodated by a recording or transmission system is measured in decibels (dB). It’s usually limited by the medium in use, as illustrated in the following table:-

Medium                                       Dynamic Range (dB)
Radio using Amplitude Modulation (AM) •      26
Compact Cassette without Noise Reduction     50
Professional Reel-to-Reel Tape Recorder      70
16-bit CD or Digital Audio Tape (DAT)        90
Professional 20-bit Recording System         120

• Set by the 100% and 5% modulation levels at the transmitter

Unfortunately, the range of sounds created by a symphony orchestra extends over 70 dB, whilst those in the real world cover more than 120 dB. This means that material destined for less adequate systems must be carefully controlled, or processed using an audio compressor or limiter.
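The AM figure in the table follows from its two modulation levels, since a voltage ratio converts to decibels as 20 times the base-10 logarithm of the ratio. The Python sketch below shows that calculation and estimates how much range must be removed to fit a 70 dB orchestral performance onto a given medium. The function names are illustrative only.

```python
import math

def ratio_to_db(v_max: float, v_min: float) -> float:
    """Express the ratio of two voltages (or modulation depths) in decibels."""
    return 20.0 * math.log10(v_max / v_min)

def excess_range_db(source_range_db: float, medium_range_db: float) -> float:
    """Range (dB) that must be removed, e.g. by a compressor, to fit the medium."""
    return max(0.0, source_range_db - medium_range_db)

# AM radio: 100% and 5% modulation levels
print(f"AM dynamic range: {ratio_to_db(100.0, 5.0):.1f} dB")

# Orchestra (70 dB) recorded onto compact cassette (50 dB)
print(f"Reduction needed: {excess_range_db(70.0, 50.0):.0f} dB")
```

This says nothing about how a compressor or limiter should achieve the reduction, only how large it needs to be.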

Microphones

Any sound from the real world must first be converted into an electrical signal by means of a microphone. All microphones contain some form of diaphragm that moves under the influence of the sound pressure waves. This is either part of a capacitor, as in a capacitor or condenser microphone, or is mechanically linked to a coil, as in a moving-coil microphone.

A capacitor microphone usually has a built-in amplifier, requiring a battery inside the microphone or an external source of power. In a professional studio, the mixing desk provides phantom power along the microphone’s connecting cable.

A moving-coil microphone, also known as a dynamic microphone, doesn’t need power. In fact, phantom power can upset its operation. Such devices are very robust and can handle almost any sound, although the high-frequency performance is often inferior to that of a capacitor microphone.

Characteristics

Professional microphones usually have one of the following characteristics or polar patterns:-

Omnidirectional

This gives an all-round pickup area. Clip-on, tie-clip or Lavalier microphones are usually of this type. They can be tricky to use, especially with a public address (PA) system.

Figure-of-Eight

Now rarely encountered, this type of capacitor microphone gives the same response at both front and back, making it ideal for an ‘across the table’ discussion.

Cardioid

Provides a heart-shaped response, with reduced pickup to the rear and sides. This kind of microphone, often of the dynamic variety, is ideal for hand-held work.

Hypercardioid

Similar to cardioid but more directional. The closely related supercardioid pattern is often grouped with it.

Unidirectional

An extreme hypercardioid, often in the form of a shotgun microphone.

Hemispherical

An unusual characteristic, only found in a pressure zone microphone (PZM).

Connections

The connection to a high-quality microphone is made over a balanced circuit, usually via a 3-pole XLR or 3-pole DIN (Tuchel screw-type) connector. The impedance is usually 200 ohms (Ω), making it suitable for a typical mixing desk whose microphone input has an impedance of around 1200 Ω.
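A 1200 Ω input suits a 200 Ω microphone because the two impedances form a voltage divider: the input needs to be several times larger than the source so that little signal is lost. The rough sketch below treats both impedances as pure resistances, which is a simplifying assumption, and the function name is hypothetical.

```python
import math

def loading_loss_db(source_ohms: float, load_ohms: float) -> float:
    """Level change (dB) when a source impedance drives a load impedance.

    Both are treated as pure resistances forming a simple voltage divider.
    """
    fraction = load_ohms / (source_ohms + load_ohms)
    return 20.0 * math.log10(fraction)

print(f"{loading_loss_db(200.0, 1200.0):.2f} dB")  # roughly -1.3 dB
print(f"{loading_loss_db(200.0, 200.0):.2f} dB")   # a matched load loses about 6 dB
```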

Output Levels

A microphone’s output level is often given in decibels (dB) relative to one volt per pascal (1 V/Pa), where a pressure of one pascal (Pa) is equal to a force of one newton per square metre (N/m²). In some instances, the figure’s given in millivolts per pascal (mV/Pa). The following table gives conversions between these two methods of measurement:-

dB (1V/Pa)    mV/Pa
-20           100
-25           55
-30           30
-35           18
-40           10
-45           5.5
-50           3.0
-55           1.8
-60           1.0
-65           0.55

You may also encounter microphones whose level is given in volts per dyne per square centimetre (V/dyne/cm²). Fortunately, one V/dyne/cm² equals 10 V/Pa, making conversion very simple. Similarly, a level given in dB relative to 1 V/dyne/cm² is 20 dB lower than a figure related to 1 V/Pa.
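The conversions described above can be sketched in a few lines of Python. The function names are illustrative, and the tabulated figures are rounded, so the computed values differ slightly from some of them.

```python
import math

def db_to_mv_per_pa(level_db: float) -> float:
    """Convert sensitivity in dB relative to 1 V/Pa into mV/Pa."""
    return 1000.0 * 10.0 ** (level_db / 20.0)

def mv_per_pa_to_db(mv_per_pa: float) -> float:
    """Convert sensitivity in mV/Pa into dB relative to 1 V/Pa."""
    return 20.0 * math.log10(mv_per_pa / 1000.0)

def db_re_dyne_to_db_re_pa(level_db: float) -> float:
    """Convert dB relative to 1 V/dyne/cm² into dB relative to 1 V/Pa."""
    # 1 Pa = 10 dyne/cm², so the same microphone reads 20 dB higher re 1 V/Pa
    return level_db + 20.0

for level in (-40, -50, -60):
    print(f"{level} dB (1V/Pa) = {db_to_mv_per_pa(level):.1f} mV/Pa")
```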

Example Microphones

Details of the following ‘classic’ microphones are given for reference:-

Make       Model    Type          Response         Plug       Output: dB (1V/Pa)
AKG        D202     Dynamic       Cardioid         XLR        -54 (C)
AKG        C451     Capacitor     Cardioid *       XLR (P)    -39
Beyer      M160     Dynamic       Hypercardioid    DIN        -63
Beyer      M201     Dynamic       Hypercardioid    XLR/DIN    -60
Sony       ECM50    Electret +    Omni             XLR        -53
Neumann    U87i     Capacitor     Switchable •     XLR (B)    -38 (A)(C)

* Cardioid with CK1 capsule, or omni with CK2 capsule

+ Very small microphone with thin cable to combined battery and amplifier unit

• Cardioid, figure-of-eight or omni

(P) Requires phantom power

(B) Requires phantom power or two 22.5 V batteries

(C) Incorporates bass-cut switch

(A) Has built-in 10 dB attenuation switch

Loudspeakers

All recorded sound needs to be checked on a loudspeaker, or a pair for stereo. For best results these should be between 2 and 2.5 metres apart and a similar distance from the listener. In addition, the listener should be more than 1.8 metres away from any wall. Finally, there should be at least half a metre between the loudspeakers and the front of the mixing desk.

Some form of acoustic treatment, or at least heavy curtaining, should be provided behind and to the sides of the loudspeakers. The speakers are best placed in a corner since the effective power of a loudspeaker is doubled when placed between wall and floor, and doubled again when placed in a corner position.
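Each doubling of effective power corresponds to an increase of about 3 dB. As a small worked example, the sketch below converts the idealised power ratios quoted above into decibels. The dictionary of placements is purely illustrative.

```python
import math

def power_gain_db(power_ratio: float) -> float:
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(power_ratio)

placements = {
    "free-standing": 1,
    "between wall and floor": 2,   # effective power doubled
    "in a corner": 4,              # doubled again
}
for name, ratio in placements.items():
    print(f"{name}: +{power_gain_db(ratio):.0f} dB")
```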

If you have problems with acoustics that can’t be fixed by changes to the room you can resort to loudspeaker equalisation, usually provided by a multi-band graphic equaliser. This is best inserted in the monitoring circuit, prior to the loudspeaker volume control on the mixing desk. Failing this, it can be wired directly in the audio circuits to the loudspeaker amplifiers.

The human ear detects the position of a stereo image by sensing both differences in level (for frequencies above 700 Hz) and differences in phase (for frequencies under 500 Hz). This means that you must initially set up your loudspeakers for correct levels and phase (see below).

Speaker Levels

The relative levels for a pair of loudspeakers are normally adjusted using the loudspeaker balance control on the mixing desk. If your loudspeaker amplifiers have their own volume controls these should be set so as to give a central stereo image. You should also end up with a reasonable range of adjustment at the loudspeaker volume control on the mixing desk.

Phase

The phase of the loudspeakers can be checked by applying identical material to both channels and then moving the speakers in front of each other. If there’s a noticeable drop in bass response, your speakers are out of phase. This kind of problem is usually caused by a wiring error in the cable between the amplifier and loudspeaker. Other possibilities include wiring faults inside the speaker cabinet, in a loudspeaker amplifier or loudspeaker equaliser, in a cable from the mixing desk to an amplifier or even inside the desk itself. Diagnosis is best accomplished by a process of substitution.

Mixers and Signal Levels

Most analogue audio devices only accept signals within a given range. If the applied signal is too low you must turn up the volume or gain control, possibly introducing background noise or hiss. On the other hand, a very high level signal can cause distortion, whatever the control setting. The following table indicates the expected peak levels from different sources:-

Source                   Typ (dB)    Min (dB)    Max (dB)
Microphone               -40         -70         0
Synthesiser              0           -20         0
Domestic Recorder        0           -10         0
Professional Recorder    +8          0           +12

Most types of analogue audio mixer or mixing desk have a microphone input and a line input on each channel. Although the level of each sound source can be adjusted by means of a channel fader, the ‘default’ gain of a channel is set by additional controls, usually of the rotary type.

In a radio broadcasting studio each fader is normally fully ‘open’ or ‘shut’, so the rotary controls are set to provide the correct output level when the fader is fully ‘open’. In a recording studio, however, the channel faders are usually ‘set back’ from their end stops, allowing the operator to ‘ride’ them during a recording session. To make this possible, the rotary controls are set to give an ‘extra’ amount of gain (usually around 10 dB), which is known as gain in-hand. Nearly all mixers provide further gain in-hand (of between 10 and 20 dB) on the main output fader.
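The arithmetic behind gain in-hand can be sketched as follows. The function and parameter names are hypothetical, and the model simply assumes that the fader is pulled back by exactly the amount of gain held in reserve.

```python
def rotary_gain_db(source_db: float, target_db: float = 0.0,
                   gain_in_hand_db: float = 10.0) -> float:
    """Gain to set on the rotary control so that the target level is reached
    with the channel fader set back by gain_in_hand_db."""
    return (target_db - source_db) + gain_in_hand_db

# Typical microphone at -40 dB, recording studio practice (10 dB in hand)
print(rotary_gain_db(-40.0))                        # 50.0

# Broadcast practice: fader fully open, nothing held in reserve
print(rotary_gain_db(-40.0, gain_in_hand_db=0.0))   # 40.0
```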

In most instances, you’ll need to tailor the dynamic range of your recordings to match your chosen medium. This usually requires a compressor or limiter that’s wired to the output of your mixer.

Finally, any professional mixer should be accompanied by some kind of sound processing device. The most useful treatment is, of course, reverberation, sometimes known as echo.

Microphone Inputs

The microphone inputs on a mixer usually have their own microphone input sensitivity controls as well as a microphone pad switch. You’ll need the latter when using a microphone in close proximity to high-level sounds (such as a drum kit) or when connecting other sound sources.

Modern microphones often require a feed of phantom power from the mixing desk. This can normally be enabled by means of a switch on the appropriate channel. However, it’s a good idea to switch off this power when using a microphone or other device that doesn’t need it. At best, an unwanted phantom supply can increase background noise or interference. At worst, it can harm an alternative sound source, such as a synthesiser or domestic recorder, that’s connected to an input.

As indicated above, a microphone input can be used for a device whose output is insufficient for a line input. However, the input impedance, usually around 1200 Ω, is often too low, causing distortion or a peculiar frequency response. Professional devices usually work with such an impedance, but the signal levels are often too high, again causing distortion.

Line Inputs

Most devices, such as a professional tape recorder, should be plugged into a line input (not a microphone input) and the line input sensitivity control should be set to around +4.

Other equipment, such as a synthesiser or domestic recording machine, can be difficult to connect to a mixer. Initially, you should plug the source into a line input, since this has a higher input impedance, but if the signal appears at a low level or is noisy you should try using a microphone input. Failing this, you’ll need an external amplifier that’s wired to a line input.

Line-up and Monitoring

A line-up procedure is used in a recording studio or radio broadcasting studio to ensure consistent levels throughout the system. Initially, a line-up signal is used to check all the recording devices. Then, during recording or transmission, the signals are constantly monitored for quality.

Meters

Signal levels during line-up and monitoring are usually measured on a peak programme meter (PPM), a volume unit (VU) meter, or a bargraph meter that emulates a PPM or VU meter.

A PPM has a linear scale numbered from 1 to 7. Each division represents 4 dB, although in very old meters the space between PPM 1 and PPM 2 actually represents 6 dB.

The signal level corresponding to each mark is given in the following table:-

PPM    Level (dB)
1      -12
2      -8
3      -4
4      0
5      +4
6      +8
7      +12

As you can see, a PPM can show a dynamic range of 24 dB, close to the 26 dB range provided by radio broadcasting when using amplitude modulation. Unfortunately, this also makes it less suitable for monitoring material with a greater range. If you do need to see the full dynamic range you may prefer to use a peak-reading form of bargraph display.
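Because the marks are 4 dB apart and PPM 4 sits at 0 dB, the table reduces to a one-line formula. The sketch below, with illustrative function names, reproduces the table and the 24 dB span. It ignores the 6 dB first division found on very old meters.

```python
def ppm_to_db(mark: float) -> float:
    """Level in dB (relative to PPM 4) for a given PPM scale mark."""
    return 4.0 * (mark - 4.0)

def db_to_ppm(level_db: float) -> float:
    """PPM scale reading for a level in dB relative to PPM 4."""
    return 4.0 + level_db / 4.0

for mark in range(1, 8):
    print(f"PPM {mark}: {ppm_to_db(mark):+.0f} dB")

# Span between the lowest and highest marks
print(ppm_to_db(7) - ppm_to_db(1))  # 24.0
```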

A standard PPM has a rapid rise and slow decay time, the latter causing the needle to fall from PPM 7 to PPM 1 in around three seconds. This makes it easy to see high-level sounds of short duration but without it becoming painful to the eyes. Although excellent for keeping a strict control on levels, the PPM is a rather ‘conservative’ device. Much higher levels can be put onto analogue recording tape by using the combination of a VU meter and an experienced recording engineer.

A VU meter is far more rudimentary, often consisting of nothing more than a simple moving-coil meter and a diode. It tends to smooth out signals, ignoring short bursts of high-level sound. This means that it’s very easy to get a distorted recording. The point on the scale marked as 100% or 0 VU corresponds to a signal level of 1.228 V, equal to 71% of full scale deflection.
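The 1.228 V figure sits 4 dB above the common 775 mV line-up level. A minimal sketch of the conversion between volts and decibels, taking 775 mV as the 0 dB reference (the names used are illustrative):

```python
import math

REFERENCE_VOLTS = 0.775  # common line-up level of 775 mV

def volts_to_db(volts: float, reference: float = REFERENCE_VOLTS) -> float:
    """Level in dB relative to the reference voltage."""
    return 20.0 * math.log10(volts / reference)

def db_to_volts(level_db: float, reference: float = REFERENCE_VOLTS) -> float:
    """Voltage corresponding to a level in dB relative to the reference."""
    return reference * 10.0 ** (level_db / 20.0)

print(f"{volts_to_db(1.228):.1f} dB")   # 0 VU, about 4 dB above line-up
print(f"{db_to_volts(4.0):.3f} V")
```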

Line-up Signal

A line-up signal is normally produced by a test tone generator, either built into a mixing desk or in the form of a portable device. This usually produces a sine wave output at a frequency of 1 kHz. The most common line-up level of 775 millivolts (mV) usually corresponds to the following:-

Measure        Level
PPM            4 (0 dB)
VU Meter       -4 dB
Bargraph •     -20 dB
Tape Flux *    250 nWb/m

• IEC standard line-up for 16-bit programme interchange material on DAT

* For quarter-inch reel-to-reel tape machine

nWb/m = nanowebers per metre
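The offsets in the line-up table can be applied to any programme level to estimate what each type of meter should read. The sketch below assumes the levels given above and uses a hypothetical function name. It makes no attempt to model meter ballistics or the end stops of a real scale.

```python
def meter_readings(level_db: float) -> dict:
    """Readings for a level given in dB relative to the 775 mV line-up level.

    Offsets follow the line-up table: PPM 4, -4 dB on a VU meter and
    -20 dB on a bargraph all correspond to line-up level.
    """
    return {
        "ppm": 4.0 + level_db / 4.0,
        "vu_db": level_db - 4.0,
        "bargraph_db": level_db - 20.0,
    }

print(meter_readings(0.0))    # line-up tone
print(meter_readings(12.0))   # a peak 12 dB above line-up
```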

Mono and Stereo Compatibility

Care must be taken to ensure that those with mono equipment can hear everything in your recordings. It’s very easy to create wonderful stereo, containing out-of-phase components, which can’t be heard in mono. For this reason, most mixing desks have a mono PPM as well as a stereo PPM. The needle colours used in various types of PPM are as follows:-

Signal                      Colour
Left (A)                    Red
Right (B)                   Green
Mono or Sum (A+B)           White
A+B with announcements      Blue
Side or Difference (A-B)    Yellow

A mono output can be obtained from a stereo mixing desk by simply combining the left and right hand channels via a buffer amplifier and a 6 dB attenuator or pad.
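To see why out-of-phase material vanishes in mono, the sum can be tried on a few samples. This is only a sketch of the arithmetic, with the buffer amplifier and pad reduced to a single multiplication. The function names and test tone are illustrative.

```python
import math

def mono_sum(left, right, pad_db=6.0):
    """Combine left and right samples, then attenuate by pad_db."""
    pad = 10.0 ** (-pad_db / 20.0)
    return [(l + r) * pad for l, r in zip(left, right)]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

# One cycle of a test tone, 48 samples long, and an inverted copy
tone = [math.sin(2.0 * math.pi * n / 48.0) for n in range(48)]
inverted = [-s for s in tone]

print(round(peak(mono_sum(tone, tone)), 3))      # in phase: close to the original level
print(round(peak(mono_sum(tone, inverted)), 3))  # out of phase: 0.0
```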

To maintain a constant volume at a mono output whilst a sound source moves across the stereo sound stage, some broadcasting studios replace the 6 dB attenuator in the mono circuit by a 3 dB version, also inserting a similar attenuator into the feed to the mono PPM. In some instances, a special stereo line-up is also used, usually using a reference of -3 dB per channel, giving 0 dB at the attenuated mono output. Where a reel-to-reel tape machine with a wide guard track is used, the playback line-up from such a machine can be modified to give -4 dB per stereo channel, resulting in -1 dB at the mono output. In the author’s opinion this is all horribly complicated and unnecessary.

The readings for stereo and mono PPMs, using a 3 dB pad, are given below. A pair of identical signals, also known as coherent signals, can be obtained by connecting a test tone generator or other suitable source to both inputs. For non-identical signals, also known as incoherent signals, you can connect a separate white noise generator to each input.

Signal Type      Left (A)    Right (B)    Mono (A+B)
Identical        0           0            +3
Identical        -3          -3           0
Identical        +5          +5           +8
Non-identical    +8          +8           +8 •
Non-identical    +11         nil          +8
Non-identical    nil         +11          +8

• Can vary from +5 dB to +11 dB, with average figure of +8 dB
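The readings in the table can be reproduced by noting that identical (coherent) signals add as voltages, whilst non-identical (incoherent) signals add as powers on average. The sketch below uses a hypothetical function, with `None` standing for ‘nil’, and gives only the average figure for incoherent signals.

```python
import math

def mono_reading_db(left_db, right_db, coherent, pad_db=3.0):
    """Mono meter reading in dB for the given channel levels.

    A level of None means no signal on that channel ('nil').
    """
    levels = [x for x in (left_db, right_db) if x is not None]
    if coherent:
        total = sum(10.0 ** (x / 20.0) for x in levels)   # voltages add
        summed_db = 20.0 * math.log10(total)
    else:
        total = sum(10.0 ** (x / 10.0) for x in levels)   # powers add
        summed_db = 10.0 * math.log10(total)
    return summed_db - pad_db

rows = [
    (0, 0, True), (-3, -3, True), (5, 5, True),
    (8, 8, False), (11, None, False), (None, 11, False),
]
for left, right, coherent in rows:
    kind = "Identical" if coherent else "Non-identical"
    print(kind, left, right, round(mono_reading_db(left, right, coherent)))
```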

Monitoring

The peak level of a recording is usually 8 to 12 dB above line-up level, corresponding to PPM 6 or 7 on each channel. Depending on the arrangements for mono outputs and monitoring there may be a similar level on the mono meter, although in some instances it’s 3 dB higher. If the mono output is too high you may need to back off the main fader on the mixer.

A level of PPM 7 on each channel should equate to -8 dB on a bargraph meter, but can go off the end of a VU meter. If you have a bargraph PPM the display may brighten up above PPM 6 to indicate that you’re close to the maximum signal level.

The following diagram summarises the typical relationship between signal levels and metering:-

Stereo Reproduction

Modern sound systems commonly use two-channel or stereo reproduction, accommodating the fact that human beings receive sounds via both ears. By using two microphones and two loudspeakers, the sound field created by the original sources of sound is replicated, complete with variations in level and phase. Sounds in such a field are often located by small movements of the listener’s head. The traditional positioning of a pair of loudspeakers is shown below.

A pair of large stereo loudspeakers can be expensive and usually occupy a large amount of space in the room. Fortunately, the human ear can’t locate low-frequency sounds, making it possible to replace such speakers by smaller units, together with a single low frequency effects (LFE) speaker, also known as a woofer or sub-woofer loudspeaker, as shown here.

In this arrangement, the main speakers reproduce the high frequencies containing directional information, whilst the LFE speaker accommodates the bass frequencies. Although shown centrally positioned, the latter can be placed anywhere in the room to equal effect. This configuration is sometimes known as 2.1, since the main speakers cover almost the entire audio bandwidth, whilst the woofer covers less than 1/10 of the range, usually extending to only 100 Hz. It’s worth noting that this arrangement, although convenient, also requires a woofer amplifier, channel-combining circuitry and filters.
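The channel-combining and filtering needed for such an arrangement can be sketched digitally. The example below is a simplified illustration, not a description of any real product: it uses first-order filters, which are far gentler than a practical crossover, and all function names and the 48 kHz sample rate are assumptions.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter (much gentler than a real crossover)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def bass_manage(left, right, cutoff_hz=100.0, sample_rate_hz=48000.0):
    """Return (main_left, main_right, lfe) feeds for a 2.1 layout."""
    # Channel combining: bass from both channels shares one woofer
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    lfe = one_pole_lowpass(mono, cutoff_hz, sample_rate_hz)
    # High-pass for each main speaker: input minus its own low-passed copy
    low_left = one_pole_lowpass(left, cutoff_hz, sample_rate_hz)
    low_right = one_pole_lowpass(right, cutoff_hz, sample_rate_hz)
    main_left = [x - lo for x, lo in zip(left, low_left)]
    main_right = [x - lo for x, lo in zip(right, low_right)]
    return main_left, main_right, lfe

# A steady (very low frequency) input ends up almost entirely in the woofer feed
steady = [1.0] * 48000
main_l, main_r, lfe = bass_manage(steady, steady)
print(abs(round(lfe[-1], 3)), abs(round(main_l[-1], 3)))
```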

Stereo systems sometimes suffer from the ‘hole in the middle’ effect, in which the sounds appear to come from each loudspeaker but not from the centre of the ‘sound stage’. This can be corrected by using an extra centre loudspeaker, which is fed with a mix of the left and right hand signals, as illustrated below.

This 3.1 configuration requires extra amplification, although the LFE speaker can be omitted to reduce costs.

Surround Sound

Stereo doesn’t allow for the fact that real sounds have components emanating from various directions. In fact, the only stereo system that works properly is binaural recording. This uses a dummy head recording technique, resulting in dramatic effects when listening on headphones. Unfortunately, it produces a narrow stereo image with loudspeakers.

The limitations of stereo have led to the development of multi-channel systems that create an effect known as surround sound. Such technology can also enhance normal stereo material, either to create a pleasing effect or to replicate the original sound’s acoustic environment or ambience.

Early Systems

The first attempts at surround sound were four-channel systems, giving an effect known as quadraphonic sound and requiring four separate amplifiers and loudspeakers, as shown below. Fortunately, the LR and RR amplifier and speakers don’t have to be as powerful as those for LF and RF, since most of the sound power is produced at the front.

Systems developed with this arrangement during the seventies and eighties have fallen by the wayside, partly due to a lack of demand and also because of shortcomings in the technology of the time. Most employed a matrix encoder to combine four discrete channels into a form of stereo, conveying the directional information as phase relationships, whilst retaining some compatibility with existing equipment. A complementary decoder was used to extract the quadraphonic sounds.

The following systems were used:-

Ambisonic UHJ

This quadraphonic format was derived from Sansui’s original QS system (see below). The BBC experimented with this system on FM radio, although it was never popular, possibly because of the extra cost. Unfortunately, UHJ encoded material isn’t ideal on stereo equipment, especially with headphones, where the ‘out-of-phase’ component can make listening very uncomfortable. Oddly enough, UHJ decoders are excellent for adding ambience to standard stereo recordings.

Hafler Matrix

This system extracts the ambience information, the ‘out-of-phase’ element in a normal stereo signal, by using a third loudspeaker or additional pair of loudspeakers to the rear. These speakers are wired in series and then connected across the two ‘positive’ terminals on the loudspeaker outputs of the stereo amplifier. This is very cheap to implement and is effective with both stereo and quadraphonic material.
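Speakers connected across the two ‘positive’ terminals receive the difference between the left and right signals, so anything common to both channels cancels and only the out-of-phase element remains. A minimal numerical sketch, ignoring speaker impedance and any scaling, with made-up sample values:

```python
def hafler_rear_feed(left, right):
    """Signal across rear speakers wired between the two 'positive' terminals."""
    return [l - r for l, r in zip(left, right)]

centre = [0.5, -0.25, 0.75]       # identical in both channels
ambience = [0.125, 0.25, -0.125]  # appears with opposite polarity in each channel

left = [c + a for c, a in zip(centre, ambience)]
right = [c - a for c, a in zip(centre, ambience)]

print(hafler_rear_feed(centre, centre))  # [0.0, 0.0, 0.0]
print(hafler_rear_feed(left, right))     # [0.25, 0.5, -0.25]
```

The second result is twice the ambience signal, with the centre content removed entirely.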

QS

A matrix system developed by Sansui, giving passable compatibility with mono and stereo.

SQ

Developed by CBS and taken up by Sony, this matrix system was impressive but compatibility was very poor.

Other Systems

Matrix systems were never very successful, mainly because of the effect they had on normal stereo reproduction. Other systems, such as CD-4 and UD-4, employed carrier-based technology to add discrete channels to a standard LP record. These gave good results and were fully compatible with normal stereo equipment. Unfortunately, they were also expensive.

Modern Home Cinema Systems

Virtually all home cinema systems incorporate some form of surround sound that’s based on a Left, Centre, Right, Surround (LCRS) speaker configuration. The most common variation, known as 5.1 since it uses five full-bandwidth speakers and a single LFE loudspeaker, is shown below.

The front three loudspeakers are known as the screen speakers, whilst those to the side are the surround speakers. The centre loudspeaker overcomes the ‘hole in the middle’ effect that’s particularly prevalent in cinema environments.

Other configurations are also encountered in special auditoriums and in more expensive systems. The 6.1 arrangement, shown below, uses six full-range speakers, adding an extra BS speaker, centrally placed to the rear of the audience.

The less common 7.1 configuration, shown below, has two extra speakers at the rear, identified as LB and RB.

Lesser systems may also be used, such as the 4.1 arrangement, which is used on some types of SoundBlaster Live! sound card, as commonly found in computers of the PC variety.

Dolby Surround and Dolby Digital

The following systems, developed by Dolby Laboratories Inc, are in common use:-

Dolby Surround

This system is a matrix system, similar in some respects to the older technology described above. However, unlike the earlier attempts, a Dolby Surround decoder produces signals in LCRS form. The directional loudspeakers originally used in cinemas had a frequency response that extended from 100 Hz to 7 kHz, the latter being in line with usual cinema tradition. The LFE speaker, however, is fed with signals below 100 Hz, as derived from a filtered mono version of the stereo signal.

Modern home theatre systems incorporate Dolby Surround Pro Logic or Dolby Surround Pro Logic II decoders, which give greatly improved results from existing Dolby Surround material. Although recordings can be optimised for Pro Logic II, they’re always compatible with older decoders. The more recent Dolby Surround Pro Logic IIx variety of decoder can also process normal stereo or 5.1 material so as to create the signals necessary for a 6.1 or 7.1 system.

Dolby Digital

This system, first introduced to 35 mm film in 1992, is now the standard multi-channel format for DVD-Video, Digital Video Broadcasting (DVB), Digital Television (DTV) and other cable or satellite-based television systems. Dolby Digital employs Dolby’s own AC-3 audio coding to convey five full-bandwidth channels plus a sixth channel for the LFE speaker.

All DVD-Video players that have a built-in Dolby Digital decoder come with six audio outputs that can be directly connected to a suitable home cinema amplifier. Players that lack such a decoder usually have an analogue stereo output as well as an S/PDIF connector. The latter conveys AC-3 data and can be connected to a home cinema decoder-amplifier. Such decoders are often in the form of a receiver that can also be used to pick up Dolby-encoded radio material.

Digital Theater Systems (DTS)

Although Dolby Digital is the standard system for DVD-Video, other audio formats, including those developed by Digital Theater Systems Inc, are also permitted. However, all DVDs must have a Dolby Digital or a PCM track as well, ensuring that any disc can be played on any hardware. Unfortunately, the audio data on a ‘dual standard’ disc uses more space.

The following variations of DTS technology are available:-

DTS Digital Surround

Similar in many ways to Dolby Digital, providing a 5.1 signal, but using proprietary DTS coding. Dolby Laboratories and Digital Theater Systems continue to argue over which gives the best results, although the author suspects that there’s little difference and that most listeners don’t notice.

DTS Extended Surround (DTS-ES)

This 6.1 and 7.1 variation is fully compatible with the 5.1 version. The 6.1 variety accommodates an extra Centre Surround (CS) loudspeaker, which is normally positioned midway between the LS and RS speakers. Domestic products don’t usually provide a suitable connection, so the information assigned to the CS output is ‘matrixed’ into the LS and RS decoder outputs. However, professional systems usually have extra outputs for the additional loudspeakers.

DTS 96/24

A multi-channel format, also compatible with the 5.1 version, which is designed for DVD-Video, employing a sample rate of 96 kHz and using 24-bit samples, hence 96/24.
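To give a feel for the numbers behind ‘96/24’, the sketch below works out the raw, uncoded PCM data rate for a given sample rate, sample size and channel count. It says nothing about the bit rate actually used by DTS coding. Treating the LFE channel as a full channel, and the 44.1 kHz CD figure used for comparison, are assumptions made for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM data rate in bits per second, before any audio coding."""
    return sample_rate_hz * bits_per_sample * channels

cd_stereo = pcm_bit_rate(44_100, 16, 2)
six_channel_96_24 = pcm_bit_rate(96_000, 24, 6)

print(f"16-bit CD stereo:     {cd_stereo / 1e6:.3f} Mbit/s")          # 1.411
print(f"96/24 with 6 channels: {six_channel_96_24 / 1e6:.3f} Mbit/s")  # 13.824
```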

Virtual Surround Sound

This special technique involves the use of ‘phasey’ sound to give the illusion of surround sound, even though only two loudspeakers are used. Digital processing is often used.

References

Dolby Laboratories website at www.dolby.com

Digital Theater Systems website at www.dtstech.com

©Ray White 2004.