In digital audio a signal is represented by electrical messages in the form of numbers. Generally, such signals are more robust than the varying voltages used in analogue audio and are less prone to electrical interference. The most common recording media for digital audio are hard disk (HD), Compact Disc (CD), MiniDisc (MD) and Digital Audio Tape (DAT).
Digital audio is also used in digital audio broadcasting (DAB), where digital audio is converted into MPEG 2 format, ‘uplinked’ to a satellite and then ‘downlinked’ back to numerous terrestrial transmitters, all of which are synchronised. Unlike conventional broadcasting, where separate frequencies are assigned to specific areas, DAB uses common frequencies across the entire country, allowing a mobile listener to receive uninterrupted programmes without retuning. Unfortunately, the MPEG 2 conversion and satellite links introduce a delay in transmission of around one to two seconds.
Digital audio signals are usually created by feeding an analogue signal into an analogue to digital convertor (ADC). Typically, this gives a 16-bit or 24-bit digital code, often in the form of pulse code modulation (PCM), where each increment in the value of the code represents a linear ‘step’ in level. A small amount of white noise, also known as dither, is often added prior to conversion. This reduces the audible effect of quantisation noise, a distortion caused by bits stepping in and out at low levels of volume.
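The effect of dither on a very quiet signal can be illustrated with a minimal sketch. The function name `quantise`, the triangular dither (two uniform random values summed) and its amplitude of one step are all assumptions chosen for illustration; they are not taken from any particular convertor.

```python
import random

def quantise(sample, bits=16, dither=False, rng=random):
    """Convert a sample in the range -1.0 to +1.0 into a signed PCM code."""
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    scaled = sample * full_scale
    if dither:
        # Assumed dither: two uniform random values summed (triangular
        # distribution), spanning plus or minus one quantisation step.
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    code = round(scaled)
    # Clamp in case dither pushes a full-scale sample out of range.
    return max(-full_scale - 1, min(full_scale, code))

# A steady signal of 0.4 of one step is lost entirely without dither...
tiny = 0.4 / 32767
undithered = quantise(tiny)

# ...but survives as the average of many dithered conversions.
rng = random.Random(1)
dithered = [quantise(tiny, dither=True, rng=rng) for _ in range(20000)]
average = sum(dithered) / len(dithered)
print(undithered, round(average, 2))
```

Without dither the signal falls below the smallest step and vanishes; with dither the individual codes are noisy but their average follows the original level.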
Digital data can be passed through a digital signal processing (DSP) device, giving control over levels, equalisation, delay, reverberation or other effects. All of this involves real-time number-crunching and very fast processing. Such arithmetic introduces rounding errors, heard as noise, which means that 32-bit processing should be used with 20-bit or 24-bit audio data.
Eventually, the result is sent to a DAC (digital to analogue convertor). This often uses over-sampling, which inserts additional samples into the data, obviating the need for a brick wall filter and any consequent distortion. In a 16 times over-sampling convertor a single bit of data increments or decrements the analogue output at 16 times the sampling rate.
Some types of ADC employ pre-emphasis, also simply known as emphasis. This technique significantly reduces the subjective noise of a 16-bit audio system by boosting high frequency (HF) signals prior to conversion, whilst a complementary HF cut, known as de-emphasis, is used in the DAC. Although emphasis is useful for 16-bit multi-track equipment, it also causes serious problems when connecting 16-bit digital signals to 20-bit or 24-bit devices. In addition, the processing can slightly change the sound quality or introduce breathing effects on certain material.
If emphasis is chosen, usually with the emphasis switch on an ADC, it can be difficult to reverse, since emphasis can’t always be removed from the digital data. The signal should contain an emphasis flag, activating de-emphasis at the DAC when required, although this doesn’t work in some DACs. Fortunately, it always operates correctly in a CD or DAT player.
The duration of any digital recording is limited by the available digital storage capacity. The maximum recording time accommodated by such a device is given by:-
which can be simplified into:-
t = recording time in minutes (min)
d = storage space in megabytes (MB)
f = sampling frequency in kilohertz (kHz)
b = number of bits per sample
T = number of audio tracks
This equation can be rewritten to give the required memory as:-
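Using the symbols defined above, the two relationships can be sketched as follows. This assumes that 1 MB means 10^6 bytes, which is an assumption; with a binary megabyte of 1,048,576 bytes the constant 133.3 becomes roughly 139.8.

```latex
t = \frac{8 \times 10^{6}\, d}{60 \times 1000\, f\, b\, T} \approx \frac{133.3\, d}{f\, b\, T}
\qquad\text{and}\qquad
d = \frac{60 \times 1000\, f\, b\, T\, t}{8 \times 10^{6}} = 0.0075\, f\, b\, T\, t
```

For example, one minute of a single 16-bit track sampled at 44.1 kHz needs 0.0075 × 44.1 × 16 × 1 × 1 = 5.292 MB.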
The table below shows how much space is required for every minute of a single-channel recording. It also shows the period of time you can expect to record for every 10 MB of storage capacity:-
|f (kHz)||Bits/sample||MB/min||Minutes for 10 MB|
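Figures of this kind can be computed with a short sketch. The function names, the choice of decimal megabytes and the particular combinations of sample rate and resolution are assumptions for illustration, not the rows of the original table.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes

def megabytes_per_minute(f_khz, bits, tracks=1):
    """Storage in MB consumed by one minute of uncompressed audio."""
    bytes_per_second = f_khz * 1000 * (bits / 8) * tracks
    return bytes_per_second * 60 / BYTES_PER_MB

def minutes_available(d_mb, f_khz, bits, tracks=1):
    """Recording time in minutes that fits into d_mb megabytes."""
    return d_mb / megabytes_per_minute(f_khz, bits, tracks)

# Illustrative single-channel combinations (not the original table rows).
for f_khz, bits in [(48, 16), (44.1, 16), (32, 16), (22.05, 8), (11.025, 8)]:
    mb = megabytes_per_minute(f_khz, bits)
    print(f"{f_khz:>7} kHz  {bits:>2}-bit  {mb:6.3f} MB/min  "
          f"{minutes_available(10, f_khz, bits):6.2f} min per 10 MB")
```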
As shown above, digitised audio requires considerable storage space, which can only be reduced by using compression, also known as bit rate reduction. Those forms of compression that don’t lose any of the original information are known as lossless whilst other systems are lossy.
Unfortunately, older forms of lossless compression have little effect on storage requirements. Lossy systems must then be used, involving a compromise between sound quality and the amount of data required. Computer-based sounds are often processed using simple systems such as ALaw (2:1), IMA (4:1), MACE (3:1 or 6:1) and µLaw (2:1). Although the sound quality is usually seriously degraded, these systems don’t require any extra hardware or software at the receiving end.
Some equipment uses a hardware compression system, such as Musicam, Aspec, APi-X64 (12:1 compression), Dolby AC-1 (for older satellite systems), Dolby AC-2 (4:1 or 6:1 compression at rates of 128 or 192 kbit/s) or Dolby AC-3 (used on DVD-Video discs). Such technology lets you record, say, a minute of audio sampled at 32 kHz onto a standard floppy disk.
Other sophisticated software systems include MP3 and Advanced Audio Coding (AAC), both used for music on the Internet. These technologies, along with AC-2 and AC-3, apply a psychoacoustic approach to bit rate reduction, relying on the ear’s inability to detect low-level sounds in the presence of other loud material, an effect known as masking. A Fourier analysis of the audio spectrum is used to modify the data, often resulting in a performance similar to that of 18-bit linear PCM. A complementary process is used during playback, prior to conversion to analogue form.
A sound file converted to MP3 and conveyed at a bit rate of 128 kbit/s is usually around one tenth the size of the original file. However, as many people have noticed, the sound quality suffers audibly. Fortunately, the later AAC format doesn’t suffer so badly from this problem, although it still isn’t perfect.
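The size reduction can be checked with simple arithmetic. The sketch below assumes the original file is CD-quality stereo (44.1 kHz, 16-bit, two channels), which the text does not state; the function name is illustrative.

```python
def pcm_bit_rate_kbps(f_khz, bits, channels):
    """Bit rate of uncompressed linear PCM in kbit/s."""
    return f_khz * bits * channels

# Assumed original: CD-quality stereo.
cd_rate = pcm_bit_rate_kbps(44.1, 16, 2)
ratio = cd_rate / 128          # MP3 at 128 kbit/s
print(f"{cd_rate:.1f} kbit/s reduced by a factor of about {ratio:.1f}")
```

Under that assumption the uncompressed rate is about 1411 kbit/s, so a 128 kbit/s stream is roughly one eleventh of the original.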
The systems discussed above don’t provide enough compression for real-time audio streaming, as used for radio on the Internet. This demands more advanced technology, such as Dolby AAC, QDesign Music 2 or Qualcomm PureVoice.
Dolby AAC, which is an enhancement of AAC, supports data rates of 14 kbit/s (mono) to 128 kbit/s (stereo), or higher, offering CD quality when used with a broadband connection. QDesign Music 2, which reduces data to 3% of its original size, employs rates of 8, 10, 12, 16, 20, 24, 32, 40 or 48 kbit/s, whilst Qualcomm PureVoice, which uses Code Division Multiple Access (CDMA) technology for speech content, reduces the amount of data by 9:1 or 19:1.
Error correction is used in virtually all digital recording equipment. This process reduces data corruption by adding redundant bits that have a mathematical relationship with the audio data itself. At the receiving end these relationships are detected and corrections made where necessary.
CD-Audio uses one of the most advanced error correction systems available. This is necessary since a hole of just 0.1 mm diameter can cause up to 20 samples to be lost. During writing, data is taken from several samples and scrambled on two levels, a process known as Cross-Interleaved Reed-Solomon Coding (CIRC). This means that small correctable errors might occur over several samples, rather than having fewer samples damaged beyond repair. If a sample is lost completely, an estimated value, based on the preceding and subsequent data, is put in its place, a technique known as interpolation.
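Interpolation can be sketched in a few lines. This example assumes the simplest possible estimate, the mean of the two neighbouring samples, and a single isolated loss; real players may use other methods, and the function name `conceal` is hypothetical.

```python
def conceal(samples, lost_index):
    """Replace one lost sample with the mean of its two neighbours."""
    if not 0 < lost_index < len(samples) - 1:
        raise ValueError("this sketch needs an intact sample on each side")
    repaired = list(samples)
    repaired[lost_index] = (samples[lost_index - 1] + samples[lost_index + 1]) // 2
    return repaired

damaged = [100, 220, 0, 410, 480]   # the sample at index 2 was lost
print(conceal(damaged, 2))          # [100, 220, 315, 410, 480]
```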
The sampling frequency is the rate at which audio is chopped-up into digital form. Changing this within existing digital data is tricky, although a sample rate convertor (SRC) will insert or delete samples, sometimes resulting in distortion. Using analogue connections instead of digital circuits is often the cheapest and most practical solution to any such difficulty.
Standard sample rates include:-
384 kHz: Used for high-quality sound-editing and multi-track systems, providing a theoretical frequency response of up to 192 kHz. Although few people can hear above 20 kHz, this range preserves those audible sounds that are produced by the interaction of inaudible higher frequencies.
192 kHz: Also used in professional systems, such as ProTools, this gives a theoretical frequency response of up to 96 kHz, again providing a high quality of sound reproduction.
96 kHz: Used for audio tracks on Digital Versatile Disc (DVD) when used for DVD-Video. The frequency response extends almost to 48 kHz, again well beyond the range of human hearing.
48 kHz: Often used by broadcasting organisations, and easily converted to 32 kHz for sending via digital links to radio transmitters. In theory, this gives a frequency response up to 24 kHz, but in reality the limit is often 20 kHz, especially in equipment that also operates at 44.1 kHz.
44.1 kHz: Originally used during the development of CD, allowing digital audio to be recorded on European PAL video recorders, and now the ‘de facto’ standard for CD. Sadly, this rate makes the digital output of a CD player incompatible with 48 kHz equipment, necessitating a sample rate convertor. Material created on Digital Audio Tape (DAT) for eventual transfer to CD should be recorded using this rate. The frequency response extends to 20 kHz.
44.056 kHz: Again used during the development of CD, enabling digital audio to be recorded on American NTSC video recorders. Now rarely used but generally compatible with 44.1 kHz equipment, although there’s a small change in pitch when connected to such devices.
32 kHz: Used for broadcasting links to frequency modulation (FM) radio transmitters and for an extended recording time on some DAT recorders. Fortunately, tapes recorded at this rate are playable on any machine, even if it doesn’t record at 32 kHz. The frequency response extends to 15 kHz, which isn’t adequate for high-quality reproduction.
22.05 kHz: Sometimes used for low-quality audio with computer games, on World Wide Web sites or in multimedia presentations. In 8-bit form, it uses a quarter of the data that would be needed for CD-quality sound. The frequency response reaches 10 kHz, giving a rather muffled quality.
11.025 kHz: Used instead of 22.05 kHz when there are serious limitations on data space or data rate. The sound quality is usually abysmal, although adequate for speech. The frequency response extends to around 5 kHz.
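The theoretical frequency response quoted for each rate follows from the rule that a sampled system can only represent frequencies up to half the sampling frequency. A minimal sketch, with an illustrative function name:

```python
def theoretical_bandwidth_khz(sample_rate_khz):
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_khz / 2

for rate in (48, 44.1, 32, 22.05, 11.025):
    print(f"{rate:>7} kHz sampling -> up to {theoretical_bandwidth_khz(rate):.4g} kHz")
```

Practical figures are somewhat lower than these limits, since the filters in real convertors need some room to work.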
The resolution indicates the number of bits used for each sample in a digital signal, determining both the quality and dynamic range (the range between quiet and loud sounds) of a recording.
Standard resolutions include:-
8-bit: Used in older computers and ‘budget’ digital devices. Although adequate for basic speech purposes, this can’t accommodate the dynamic range of music or other real sounds.
16-bit: Used for CD and other high-quality consumer products, giving adequate results for most users and a dynamic range matching the human ear. This means the background noise in the real world often exceeds that on a recording. And if such noise could be heard, the maximum recorded volume would exceed the capabilities of most audio systems.
20-bit: Used for professional audio recording. Unfortunately when mixing sounds, the component sounds can’t be ‘fitted’ exactly within the range provided by normal 16-bit sampling. For example, in a multi-track system each track is recorded at optimum level and the volume adjusted during ‘mix down’. Inevitably, such level changes and applied equalisation introduce more noise or a loss of headroom. Increasing the number of bits from 16 to 20 expands the available dynamic range by around 24 decibels (dB), giving an extra 24 dB ‘margin’ for adjustments in levels.
It’s possible to create a very high-quality recording with 16-bit equipment, but only if you control the levels of your source material very carefully.
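The dynamic range of linear PCM can be estimated from the number of bits, since each extra bit doubles the number of available levels and so adds roughly 6 dB. A minimal sketch, ignoring the effects of dither and real convertor noise; the function name is illustrative.

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of linear PCM: about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 20, 24):
    print(f"{bits:>2}-bit: {dynamic_range_db(bits):6.1f} dB")
```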
A wide range of digital recording formats are used, some of which are discussed below:-
Compact Discs come in CD-Audio (CD-A), CD-ROM, CD-Recordable (CD-R) and CD-ReWritable (CD-RW) versions. Both CD-A and CD-ROM discs are manufactured by ‘pressing’, although the actual data structure is different. CD-A discs conform to the Philips Red Book standard, containing a table of contents (TOC) at the beginning (centre) of the disc. This TOC contains PQ coding that provides information about each audio track and its duration.
Although the contents of a CD-R normally conform to the Philips Orange Book standard, it’s also possible to create an audio disc conforming to the Red Book standard on a CD-R or CD-RW disc.
There is, however, a slight complication in that CD-R drives can usually record part of a disc, in what’s known as a session. Additional sessions can then be used to fill up the entire disc, during which time a temporary table of contents is created in an area just before the normal TOC. At this stage the disc can only be played on a CD-R drive or on an Orange Book-compatible CD player. Unfortunately, older audio CD players can’t accept these discs since they don’t have a normal TOC.
Only when all recording is completed can the disc be ‘fixed’ by writing a standard TOC. The disc then conforms to the Red Book standard and can be played on any machine.
An audio disc created on CD-R or CD-RW should conform to the Red Book standard, meaning that it’s identical to a ‘pressed’ CD-A disc. Unfortunately, CD-RW technology is significantly different to that used in older drives, meaning that CD-RW discs may not work in some CD or CD-R drives.
By convention, audio CDs comply with the Red Book standard. However, anyone with a computer can create much larger albums by making CDs containing MP3 files. Such discs, created on CD-ROM, CD-R or CD-RW, have a normal CD-ROM data structure, allowing them to be used in any computer’s CD drive (although there can be problems with CD-RW discs, as mentioned above). However, they can’t be played on an audio CD player unless it’s compatible with MP3.
MiniDisc is a proprietary format developed by Sony, although other manufacturers make compatible devices. Although a disc can accommodate 74 or 80 minutes of recording, the same as a normal CD, it’s only 2½ inches (64 mm) in diameter and is encased in a rectangular shell. Pre-recorded MDs are pressed in the same way as CDs whilst recordable MDs use magneto-optical (MO) technology.
MDs have a lower storage capacity than a normal CD, so they rely on audio compression, involving perceptual coding and a bit rate of around 300 kbit/s for stereo recordings.
DAT, which uses a small tape cassette, is also known as R-DAT, since the record and replay heads are on a rotating drum, as in a video recorder. A typical DAT recorder has all the features found on any recording machine, but with added extras, such as the use of start IDs or (in some models) the ability to generate timecode and to be controlled remotely via timecode.
SMPTE timecode, which provides information about the time of day, can be used to synchronise the sound output of a DAT machine to other recording machines and video equipment. Timecode is usually recorded using DAT’s subcode area, which isn’t used by a non-timecode machine.
A common reference frequency must be used to drive a DAT machine’s digital audio sampling frequency and its timecode input. If not, the timing of the recorded timecode and the sample rate can begin to drift apart, causing some digital equipment to reject the DAT recording.
The best way to ensure timing accuracy is to record timecode generated by the DAT recorder itself at the same time as recording the audio material. However, if the recipient of your DAT recording is only using the analogue audio connections on their DAT machine, it’s possible to achieve a reasonable timing ‘lock’ for up to 30 minutes. In this situation you can use an external source of timecode or you can stripe your DAT tape with timecode after making the audio recording.
A modern 4-head machine with off-tape monitoring can work in three modes:-
Assemble: both audio and timecode are recorded.
Audio: only audio is recorded, leaving timecode intact.
Subcode: only timecode or other data is recorded, leaving audio intact.
In insert mode, this kind of machine can read timecode or audio from the DAT and then replace it.
A-DAT is an 8-track recording system that employs S-VHS video cassettes. Standard VHS cassettes can be used, although the results aren’t always satisfactory.
Prior to recording, or whilst ‘laying’ the first track, the tape must be formatted. This places two minutes of data at the start of the tape, stripes the tape with a form of timecode and divides each ‘wipe’ of the helical scanning head into the separate sectors for each track. It’s best to format an entire tape in a single ‘pass’, so as to avoid any discontinuities in timecode.
When the machine is used on its own you can employ track 8 for SMPTE timecode, which can be added to the tape whilst formatting. However, if you use a full-function remote control, which accommodates both SMPTE and MIDI Timecode (MTC), you can use this track as normal.
For best results, you should take great care with signal levels when recording to CD-R. Remember, the dynamic range available on a CD closely matches that of the human ear. In addition, any overloading of a digital input can create distortion of the very worst kind.
With analogue source material you should always use an audio limiter. Typical settings for a suitable device using professional line level signals are given in the table below:-
|Attack||Minimum (0.15 ms)|
|Release||Minimum (0.015 s)|
|Voice over (VO) Controls||Minimum|
|Voice over (VO) Switch||Off|
|Stereo Link Switch||On|
|Meter Switches||‘Gain Reduction’ & CH 1 + 2 ‘On’|
The diagram below shows the relationships between various signal levels for DAT and CD-R, as well as the values shown on a peak programme meter (PPM):-
A commercial CD can peak to +7 dB from a domestic player, although a broadcast player can peak to +22 dB. To match the levels from commercial CDs to signals from other studio equipment, the output of a professional player is often backed off by around 10 dB. Hence during line-up, the CD player gives -12 dB, corresponding to 1 on a PPM, -30 on a DAT machine’s meter and -24 on a CD-R recording meter.
DAT recordings are often unsuitable for digital transfer to CD. This is because DAT users often allow headroom between the material’s peak level, corresponding to the normal maximum level, and clipping level, the point at which distortion appears. For example, radio broadcast material often has 10 dB of headroom whilst a programme interchange recording can use 6 dB.
However, CD-Audio discs rarely have any headroom, with the peak level taken very close to the clipping level. This means that any DAT transferred digitally to CD will normally be too low in level, typically by 6 to 10 dB, or even worse if the original recording was made at a very cautious level.
There are two ways around this problem:-
The following procedure describes how to make the DAT recording, assuming you have a mixing desk with a peak programme meter (PPM) and a test tone oscillator set to 1 kHz. The mixer should also be able to check the DAT machine’s output level.
Set the test tone to PPM 4, equal to 0 dB. Turn up the DAT machine’s input control until its output level is PPM 6, corresponding to +8 dB. Don’t worry if your machine lacks an off-tape monitor facility; this procedure works with any kind of DAT recorder.
At a level of PPM 7, equal to +12 dB, the signal should almost fully deflect the DAT machine’s own meter. The input control should be backed off if the clip indicator on the machine starts to glow.
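The PPM readings quoted in this section (PPM 1 at -12 dB, PPM 4 at 0 dB, PPM 6 at +8 dB and PPM 7 at +12 dB) are consistent with a scale of 4 dB per division centred on PPM 4. The sketch below assumes that this spacing holds across the whole scale, which is an inference from those figures; the function names are illustrative.

```python
def ppm_to_db(ppm):
    """Convert a PPM reading to dB, assuming 4 dB per division and PPM 4 = 0 dB."""
    return 4 * (ppm - 4)

def db_to_ppm(db):
    """Convert a level in dB back to a PPM reading under the same assumption."""
    return db / 4 + 4

for ppm in (1, 4, 6, 7):
    print(f"PPM {ppm} = {ppm_to_db(ppm):+d} dB")
```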
Each track on a CD is identified by a track ID number, which is linked to the information contained in the disc’s TOC. This number usually relates to the track list shown on the sleeve of the disc.
To provide fast access, all the tracks must be correctly identified. If care isn’t taken during the creation of a CD you’ll have one huge track, making it almost impossible to locate one part of the recording. Track IDs should always be provided, even if the CD actually contains a continuous recording.
A CD-R or CD-RW drive and associated system can create track IDs by the following methods:-
An ID can be created automatically whenever the signal level is -60 dB or less for more than three seconds. However, this mustn’t make the IDs too ‘tight’, otherwise the CD will start playing after the start of the track, cutting off some of the sound. These tight IDs can be prevented by introducing a delay circuit into the audio path.
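Automatic detection based on a gap of -60 dB or less lasting more than three seconds can be sketched as follows. The function name, the idea of measuring level in fixed-length blocks and the block length are assumptions for illustration; an actual CD-R system may work quite differently.

```python
def find_track_starts(levels_db, block_seconds,
                      threshold_db=-60.0, min_gap_seconds=3.0):
    """Return the indices of blocks at which a new track ID would be placed."""
    starts = []
    quiet_blocks = 0
    started = False
    for i, level in enumerate(levels_db):
        if level <= threshold_db:
            quiet_blocks += 1          # still inside a gap
            continue
        # Audio is present: mark a start for the first sound on the disc,
        # or when the preceding gap lasted longer than the minimum.
        if not started or quiet_blocks * block_seconds > min_gap_seconds:
            starts.append(i)
            started = True
        quiet_blocks = 0
    return starts

# Half-second blocks: audio, a 3.5 s gap, audio, a 1.5 s gap, audio.
levels = [-20] * 4 + [-70] * 7 + [-18] * 4 + [-70] * 3 + [-25] * 2
print(find_track_starts(levels, block_seconds=0.5))   # [0, 11]
```

In this sketch the short 1.5 second gap doesn’t trigger a new ID. Placing each ID slightly ahead of the detected start would have the same effect as the delay described above.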
Some CD-R drives can insert skip IDs onto a CD-R disc. This effectively ‘deletes’ unwanted tracks by instructing any CD player to ‘skip’ them during playback.
©Ray White 2004.