Part 3 in a series: After decades of continual improvements in audio recording and playback technologies, the average 21st Century listener will hear most music at lower fidelity than before the millennium. How did this happen, and can anything be done about it?

In Part 1, we reviewed some of the reasons for the general decline in audio quality.  Part 2 explored the differences between analog and digital audio.  Now it’s time to take a closer look at the various digital formats, do some math, and find out what high fidelity means in the 21st century.

The resolution of a digital audio file is based on three factors: the sample rate, bit depth, and bit-rate.

Sampling Rate (sample rate, or sampling frequency) – the number of times audio is measured (or sampled) per second.  The standard sampling rate for a CD is 44.1 kHz, which means that the audio is sampled 44,100 times per second.

An analog signal (light blue) measured over time at a fixed sampling rate (red).

The sampling rate sets the upper limit of the frequency response, the range of sounds from low to high that a file is capable of reproducing.

Imagine a felt hammer striking a piano string.  A string that is thinner, shorter, or under more tension will vibrate more rapidly, producing a higher frequency or pitch.

The lowest note on a piano (A0) has a frequency of 27.5 Hz, and the A above middle-C (A4) has a frequency of 440 Hz, or 440 cycles per second.  The highest note on a piano is C8,  at 4186.01 Hz, and the normal range for human hearing is from 20 Hz to 20,000 Hz.
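The piano frequencies quoted above can be reproduced with a few lines of Python. This is only a sketch, and it assumes a standard 88-key piano tuned in equal temperament with A4 (key 49) at 440 Hz; the function name and key numbering are illustrative choices, not something from the original text.

```python
# Assumption: 88-key piano, equal temperament, A4 (key 49) tuned to 440 Hz.
A4_KEY = 49
A4_HZ = 440.0


def piano_key_frequency(key_number: int) -> float:
    """Frequency in Hz of a piano key (1 = A0, 49 = A4, 88 = C8)."""
    # Each semitone multiplies the frequency by the twelfth root of 2.
    return A4_HZ * 2 ** ((key_number - A4_KEY) / 12)


for name, key in (("A0", 1), ("A4", 49), ("C8", 88)):
    print(f"{name}: {piano_key_frequency(key):.2f} Hz")
```

Under those assumptions, the whole keyboard sits comfortably inside the 20 Hz to 20,000 Hz range of human hearing.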

According to the Nyquist-Shannon sampling theorem, the sampling rate needs to be at least twice the highest frequency in the signal being sampled in order to accurately capture the sound.  So the 44.1 kHz sampling rate used for CDs should be sufficient to capture audio frequencies up to 22,050 Hz, beyond the range of human hearing.

However, some maintain that inaudible frequencies above 22,000 Hz can “color” the sound and affect the lower-range frequencies we do hear.  And the digital recording process can produce distortion of its own: any frequencies above half the sampling rate are aliased, showing up as false lower frequencies, unless they are filtered out before sampling.
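To make aliasing a little more concrete, here is a rough sketch of where a pure tone would appear if it were sampled with no filtering at all. It assumes an idealized sampler and a single sine wave; the function name is made up for illustration.

```python
def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone sampled with no anti-aliasing filter."""
    # The sampled tone is indistinguishable from one at its distance
    # to the nearest whole multiple of the sampling rate.
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


CD_RATE = 44_100
for tone in (10_000, 25_000, 30_000):
    heard = alias_frequency(tone, CD_RATE)
    print(f"{tone} Hz sampled at {CD_RATE} Hz appears at {heard} Hz")
```

In this simplified model, a 10,000 Hz tone comes through unchanged, but an inaudible 30,000 Hz tone lands at 14,100 Hz, squarely in the audible range. That is why the high frequencies have to be filtered out before sampling.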

There is no theoretical frequency limit for an analog signal, but the physics of audio reproduction place a practical limit on what can be achieved.  Testing on some analog systems has shown evidence of frequencies up to about 50,000 Hz.

Most professional digital recordings are made at a sampling rate of 96 kHz, so they can capture the high-end audio frequencies that might be found in an analog recording but would be missing from a CD.  And if you absolutely must go higher, you can use a sampling rate of 192 kHz.  At this rate, you are slicing each second of audio into 192,000 pieces, and capturing frequencies up to 96,000 Hz (ouch!).
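The relationship between sampling rate and the highest frequency that can be captured is simple enough to check in a couple of lines. The sketch below just halves each of the sampling rates mentioned in this article; the function name is an illustrative choice.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent, in theory."""
    return sample_rate_hz / 2


for rate in (44_100, 96_000, 192_000):
    print(f"{rate} Hz sampling -> up to {nyquist_frequency(rate):,.0f} Hz")
```

These are theoretical ceilings; real converters need some headroom for filtering, so the usable range is a little lower.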

Bit-Depth – the number of bits used to record each slice of audio.  Think of this as the number of levels available to capture each slice.  Every additional bit doubles the number of levels: the resolution for 16-bit audio is calculated as 2^16, giving you 65,536 possible levels.  24-bit audio is calculated as 2^24, providing over 16 million levels.
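The doubling is easy to verify. This small sketch raises 2 to the power of the bit depth for a few common values; the function name is illustrative only.

```python
def quantization_levels(bit_depth: int) -> int:
    """Number of distinct levels available at a given bit depth."""
    return 2 ** bit_depth


for bits in (8, 16, 24):
    print(f"{bits}-bit audio: {quantization_levels(bits):,} levels")
```

Going from 16 to 24 bits adds only eight bits per sample, but multiplies the number of levels by 256.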

Most professionally-recorded digital audio is 24-bit.  Recording at a higher resolution allows for a greater dynamic range (the difference between the softest and the loudest sounds in a recording) and a better signal-to-noise (S/N) ratio (i.e., more signal, less noise).

Sound levels are commonly measured in decibels (dB), and the normal range of human hearing is from 0 dB (the threshold of hearing) to 120 dB (hearing damage).  The dynamic range for 16-bit digital audio is about 96 dB, and the range for 24-bit audio is around 144 dB. The best most analog formats can offer is a dynamic range of around 60 dB, and there will almost always be more noise present.
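The 96 dB and 144 dB figures follow from the number of levels at each bit depth, at roughly 6 dB per bit. The sketch below uses the common textbook approximation of 20 times the base-10 logarithm of the number of levels; it assumes an ideal converter and ignores dither and real-world circuit noise, so treat the results as theoretical maximums.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of ideal linear PCM at a given bit depth."""
    # Ratio of the largest to the smallest representable step, in decibels.
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit audio: about {dynamic_range_db(bits):.1f} dB")
```

In practice, no playback chain delivers the full theoretical range of 24-bit audio, but the extra headroom is useful during recording and mixing.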

By the time all of these calculations end up as air moving from your speakers, compression may have obliterated some of the differences in dynamic range between the formats.  A lot depends on the type of music you listen to — look for an update on the “Loudness War” in an upcoming article.

Bit rate – the number of bits processed per unit of playback time.  For an uncompressed digital audio file, this can be calculated as:

Sample Rate x Bit-Depth x Number of Channels = Bit Rate

Let’s do the math for a CD: 44,100 x 16 x 2 = 1,411,200 bits per second (or 1411 kbps, or 1.4 Mbps).  Compressed audio, such as an MP3 file, is a different story.  The sampling rate for an MP3 file can vary, and there is no equivalent bit-depth, so the bit rate is an indicator of how much compression was applied to the original signal.  A higher bit rate results in a larger file size and greater fidelity to the original sound.  Since a CD has about 11 times the bit rate of a 128 kbps MP3 file, does that mean it sounds 11 times better?
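Here is the same arithmetic as a short Python sketch, extended to a 96 kHz/24-bit stereo file for comparison. The function name and the 128 kbps MP3 figure used for the ratio are illustrative assumptions; the formula only applies to uncompressed audio.

```python
def uncompressed_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate in bits per second for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


cd = uncompressed_bit_rate(44_100, 16, 2)
hi_res = uncompressed_bit_rate(96_000, 24, 2)
mp3 = 128_000  # assumed: a 128 kbps MP3 file

print(f"CD:           {cd:,} bits per second")
print(f"96 kHz/24-bit: {hi_res:,} bits per second")
print(f"CD vs. 128 kbps MP3: {cd / mp3:.1f} times the bit rate")
```

The ratio is a measure of data, not of perceived quality, which is the question the rest of this article takes up.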

How high is up?

Let’s think about this for a minute.  Higher sample rates and greater bit-depth will result in more information being captured for each sound.  Higher resolution means better sound, but there are limits.  Our ears impose limits: the highest frequencies we can hear drop with age, and some ears are better-trained and more discerning than others.

The recording method and storage media impose another set of limits.  And the playback system comprises a long chain of limiting factors: the playback unit, audio circuitry, DAC, amplifier, wiring, speakers, and more.  The rooms we listen in, and where we sit in those rooms, can have a dramatic impact on the quality and accuracy of the music we hear.

T Bone Burnett prefers analog, but maintains that if we have to listen to digital audio, we should do so at a minimum resolution of 96 kHz/24-bit. There is a fair amount of controversy over sampling at higher rates, with some engineers and audiophiles claiming that 192 kHz audio is a gimmick, overkill, or “just stupid”.  From one detractor:

Sampling audio signals at 192KHz is about 3 times faster than the optimal rate.  It compromises the accuracy which ends up as audio distortions.  There is an inescapable tradeoff between faster sampling on one hand and a loss of accuracy, increased data size and much additional processing requirement on the other hand.

The optimal sample rate should be largely based on the required signal bandwidth. Audio industry salesman have been promoting faster than optimal rates. The promotion of such ideas is based on the fallacy that faster rates yield more accuracy and/or more detail. Whether motivated by profit or ignorance, the promoters, leading the industry in the wrong direction, are stating the opposite of what is true.

~ Dan Lavry, “Sampling Theory for Digital Audio”

While looking at the above chart, remember that we are comparing apples (uncompressed audio files such as those on a CD) and oranges (compressed files).  While a CD track may contain 11 times the information in a 128 kbps MP3 file, it’s not really a fair comparison.  The compression algorithm is designed to throw away the unimportant and mostly inaudible parts of the music; it doesn’t just randomly remove 90% of the data.

What is HD Audio, and how do I get it?

So the CD might not sound 11 times better, but it definitely sounds better — MP3 files are a step backwards from CD-quality audio.  There are a few competing definitions and formats, but for our purposes, High-Definition (HD) audio will be defined as audio formats that exceed the sampling rate and bit-depth (44.1/16) of the Red Book CD Standard.

There is a high-definition audio specification from Intel for PC audio up to 192 kHz/32-bit for two channels, and 96 kHz/32-bit for as many as eight channels.   But this spec supports sample rates as low as 6 kHz, as well as 8 and 16-bit audio, so it falls outside of our definition.  Oh, and then there’s HD Radio, which has nothing to do with high-definition audio.  HD originally stood for “Hybrid Digital”, and now is just part of the HD Radio trademark and stands for nothing.

We will take a deeper dive into the competing formats for HD Audio in our next installment, and look at the various ways and means to get high-fidelity in the 21st century.  Onward!