In this article (http://arstechnica.com/apple/news/2012/04/does-mastered-for-itunes-matter-to-music-ars-puts-it-to-the-test.ars) Chris Foresman concludes erroneously that a 44.1kHz sampling frequency with 16 bits of resolution is plenty to reproduce the spectrum of audible sounds faithfully, or at least “good enough” for most people. The sample rate of CDs is underpowered, because it should be at least comparable to the limits of human hearing. At 44.1kHz the digital reproduction of a high frequency signal is nothing like the analog original.
Instead of arguing about it, here’s a simple visual example to show what is really happening. At 44.1kHz a 20kHz sound wave is sampled just over 2 times per cycle. Figures 1-1 through 1-4 show a simple sine wave. The blue lines represent 1 cycle, and the red dots within that cycle signify the sample points. (Excuse the crudeness of the drawings; I haven’t owned the Adobe suite in years.)
Figure 1-1: a plain sine wave
Figure 1-2: a 20kHz sine wave sampled 2 times
Figure 1-3: a 10kHz sine wave sampled 4 times
Figure 1-4: a 5kHz sine wave sampled 8 times
You have ~2 points to describe the 20kHz wave, and they must be evenly spaced, wrapping around from one cycle to the next (figure 1-2). At 10kHz you have ~4 points under the same limitation (figure 1-3), and at 5kHz you have ~8 points (figure 1-4). Then connect the dots with straight lines to see what the resulting sample looks like. We can see that at 8 samples per cycle, at 5kHz, the sampled version starts looking like the original wave.
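The connect-the-dots picture can be sketched in a few lines of Python. This is only a rough illustration of the straight-line drawing used in the figures: the function names and the choice of 1,000 test points per cycle are my own assumptions, and it does not model the filtering a real converter performs.

```python
import math

SAMPLE_RATE_HZ = 44_100  # CD sample rate

def samples_per_cycle(freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """How many sample points land inside one cycle of the tone."""
    return sample_rate_hz / freq_hz

def straight_line_error(points_per_cycle, test_points=1000):
    """Worst gap between a unit sine wave and straight lines drawn
    between evenly spaced sample points (hypothetical helper)."""
    step = 2 * math.pi / points_per_cycle
    worst = 0.0
    for i in range(test_points):
        x = 2 * math.pi * i / test_points
        k = math.floor(x / step)           # index of the sample to the left
        x0, x1 = k * step, (k + 1) * step
        t = (x - x0) / step                # position between the two samples
        line = (1 - t) * math.sin(x0) + t * math.sin(x1)
        worst = max(worst, abs(math.sin(x) - line))
    return worst

for freq in (20_000, 10_000, 5_000):
    print(f"{freq} Hz: {samples_per_cycle(freq):.2f} samples per cycle")

for n in (2, 4, 8):
    print(f"{n} points per cycle: worst straight-line error {straight_line_error(n):.3f}")
```

With the sample points starting at the zero crossing, as in the drawings, the straight-line error shrinks quickly as the number of points per cycle goes from 2 to 4 to 8.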
Even if the exact same frequency is sustained, it will take about a millisecond for the sampler to hit enough points along a 20kHz wave to reconstruct it as well as a 5kHz wave, because the sample point shifts about 10% each cycle. Not much music or audio stays at the same frequencies for more than a second, so the DSP has to be fast. The DSP then has to correct the waveform, but it can really only guess, and it has to do so very quickly. Luckily that guess is pretty damn good, but pros call that distortion. This happens only at higher frequencies, but frequency masking can occur and pollute the output of the sampler. In order to avoid that, more processing has to be added.
So if we are to faithfully reproduce even 10kHz sounds, which are well within most people’s hearing range, we would need at least 80kHz, and 160kHz for 20kHz. That is why professionals use 192kHz. The author is also unaware of (or ignores) the fact that filters are commonly applied to frequencies above 20kHz to prevent ultrasonics from distorting audio signals. But they can only do so much.
Some quote the Shannon-Nyquist theorem and say people like me do not understand it and that we are arguing with scientific fact. The conditions necessary for the theorem to be true are ideal conditions. “Ideal conditions” means conditions where there are no random variables and everything in the environment is controlled. In fact, filters cannot reduce frequencies above 20kHz to zero; they can only attenuate them greatly, minimizing their impact on the sampling. Given that fact, sampling at twice the low-pass frequency means you get many more errors. The sampling frequency of 44.1kHz was chosen as a nod to this problem, as a sort of compromise: a 20kHz signal can still carry coloring up to 22kHz because the low-pass filter has a 2kHz slope.
Basically, there are no ideal conditions in the real world, and anyone who has studied analog to digital conversion knows that sampling at twice the highest frequency is “good enough.” However, that does not mean what is reproduced is equal to the original signal. You can do all sorts of processing to recover the signal, but you can never get back the analog original.
24-bits: Good; 144dB Output: Bad
24-bit audio is not extraneous either. 96dB is well within the human hearing range. People without hearing problems can listen to approximately 120dB for brief periods of time with no hearing damage. Is it safe to go above 120dB or listen at volumes above 85dB for any extended period of time? Of course not. Would the “loudness wars” cease to be important because everyone could be as loud as they wanted, and could the compressed and boosted audio be more intelligible? Yes. One would not need to utilize the entire range, but as Chris stated, “headroom” is important, and not just to professionals. The point is that having 144dB to play with does not mean you need to use it. However, it does mean you do not need to compress the input signal as much, which means lower levels of distortion and less processing.
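The 96dB and 144dB figures follow from the usual rule of thumb that each bit of resolution adds about 6dB of dynamic range. A minimal sketch of that arithmetic (the function name is my own, and it ignores dither and real-world converter noise):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```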
As Processing Goes Up, So Does Distortion
Every level of processing outside of the digital realm adds noise; that’s just the way electrical circuits work. So the idea is to record with as little processing as possible (usually just light compression to avoid clipping), then process the audio non-destructively. However, once the sound is digital there is no going back. Once it is digital you can do all sorts of things except fully recover the analog source. Upon output, the signal is again converted and processed. The processors have gotten better and cheaper, but this whole conundrum could be mostly avoided by simply updating the baseline sample rate and bit depth to match today’s digital abilities.
The File Size Problem
However, doing this would increase storage demands manyfold: 1.5 times for 24 bits, multiplied by roughly 4 for 192kHz. So a 70 minute album would need approximately 4.2GB of storage space uncompressed (about the size of a single layer DVD). Luckily, with 3:1 lossless compression, we could get this down to about 1.5GB.
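As a rough check on these numbers, here is a small Python sketch. The 700MB figure for a 70 minute CD and the rounded multipliers of 1.5 and 4 are the approximations behind the estimate above; the function name is hypothetical. The exact ratio of 192kHz to 44.1kHz is closer to 4.35, so the real figure would be somewhat larger.

```python
CD_ALBUM_MB = 700  # assumed size of a 70 minute album at 16-bit/44.1kHz

def album_size_gb(bit_depth_factor, sample_rate_factor, compression_ratio=1.0):
    """Album size in GB after scaling bit depth and sample rate (rough estimate)."""
    size_mb = CD_ALBUM_MB * bit_depth_factor * sample_rate_factor
    return size_mb / compression_ratio / 1000

uncompressed = album_size_gb(24 / 16, 4)
compressed = album_size_gb(24 / 16, 4, compression_ratio=3)
exact_rate = album_size_gb(24 / 16, 192 / 44.1)

print(f"uncompressed: {uncompressed:.1f} GB")
print(f"3:1 lossless: {compressed:.1f} GB")
print(f"with the exact 192/44.1 ratio: {exact_rate:.1f} GB")
```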
I myself have just under 600 CDs that I bought and ripped to my laptop, so I would need over 900GB on my HD to store lossless audio at DVD-Audio quality. I actually have a 1TB HD in my laptop, but that would leave little room for other files and applications. However, I switched to lossless encoding a few years back, and currently have over a month of music on my machine, but it takes up less than 160GB. I will eventually go back and re-rip everything lossless. I wouldn’t mind a higher dynamic range and higher resolution, but I probably couldn’t hear the difference. Considering many adults cannot hear above 18kHz, a 144kHz sample rate would be fine, but even then it would probably be overkill considering so few sounds are above 16kHz. So, 128kHz would be a reasonable compromise. Instead of 6 times the storage, it would need only about 4 times, or ~2.8GB uncompressed and ~1GB with lossless compression. That way my audio collection would take up just under 600GB, which would be perfectly acceptable to me.
Back when 1GB was a huge drive and processors were still rated in MHz, of course we traded quality for quantity. But these days, there is little reason to cling to the CD-A standard, unless you are bandwidth conscious.
Bandwidth is a Bottleneck
The thing is, it isn’t storage needs controlling quality; it is a bandwidth bottleneck. Many people are lucky if they get a consistent 1MB/s, and it would take over 15 minutes for someone to download an album at this compromise rate of 24-bit/128kHz. That’s within reason, but most people have bottom of the barrel service, and only get 128–340KB/s downstream. So, that one album would take between 45 minutes and 2 hours & 15 minutes. Ouch! So, until ISPs can increase the baseline to at least 1MB/s at reasonable prices, audio quality (and video quality) will be stuck in the 20th Century.
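The download times can be estimated the same way. This sketch assumes the ~1GB losslessly compressed album from the previous section and counts 1GB as 1,000,000KB; the function name is my own.

```python
ALBUM_KB = 1_000_000  # assumed ~1GB album, counted as 1,000,000 KB

def download_minutes(speed_kb_per_s, size_kb=ALBUM_KB):
    """Minutes needed to download size_kb at a steady speed."""
    return size_kb / speed_kb_per_s / 60

for speed in (1000, 340, 128):
    print(f"{speed} KB/s: about {download_minutes(speed):.0f} minutes")
```

The results land in the same ballpark as the range quoted above; the exact figures depend on how the album size and speeds are rounded.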
But What About the AES Study?
The Audio Engineering Society did an experiment asking engineers and audio students to listen to both Red Book (CD quality) audio and 192kHz/24-bit audio. About 50% could not tell the difference. They concluded that CD quality is fine for most applications, but left the door open for audio standards to increase as technology improves. But it is clear that in order to reap the benefits of higher quality audio, one must have professional level equipment in a controlled listening environment.
“Watch waveforms on an O-Scope & Reconstruct the Audio in Your Mind”
Someone on Ars made that humorous comment, but they actually hit on something true. Music is only heard after being registered consciously in your mind. Coloring or distortion of the audio can cause all sorts of subtle changes in response that people react to consciously or subconsciously. One does not need to be a professional engineer to be affected by this, either. Unless they have hearing damage, people know instinctively when something sounds like crap. But few people recognize the effect this has on their reception of music.
I have noticed that my enjoyment of music can be heavily impacted by the quality of the reproduction. Lower quality audio simply makes complex music I haven’t heard before sound less than thrilling, muddy and boring, while on my own system, which has low distortion and decent speakers, the exact same source sounds beautiful and dynamic. Whether I am conscious of the quality or not is not the issue for me; the overall result is. Do I enjoy music more at higher quality? Yes. And being a music lover, that’s enough reason for me to want higher quality music released more widely than it is now.