I cannot hear the difference between 16/44.1 (and by extension, 16/48) and High-Res Content generally, be they HDCD, SACD, or just straight-up Masters from Qobuz. This is on multiple sets of equipment, ranging from El Cheapo earbuds all the way to HD800 cans and full-fledged tower speakers being bi-amped.
That’s not why I go for High-Res stuff, though.
It’s all about archival, at least for me. With a 24/192 Master in FLAC or ALAC, I can downsample to whatever the destination form factor is. I can transcode to a 320kbps MP3, or a 16/48 WAV stream for a smart speaker, or a 24/96 stream for the theater. The point isn’t that I can hear the difference, it’s the fear that I might lose something irrecoverable by sticking with lower-quality files for bulk storage. Once data has been discarded, it cannot be retrieved, and that influences my preference for storage (and is also why my BD/UHD rips are into MKVs, no re-encoding).
Now that being said, I will absolutely hem and haw and ABX different releases to determine if I opt for the 16/44.1 CD rip of an album from the 80s or the new 202X remaster in 24/192 (spoiler: almost always the former), and I absolutely prefer anything with classic instruments (Jazz, Classical) in higher-quality formats because of a subjective perception of a wider, clearer sound stage, though this is almost certainly a psychological effect from performing in concert bands and orchestras rather than physical or objective in nature.
Like I tell newcomers: if it sounds better enough to you to warrant the purchase price, then that’s all that really matters. Enjoy the hobby.
Decades ago, I was treated to an ABX test in my brother's recording studio. I easily recognized and preferred a 24/192 master he played versus the 16/44.1 down-mix. I honestly don't know whether there was something wrong with the down-mix, but qualitatively it did feel like it was "muffled" and coming from speakers, while the master really felt like live performance. He was surprised that I could tell them apart.
I also spent a lot of time ripping my old CDs to FLAC and trying different MP3 and AAC encoder settings to get playback that felt transparent enough to me. I could never tolerate Sirius/XM radio streaming due to the horrid compression I heard with every futile attempt. I still seem to have more sensitive hearing than most people around me, but in my 50s I know it isn't what it once was.
I never had huge budgets, but did strive for hi-fi in my limited ways. I used things like toslink and HDMI to send raw PCM data from Linux to my Yamaha A/V receiver's DACs + amplifier to drive somewhat nice Polk tower speakers. But then COVID-19 happened, and this stuff was packed up to move house.
Nowadays, music playback is streaming with mundane "subwoofer + satellite" PC speakers or MP3 playback with a mini-SD card permanently parked in my car's infotainment system.
And that's fine! I've got a flatmate who loves 320kbps MP3s on studio monitors, I've got musician friends who swear by CD-audio and Sennheiser HD200s, and others who love how vinyl uniquely degrades over time on big speakers.
The takeaway from these sorts of posts, at least in my opinion, should be two-fold:
* Understand the physical limits of human senses and perceptions to help inoculate yourself against outright scams and grifts
* Liberate you from the "tech grind" and allow you to enjoy what you like, how you like it.
The article says "I've run across a few articles and blog posts that declare the virtues of 24 bit or 96/192kHz by comparing a CD to an audio DVD (or SACD) of the 'same' recording. This comparison is invalid; the masters are usually different."
It may be simultaneously true that:
A) Humans cannot tell the difference between 44.1kHz/16-bit audio and any higher resolution, and
B) For a particular song, the best commercially available 44.1kHz/16-bit version may not be the best commercially available version
That’s true, but I consider myself a collector. Think of how a comic book collector operates.
If I have an option to get a 16-bit version of a recording or a high-res version, I choose the highest quality version every time.
Same with a physical copy. A limited edition, better quality vinyl LP is more attractive if you are going through the trouble of curating a collection.
I’ve been curating a music library of digital files since before the iPod was released and I will always go for the highest quality version out of principle. I can always downsample it to anything that makes sense.
This really is like driving a muscle/super car, or drinking expensive wine. In the end, none of the specs or tests matter. It is a form of art. If it makes the listener feel better (even if it's just psychological) then it's probably worth it.
To expand on this a bit, I appreciate some audio overkill because, if I do hear sizzle or distortion, it eliminates one possible reason and helps me figure out what’s actually happening.
It’s like having gigabit internet to my house: I don’t actually need it, but when a website is slow, I know the problem isn’t in my internet connection.
Correct. I've paid for Tidal for a decade because I just like the peace of mind that it's closer to the original recording. I'm sure it's mostly placebo, but I like it.
It's also sort of an inverted “Van Halen demanding a bowl of M&Ms with the brown ones removed” thing for me, too. The vast majority of my Tidal listening happens over Bluetooth, so that 24bit/192kHz FLAC stream is just gonna get downsampled to 16bit/48kHz anyway because that's all any Bluetooth speaker or headset is capable of doing — but the fact that it's an option in the first place signals that other things are being done right, too (namely: that Tidal's whole “we're the streaming service that pays artists the most per listen” premise actually has some semblance of merit rather than being complete marketing bullshit; while recording quality ain't the strongest signal possible for that, it's certainly a good sign when musicians/publishers are willing to send over the highest-bitrate lossless recordings they've got and not just the same ol' compressed-to-shit MPEG audio you can yank off YouTube for free).
I'd distinguish between differences that anyone can detect but some may not care about, and differences that may not be objectively detectable at all. Muscle cars, at least, are different in a way that anyone can see. Push that pedal to the floor and it feels different from a Honda Civic or whatever. Whether that difference is actually interesting or good is, of course, a matter of taste. Whereas audiophile nonsense is often indistinguishable even to the connoisseur and depends entirely on some form of self-deception. Still could be worth it, depending on what one considers worthy.
That’s actually a really good comparison, especially because - yes I can hear the difference between an excruciatingly lossless digitization of a piece of music that I’m intimately familiar with, played back on expertly configured hardware… but the difference is so little that most of the time, I’m fine just listening to it at medium-high quality streaming on a pair of <$50 headphones.
I’ve played with the nice toys, and they are nice, but for 100x the price, they barely deliver 1.5x the experience.
Oh great. And here I thought that fantasy literature where forest elves could hear the screams of the plants they stepped on when they walked was just that -- fantasy.
Music producer here. High resolution audio is useful for editing and anywhere there might be downstream processing or format conversion that may or may not be high quality, let alone lossless. The article covers that pretty well.
However, the article claims that the final distribution doesn’t need to have a bit depth of more than 16. That does not match my experience. I can tell the difference between my renders that are 16 bit vs 24 bit. I cannot tell the difference between 44.1 kHz and higher sample rates, and that’s consistent with the math (Nyquist-Shannon), but bit depth is a different matter. Would be fun to participate in a double-blind test that includes my own tracks and others.
Just get one of those "hi fi" valve amplifiers from Amazon you see under $100. The valve already distorts the sound, so you don't need to bother paying more for low distortion anywhere else in the audio chain. Saved you thousands of dollars, done!
Foobar2000 has an extension that allows you to blindly test whether you can tell the difference between two tracks.[1] The prime use is to compare different encodings of the same song from the same lossless master.
It kind of changed me a bit when I ran through 20 lossless tracks I had re-encoded to various mp3 bitrates and realized that even on a fancy system, it can be really hard if not impossible to discern even moderate lossy from lossless.
If you are an audiophile geek, really think about if you want to try this, the reality check might crack your foundations.
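For anyone who does try the ABX route, here is a minimal sketch of how a score can be judged against pure guessing. It assumes each trial is an independent 50/50 guess; the function name `abx_p_value` is made up for illustration and is not part of foobar2000 or its ABX component.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by coin-flipping."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more hits, out of 2**trials outcomes.
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


if __name__ == "__main__":
    for correct in (8, 12, 14):
        print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value only says the score is unlikely under guessing; it says nothing about how large or how important the audible difference is.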
I hate to be the one to break it to you, but high-end skis make tradeoffs which are harmful to beginner or intermediate level skiers... also there's sorta no such thing as a "best ski". What you'd want for high speed bombing double blacks is going to be different from off piste or moguls or snow park fun.... double also, skis wear out. Depending on who you want to believe it's as low as 20-30 days. Which, granted the average skier is at something like 5 days a year. but if that's you... triple also?
As for how this relates to audio compression, in particular in the context of 2012, you are making a tradeoff of storage size and decompression cost. Maybe that doesn't matter to you, but maybe it either did in 2012 or still does.
The point of this article and video is that there is no problem with 16-bit 44.1 kHz PCM. It thoroughly covers the audible range and there is absolutely no need for more when distributing music for humans to listen to.
The problem is the people spreading myths and disinformation out of ignorance or to promote their enterprise.
The weak links are producers/mastering-engineers, speakers/headphones and the room when using speakers.
There is a good reason to distribute it though, and compressed it doesn't really change the file size.
There's multiple YouTube channels that I listen to as podcasts, that are professionally created and the creators presume that exported audio works like studio audio, so what you end up with is really quiet audio that can't be turned up without pre-processing.
If we distributed audio the same way we work with it in a studio, we could forgo a lot of problems.
Also, the human ear does have enough dynamic range to make 24 bits worthwhile, though that much dynamic range is rarely used in recordings, and that high of a bit depth provides no benefits within a small dynamic range. A 192 kHz sample rate, on the other hand, is always useless.
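As a rough illustration of the bit-depth numbers being argued about, the textbook figure for an ideal quantizer driven by a full-scale sine is about 6.02 dB per bit plus 1.76 dB. The sketch below just evaluates that formula; it assumes ideal uniform quantization with no dither or noise shaping, and real converters fall short of the 24-bit figure.

```python
import math


def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of a full-scale sine under ideal uniform quantization."""
    # 20*log10(2) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.2f} dB")
```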
Nobody downloads music these days and everybody just streams. Audio at 24 bit still takes a small fraction of the bandwidth that 1080p video takes, so I don’t understand the hate for it.
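To put numbers on the audio side of that bandwidth comparison, here is a small sketch of uncompressed PCM bitrates. It assumes stereo and no compression; lossless codecs such as FLAC shrink these figures by an amount that depends on the material, and no video bitrate is assumed here.

```python
def pcm_bitrate_mbps(bits: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Raw PCM bitrate in megabits per second (1 Mbit = 1,000,000 bits)."""
    return bits * sample_rate_hz * channels / 1_000_000


if __name__ == "__main__":
    for bits, rate in ((16, 44_100), (24, 44_100), (24, 96_000), (24, 192_000)):
        print(f"{bits}/{rate / 1000:g}: {pcm_bitrate_mbps(bits, rate):.4f} Mbit/s")
```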
I use a DAC by Focusrite which can do 24-bit, and if I want to listen to higher fidelity audio on my planar headphones then I should be able to. Why should I limit myself to 16-bit?
Counterpoint: bandcamp is doing well. Vinyl sales are doing well.
If I like an artist that I find on streaming, I buy an LP and get a lossless download for free. I still have a music library and I will never rent my favorite music.
Artists prefer to connect directly with their fans and BC is probably the best platform for people who care to pay and support acts directly. They have high res downloads and I import them.
I don't think the hate is about people who know it doesn't actually sound different if the audio file is 16 bit or 24 bit or necessarily about receiving a few more bytes than they need, it's about the pushes by these types of streaming services/offerings or people insisting that it's supposed to be any better for listening when it's not.
Also the playback rate and the file rate are different topics. The former can get into scenarios more like the audio processing section of the article e.g. I had this one shitty headset for work which required me to set the volume to 1-2 (out of 100) on the computer and I could actually blind test tell when it was in 16 bit or 24 bit mode because it was cutting and boosting it so much it effectively lost precision in 16 bit mode.
My good enough amplifier and DAC combo claims up to 24bit/192kHz, I use a cheap optical interface from my computer that claims up to 32bit/192kHz, and the streaming service I use serves most albums at 24bit/44.1kHz.
It would have cost the same for the entire stack to be 16bit/44.1kHz at every step, but with excessive resolution I can control the volume anywhere. The bits right before the analog conversion at the end are essentially the same whether I turn down the volume in the software player, the operating system, or the DAC/amplifier.
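The volume point in the last few comments can be sketched with a bit of arithmetic: every halving of digital gain costs about one bit of resolution at the output word length. This assumes the volume setting maps linearly to gain, which is an assumption (many OS sliders use a different curve), and `effective_bits` is a hypothetical helper name.

```python
import math


def effective_bits(bit_depth: int, volume_fraction: float) -> float:
    """Bits of resolution left after scaling samples by a linear gain in (0, 1]."""
    if not 0 < volume_fraction <= 1:
        raise ValueError("volume_fraction must be in (0, 1]")
    bits_lost = -math.log2(volume_fraction)  # one bit per halving of gain
    return bit_depth - bits_lost


if __name__ == "__main__":
    for depth in (16, 24):
        print(f"{depth}-bit output at 1% gain: {effective_bits(depth, 0.01):.2f} bits")
```

Under that assumption, a 16-bit output at 1% gain is left with a bit over 9 bits, while a 24-bit output still has more than 17, which is consistent with the headset anecdote.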
Some people have claimed to hear an improvement with an external clock on a Wiim Ultra, but I do not think it is possible to re-clock the WiiM Amp Ultra with an outboard clock.
When I play from the computer, I'm not sure whether it is using the clock on my Mac, the clock on the optical interface, or the WiiM's clock. However, I do not notice any difference in fidelity when I use the Qobuz software player on my Mac or use Qobuz Connect to allow the player to directly stream from the source, so either it isn't a difference that I can hear, or the WiiM's internal clock is used for both sources.
I'm curious if the audio was being sent bit-perfect to the DAC for all of these tests (ALSA direct), or if it was being run through the audio mixer and being resampled
I can always tell if my 44.1 songs are being resampled to 48 because they're being run through the OS mixer
I'm also one of those audiophile crazies that obsesses over which metals to use in cabling, power filtering, swapping opamps, and builds their own DACs, amps, and speakers
"proper" resampling was expensive in 1997 when Intel was introducing fixed sampling AC'97, but was below noise floor of CPU load meter in 2007 when Microsoft released Vista killing hardware mixing.
@xiphmont also made an amazing video response to the many responses he received to this article. Using analog equipment he busts a bunch of myths and demonstrates what really happens with digital audio.
The main benefit for me is that digital watermarking becomes completely inaudible with high-res audio, but I can sometimes clearly hear it in standard resolution.
At a minimum, anything above 16/44.1 requires far more than just files: monitors, a treated room, listening position, DAC, etc... but most importantly - a trained ear. That last one is the most uncomfortable truth.
As I responded below, you are confusing math with physical reality. A true 44.1 kHz converter can't realistically capture frequencies ~18-20 kHz due to the limitations of filters used in the process. A perfect lowpass brick-wall filter just does not exist - they all introduce artifacts, which a trained ear can identify. You don't need to be a dog to hear the difference, just someone who does not assume that Nyquist theorem can be magically applied in the real world (and, ideally, someone who utilizes high quality converters with oversampling).
I don’t have great hearing, so I’m not sure I can really weigh in here (thanks punk concerts in my teens). I remember similar arguments around screens and 60Hz vs ‘the human eye’. I think a lot of people, myself included, can easily perceive the difference between 60Hz and something higher- given the right conditions. I would not be so quick to disregard claims of more sensitive hearing.
(I responded on this topic in this thread already) Look up articles on practical limitations of AD/DA converters and why a seemingly counter-intuitive claim that the difference between 44.1 kHz and above is noticeable is actually a completely accepted practical reality (aliasing, lowpass filters, etc)
The human threshold-of-hearing curve intersects the threshold-of-pain curve at about 20 kHz.
Above that frequency (or thereabouts) the sound has to be so loud that it will literally instantly damage your hearing before you can hear it.
This has been replicated across many studies for more than 100 years.
Flicker threshold is completely different. You can’t damage your vision by increasing the FPS, and it has always been commercially desirable to use a lower frequency because that is cheaper.
Max representable frequency is half the sampling rate (nyquist-shannon theorem), which is still a bit above normal but IIRC the extra headroom has something to do with eliminating aliasing
The artifacts produced by pure 44.1 kHz conversion are aliased back down to lower frequencies. It's not about a theoretical human ear, it's about the actual physics of AD/DA conversion.
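For readers unfamiliar with what "aliased back down" means, here is a minimal sketch of where a tone above Nyquist lands if it reaches the sampler unfiltered. It models only ideal sampling of a single sine, not any particular converter.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a sine at f_hz after ideal sampling at fs_hz."""
    folded = f_hz % fs_hz               # sampling cannot tell f from f + k*fs
    return min(folded, fs_hz - folded)  # fold into the 0..fs/2 range


if __name__ == "__main__":
    for f in (1_000.0, 25_000.0, 30_000.0):
        print(f"{f:g} Hz sampled at 44100 Hz appears at {alias_frequency(f, 44_100.0):g} Hz")
```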
Sure, but those are averages. I'm 30-ish, and my hearing doesn't cut out until somewhere in the 21kHz range. When I was younger, it was even higher. One of my roommates in college had one of those anti-rodent high-frequency noise generators, we almost came to blows over it.
You need at least twice the frequency range for sample rate in order to represent the original signal. That's slightly misleading though, that's from the Nyquist-Shannon sampling theory and it's a mathematical fact but that is true for exact numerical samples, once you add in quantization that muddies the water a bit. Taken at the extreme, it's straightforward to see why a 1 bit quantization per sample at 44.1 kHz would not capture a perfect representation of some analog signal even if there's only a 1 kHz frequency component to the signal. If we instead decide to sample at 10 MHz but still one bit quantization, now that 1 kHz frequency component can be much more accurately represented even though we're still using the worst quantization possible. Don't think of quantization like a square wave or a step pattern, think of it as "the signal is closer to here than any other discrete value".
Now in terms of realistic audio encoding, 16 bit at 44.1 kHz is designed to be a faithful representation as far as human hearing is concerned. Can someone with a trained ear potentially tell the difference between that and 24 bit at 192 kHz? In a studio environment it's possible. Most audiophile claims are dubious and a blind A/B test catches them out on most of it but the Nyquist-Shannon sampling theorem does not directly apply to quantized samples, it's about exact samples and with quantization, sampling rate is intertwined somewhat with the quantization depth.
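The link between sample rate and quantization depth can be sketched with the usual oversampling formula: if quantization noise is spread evenly from 0 to fs/2, only the part inside the band of interest counts. The code assumes white, signal-independent quantization noise and no noise shaping, which is a poor model for a raw 1-bit quantizer, so treat the 1-bit figure as an illustration of the trend rather than a prediction.

```python
import math


def in_band_snr_db(bits: int, fs_hz: float, bandwidth_hz: float = 20_000.0) -> float:
    """Ideal SNR within `bandwidth_hz` for a full-scale sine, no noise shaping."""
    if bandwidth_hz > fs_hz / 2:
        raise ValueError("bandwidth must not exceed Nyquist (fs/2)")
    base = 20 * math.log10(2) * bits + 10 * math.log10(1.5)
    # Noise outside the band can be filtered away: 10*log10 of the oversampling ratio.
    oversampling_gain = 10 * math.log10(fs_hz / (2 * bandwidth_hz))
    return base + oversampling_gain


if __name__ == "__main__":
    print(f"16-bit @ 44.1 kHz: {in_band_snr_db(16, 44_100):.2f} dB")
    print(f" 1-bit @ 10 MHz:   {in_band_snr_db(1, 10_000_000):.2f} dB")
```

Under this model each 4x increase in sample rate is worth roughly one extra bit (about 6 dB) inside the audio band.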
If you want to hear the difference between an audio file recorded at 44.1 and 88.2 kHz, then you need to slow the audio playback down. Otherwise, a trained ear cannot physically hear the difference.
44.1 is "enough" only in theory. This assumes a physically impossible steep filter. Realistically, frequencies around 20 kHz will create audible artifacts (aliasing). So yes, a trained ear can tell the difference between 44.1 and even 48 kHz. Like many other commenters in this thread, you are mixing up math theory with physical limitations of AD/DA converters. Oversampling is a common way to address this limitation, but strictly speaking 44.1 kHz is not as obviously "enough" as it seems.
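To make the filter argument concrete, here is a small sketch of how much room an anti-aliasing filter has between a 20 kHz passband edge and Nyquist at common sample rates. The 20 kHz edge is an assumption, and the numbers say nothing about whether a given filter's artifacts are audible.

```python
def transition_band_hz(fs_hz: float, passband_edge_hz: float = 20_000.0) -> float:
    """Width of the region between the passband edge and Nyquist (fs/2)."""
    nyquist = fs_hz / 2
    if passband_edge_hz >= nyquist:
        raise ValueError("passband edge must be below Nyquist")
    return nyquist - passband_edge_hz


if __name__ == "__main__":
    for fs in (44_100.0, 48_000.0, 96_000.0, 192_000.0):
        print(f"fs = {fs:g} Hz: transition band = {transition_band_hz(fs):g} Hz")
```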
The most impactful for noticing the difference? Again, I would argue it's the trained ear. If you have plenty of mixing experience then all these details add up, and a treated room becomes the most critical - agree with that.
Pretty good analogy. Thing is though, the person who receives the 16-bit, 44.1khz music file can always upsample it to 192khz and not lose anything in the process (heck, lots of audio stuff oversamples internally to this level or beyond, for extra aliasing headroom!). I'm not sure about expansion from 16bit to 24bit though, downward expansion isn't necessarily perfect.
The whole audiophile industry is built on stuff which doesn't make any sense
My favourite: "audiophile-grade" audio players which allocate a single continuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented audio" causes audible "jitter".
Of course, they don't know that what looks like continuous memory to user-code is probably discontinuous in kernel/physical RAM.
Didn't check in many years, I wonder if they created kernel level players to account for that, to have "true continuous memory"
> My favourite: "audiophile-grade" audio players which allocate a single contignuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented memory" causes audible "jitter".
Thanks for the laugh... this is absolutely bonkers. In case anyone is wondering, before sound hits our ears it has to go through a digital to analog conversion, which takes place on hardware independent of the CPU, operating with its own clock and buffers etc.
In addition to that, it is possible to hit a delay and run out of buffer because memory access is slow (the most obvious case would be if the input got swapped to disk at an inopportune moment), but the audible effect is really obvious. This isn't some subtle "oh my music sounds ineffably worse" effect, it's "my computer is glitching and my music is unlistenable."
If you try to use empiricism when it comes to certain groups of audiophiles, you are going to be sorely reminded that it's basically the equivalent of healing crystals for a different type of person. 24/192 is useful for mixing/mastering, but completely unnecessary for the end product to distribute for listening.
24/192 is also great for digital synthesizers--if you're generating a waveform like a sawtooth that has theoretically instantaneous transitions, they can eat as much frequency as you can give them. Running at 44khz loses noticeable high-end content.
Most modern digital synths have already caught onto this and run internally at much higher sampling rates even if their output gets downsampled, but sometimes you run across a vintage plugin that runs at the host audio rate and working in a higher sampling rate is audible.
You can generate perfect band-limited sawtooth waves at 44.1khz, there are multiple techniques for doing this and most production digital synthesizers use them.
Oversampling gives you headroom for aliases for the rest of the synth that is more vulnerable to it.
Yeah, I was oversimplifying a blit, the raw waveforms are usually okay, but I distinctly remember old-school VSTs where you couldn't achieve a nice saw lead at 44.1.
It's tough to tell without specific names, but I imagine a lot of particularly old* VSTs were written to use naive sawtooths rather than perfect band-limited ones, which would have terrible aliasing at 44.1 khz. Oversampling those would help a lot!
* Some people are still making this mistake, despite information on the (many) ways to do it the right way being widely and freely available!
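One of the simplest band-limited approaches is additive synthesis: sum only the harmonics that fit below Nyquist. The sketch below is a slow illustration, not how production synths do it (they typically use cheaper techniques), and the naive version is included only for comparison.

```python
import math


def harmonic_count(f0_hz: float, fs_hz: float) -> int:
    """Number of harmonics of f0 that lie strictly below Nyquist (fs/2)."""
    nyquist = fs_hz / 2
    count = int(nyquist // f0_hz)
    if count * f0_hz >= nyquist:  # exclude a harmonic sitting exactly on Nyquist
        count -= 1
    return count


def bandlimited_saw(f0_hz: float, fs_hz: float, num_samples: int) -> list[float]:
    """Rising sawtooth built from its Fourier series, truncated below Nyquist."""
    n_harm = harmonic_count(f0_hz, fs_hz)
    out = []
    for n in range(num_samples):
        t = n / fs_hz
        partial = sum(math.sin(2 * math.pi * k * f0_hz * t) / k
                      for k in range(1, n_harm + 1))
        out.append(-2 / math.pi * partial)  # sign chosen to match naive_saw's phase
    return out


def naive_saw(f0_hz: float, fs_hz: float, num_samples: int) -> list[float]:
    """Ramp from -1 to 1 with a hard reset; its infinite harmonics alias."""
    return [2 * ((n * f0_hz / fs_hz) % 1.0) - 1 for n in range(num_samples)]


if __name__ == "__main__":
    print("harmonics kept for 1 kHz at 44.1 kHz:", harmonic_count(1_000, 44_100))
    print("first samples:", [round(s, 3) for s in bandlimited_saw(1_000, 44_100, 5)])
```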
I wonder if there's also distortion or ring modulation stages where some of the energy above hearing range might spill into audible sidebands if they're not nyquist-limited first.
Yeah, that's the "rest of the synth" part that's more vulnerable to aliasing.
There's some ways to do band-limited distortion but...they aren't nearly as widespread, easy, or universal as band-limited oscillators.
Ring modulation is funny though because you'd ideally want the sidebands to modulate down by default rather than filter them out, that's why you're using it.
32-bits are great for recording too because they do an incredible job of capturing the dynamic range without having to be precise on the preamp settings. It removes an entire job from the recording workflow.
192 for mixing and mastering can be useful especially if you're doing a lot of effects, especially anything that pitch shifts. But I've seen low quality phone-microphone recordings make it to the master; if you capture lightning in a bottle, it hardly matters what the settings were, what the microphone was, or anything else.
We had a really nice crystal decoration that I happened to put on top of one of my TV speakers and, wouldn't you know it, it had this resonant frequency somewhere around specific human speech frequencies that drove us absolutely bonkers until I figured out the cause and moved it.
I completely accept that human audition has limits that are easy to determine by playing a pure sound. But is it the same with music, where multiple frequencies are played and interfere with each other? Aren't some harmonics or effects created by these "inaudible" frequencies?
To try to imagine something similar: the human eye is unable to see UV light, yet fluorescent paint has a visible quality of its own compared to "normal" pigments.
32-bit float has become popular in filmmaking/field recording equipment lately because, with a microphone preamp that supports it, you can capture the entire dynamic range of the microphone--there's no accidental clipping if you drive the gain stage too hard.
It's a bit redundant for a skilled technician, they're already used to setting the gain staging, inbound compression, and feathering the mics to avoid this in 24-bit, but if you're handing a boom mic to a novice and have a scene where e.g. someone's whispering and another person's screaming, it can be nice to not have to worry about it.
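The storage-format half of that claim can be sketched in a few lines: a peak above full scale is clamped when stored as 24-bit fixed point, but survives in 32-bit float and can be turned down afterwards. The 1.8 peak value is hypothetical, and the code says nothing about whether the preamp and ADC in front of the file can actually capture such a level.

```python
import struct

FULL_SCALE_24 = 2 ** 23 - 1  # largest positive signed 24-bit sample value


def store_as_int24(x: float) -> int:
    """Quantize to signed 24-bit, clipping anything beyond full scale (+/-1.0)."""
    clipped = max(-1.0, min(1.0, x))
    return round(clipped * FULL_SCALE_24)


def store_as_float32(x: float) -> float:
    """Round-trip a sample through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    hot_peak = 1.8  # hypothetical peak, 1.0 being full scale
    gain = 0.5      # turn it down in post
    fixed = store_as_int24(hot_peak) / FULL_SCALE_24 * gain
    floating = store_as_float32(hot_peak) * gain
    print(f"24-bit fixed after gain: {fixed:.3f} (peak was clipped)")
    print(f"32-bit float after gain: {floating:.3f} (peak preserved)")
```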
Sheesh, measly 24-bit/192kHz.
Of course it makes no sense, unless it is downloaded through low-oxygen wire, which somehow and unfathomably must have been omitted or forgotten.
For typical listening (though humans can perceive bone-conducted vibrations up to 100 kHz or even 120 kHz) 16-bit-fixed/44.1kHz is a high-fidelity transport format. As a DSP researcher, I prefer 32-bit-float/44.1kHz as a transport format. I often upsample to 32-bit-float/188.2kHz or even 32-bit-float/192kHz for signal processing applications such as high-fidelity reverberation via direct and FFT convolution. While the author advocates for the transport to ear use case, I would argue that 24-bit/192kHz provides greater fidelity and resolution for sound processing. I found the pedantic arrogance of the author to be annoying. But yes, the sampling theory is an important consideration -- but so is the quality of the actual digital filters used in the DAC->ADC pipeline. They are much more forgiving and less lossy at 192kHz.
Even for PC, I recommend some cheap studio monitors.
I can't hear the difference between 128 kbps opus and FLAC.
While 100% true, I'd phrase B) as:
"The quality of the particular mastering can still make a noticeable difference, regardless of the ability for the digital sampling rates to perfectly represent it perceptually"
Just to be clear that the statement applies to any releases meeting the A) criteria, not just 44.1 kHz @ 16-bit ones.
As they say, most people listen to their music with equipment. Audiophiles listen to their equipment with music.
This is perfect, thank you. This goes straight into my long-term memory bank.
On a tangent, whenever someone mentions LP sounding warmer or whatever I like to point out that I prefer wax cylinders (a.k.a. phonograph cylinders).
You Edison shill.
Well, at least there are objective performance benchmarks on cars, and some of them are okay proxies of performance in motorsports.
https://www.carwow.co.uk/blog/carwow-quarter-mile-400-metre-...
https://en.wikipedia.org/wiki/List_of_N%C3%BCrburgring_Nords...
Correct. I've paid for Tidal for a decade because I just like the peace of mind that it's closer to the original recording. I'm sure it's mostly placebo, but I like it.
It's also sort of an inverted “Van Halen demanding a bowl of M&Ms with the brown ones removed” thing for me, too. The vast majority of my Tidal listening happens over Bluetooth, so that 24bit/192kHz FLAC stream is just gonna get downsampled to 16bit/48kHz anyway because that's all any Bluetooth speaker or headset is capable of doing — but the fact that it's an option in the first place signals that other things are being done right, too (namely: that Tidal's whole “we're the streaming service that pays artists the most per listen” premise actually has some semblance of merit rather than being complete marketing bullshit; while recording quality ain't the strongest signal possible for that, it's certainly a good sign when musicians/publishers are willing to send over the highest-bitrate lossless recordings they've got and not just the same ol' compressed-to-shit MPEG audio you can yank off YouTube for free).
I'd distinguish between differences that anyone can detect but some may not care about, and differences that may not be objectively detectable at all. Muscle cars, at least, are different in a way that anyone can see. Push that pedal to the floor and it feels different from a Honda Civic or whatever. Whether that difference is actually interesting or good is, of course, a matter of taste. Whereas audiophile nonsense is often indistinguishable even to the connoisseur and depends entirely on some form of self-deception. Still could be worth it, depending on what one considers worthy.
That’s actually a really good comparison, especially because - yes, I can hear the difference between an excruciatingly lossless digitization of a piece of music that I’m intimately familiar with, played back on expertly configured hardware… but the difference is so small that most of the time I’m fine just listening to it at medium-high quality streaming on a pair of <$50 headphones.
I’ve played with the nice toys, and they are nice, but for 100x the price, they barely deliver 1.5x the experience.
If you can't hear the squeals of the plants [1] in the studio's reception area, are you really getting the full experience of a piece of music?
[1]: https://www.cnn.com/2023/03/30/world/plants-make-sounds-scn
Oh great. And here I thought that fantasy literature where forest elves could hear the screams of the plants they stepped on when they walked was just that -- fantasy.
Triffid music.
What a human centric view. I like my music to scare neighbor's pets.
(2012) https://news.ycombinator.com/item?id=3668310 316 comments
(2014) https://news.ycombinator.com/item?id=8689231 424 comments
(2015) https://news.ycombinator.com/item?id=10520639 228 comments
(2017) https://news.ycombinator.com/item?id=15127633 428 comments
(2019) https://news.ycombinator.com/item?id=19318898 314 comments
Music producer here. High resolution audio is useful for editing and anywhere there might be downstream processing or format conversion that may or may not be high quality, let alone lossless. The article covers that pretty well.
However, the article claims that the final distribution doesn’t need to have a bit depth of more than 16. That does not match my experience. I can tell the difference between my renders that are 16 bit vs 24 bit. I cannot tell the difference between 44.1 kHz and higher sample rates, and that’s consistent with the math (Nyquist-Shannon), but bit depth is a different matter. Would be fun to participate in a double-blind test that includes my own tracks and others.
Just get one of those "hi fi" valve amplifiers from Amazon you see under $100. The valve already distorts the sound, so you don't need to bother paying more for low distortion anywhere else in the audio chain. Saved you thousands of dollars, done!
Foobar2000 has an extension that allows you to blindly test whether you can tell the difference between two tracks.[1] The prime use is to compare different encodings of the same song from the same lossless master.
It kind of changed me a bit when I ran through 20 lossless tracks I had re-encoded to various mp3 bitrates and realized that even on a fancy system, it can be really hard if not impossible to discern even moderate lossy from lossless.
If you are an audiophile geek, really think about if you want to try this, the reality check might crack your foundations.
[1]https://www.foobar2000.org/components/view/foo_abx
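For anyone trying the ABX route, a quick way to judge a score is the one-sided binomial p-value: the chance of getting at least that many trials right by pure guessing. A minimal sketch in Python; the function name `abx_p_value` is made up for illustration and is not part of foo_abx.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by coin-flip guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more hits, out of 2**trials equally likely ones.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

print(abx_p_value(12, 16))  # 12 of 16 right
print(abx_p_value(9, 16))   # 9 of 16 right
```

12 of 16 comes out around 0.038, under the usual 0.05 cutoff, while 9 of 16 is about 0.40, which is indistinguishable from guessing.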
Counter: An ultra high bit rate solves the problem and you can stop worrying if it's the weakest link.
You can then focus on other things.
Example: I bought the best skis possible. Now I know I need to just focus on my skills and not blame the equipment.
I hate to be the one to break it to you, but high end skis make tradeoffs which are harmful to beginner or intermediate level skiers... also there's sorta no such thing as a "best ski". What you'd want for high speed bombing double blacks is going to be different from off piste or moguls or snow park fun.... double also, skis wear out. Depending on who you want to believe it's as low as 20-30 days. Which, granted, the average skier is at something like 5 days a year. But if that's you... triple also?
As for how this relates to audio compression, particularly in the context of 2012: you are making a tradeoff between storage size and decompression cost. Maybe that doesn't matter to you, but maybe it either did in 2012 or still does.
The point of this article and video is that there is no problem with 16-bit 44.1 kHz PCM. It thoroughly covers the audible range and there is absolutely no need for more when distributing music for humans to listen to.
The problem is the people spreading myths and disinformation out of ignorance or to promote their enterprise.
The weak links are producers/mastering-engineers, speakers/headphones and the room when using speakers.
There is a good reason to distribute it though, and compressed it doesn't really change the file size.
There's multiple YouTube channels that I listen to as podcasts, that are professionally created and the creators presume that exported audio works like studio audio, so what you end up with is really quiet audio that can't be turned up without pre-processing.
If we distributed audio the same way we work with it in a studio, we could forgo a lot of problems.
Also, the human ear does have enough dynamic range to make 24 bits worthwhile, though that much dynamic range is rarely used in recordings, and that high of a bit depth provides no benefits within a small dynamic range. A 192 kHz sample rate, on the other hand, is always useless.
Nobody downloads music these days and everybody just streams. Audio at 24 bit still takes a small fraction of the bandwidth that 1080p video takes, so I don’t understand the hate for it.
I use a DAC by Focusrite which can do 24-bit, and if I want to listen to higher fidelity audio on my planar headphones then I should be able to. Why should I limit myself to 16-bit?
Counterpoint: bandcamp is doing well. Vinyl sales are doing well.
If I like an artist that I find on streaming, I buy an LP and get a lossless download for free. I still have a music library and I will never rent my favorite music.
Artists prefer to connect directly with their fans and BC is probably the best platform for people who care to pay and support acts directly. They have high res downloads and I import them.
I don't think the hate is about people who know it doesn't actually sound different if the audio file is 16 bit or 24 bit or necessarily about receiving a few more bytes than they need, it's about the pushes by these types of streaming services/offerings or people insisting that it's supposed to be any better for listening when it's not.
Also the playback rate and the file rate are different topics. The former can get into scenarios more like the audio processing section of the article, e.g. I had this one shitty headset for work which required me to set the volume to 1-2 (out of 100) on the computer, and I could actually tell in a blind test whether it was in 16 bit or 24 bit mode, because it was cutting and boosting the signal so much that it effectively lost precision in 16 bit mode.
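The precision loss described above can be estimated with some quick arithmetic. A rough sketch, assuming the OS volume slider is a plain linear gain applied before the DAC (real mixers often use a different curve and may dither), with hypothetical function names:

```python
from math import log2

def bits_lost(gain: float) -> float:
    """Bits of resolution given up when samples are multiplied by `gain` (0 < gain <= 1)."""
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    return log2(1.0 / gain)

def effective_bits(bit_depth: int, gain: float) -> float:
    """Bits left for the signal after digital attenuation at a given output bit depth."""
    return bit_depth - bits_lost(gain)

# Volume at 2 out of 100, treated as a linear gain of 0.02 (an assumption).
for depth in (16, 24):
    print(depth, round(effective_bits(depth, 0.02), 1))
```

Under that assumption, 2% volume costs roughly 5.6 bits, leaving about 10 bits in 16-bit mode versus about 18 in 24-bit mode.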
My good enough amplifier and DAC combo claims up to 24bit/192kHz, I use a cheap optical interface from my computer that claims up to 32bit/192kHz, and the streaming service I use serves most albums at 24bit/44.1kHz.
It would have cost the same for the entire stack to be 16bit/44.1kHz at every step, but with excessive resolution I can control the volume anywhere. The bits right before the analog conversion at the end are essentially the same whether I turn down the volume in the software player, the operating system, or the DAC/amplifier.
you might want to see if your DAC re-clocks incoming optical, if not then it's relying on the cheap clock generator from your computer
Some people have claimed to hear an improvement with an external clock on a Wiim Ultra, but I do not think it is possible to re-clock the WiiM Amp Ultra with an outboard clock.
When I play from the computer, I'm not sure whether it is using the clock on my Mac, the clock on the optical interface, or the WiiM's clock. However, I do not notice any difference in fidelity when I use the Qobuz software player on my Mac or use Qobuz Connect to allow the player to directly stream from the source, so either it isn't a difference that I can hear, or the WiiM's internal clock is used for both sources.
I'm curious if the audio was being sent bit-perfect to the DAC for all of these tests (ALSA direct), or if it was being run through the audio mixer and being resampled
I can always tell if my 44.1 songs are being resampled to 48 because they're being run through the OS mixer
Proper audio resampling should not be identifiable. Of course, the OS mixer probably doesn't do proper (CPU expensive) resampling.
But a quality audio player should account for this and do its own.
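For context on why 44.1 to 48 is the awkward conversion: the two rates are not a simple multiple of each other. A small sketch showing the exact ratio a rational resampler would have to implement (the function name is illustrative):

```python
from fractions import Fraction

def resample_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Exact output/input sample ratio, reduced to lowest terms."""
    return Fraction(target_hz, source_hz)

ratio = resample_ratio(44_100, 48_000)
print(ratio)  # 160/147
```

In the textbook approach that means upsampling by 160, low-pass filtering, then keeping every 147th sample. How good the result sounds comes down to that filter, which is where a cheap mixer can cut corners.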
I'm also one of those audiophile crazies that obsesses over which metals to use in cabling, power filtering, swapping opamps, and builds their own DACs, amps, and speakers
"proper" resampling was expensive in 1997 when Intel was introducing fixed sampling AC'97, but was below noise floor of CPU load meter in 2007 when Microsoft released Vista killing hardware mixing.
@xiphmont also made an amazing video response to the many responses he received to this article. Using analog equipment he busts a bunch of myths and demonstrates what really happens with digital audio.
https://video.xiph.org/vid2.shtml
or on YT if you can't play it https://www.youtube.com/watch?v=cIQ9IXSUzuM
The main benefit for me is that digital watermarking becomes completely inaudible with high-res audio, but I can sometimes clearly hear it in standard resolution.
At a minimum, anything above 16/44.1 requires far more than just files: monitors, a treated room, listening position, DAC, etc... but most importantly - a trained ear. That last one is the most uncomfortable truth.
Are you, perchance, a dog posting on the internet? A 44.1 kHz sample rate already covers frequencies past the range of the human ear, regardless of training.
As I responded below, you are confusing math with physical reality. A true 44.1 kHz converter can't realistically capture frequencies ~18-20 kHz due to the limitations of filters used in the process. A perfect lowpass brick-wall filter just does not exist - they all introduce artifacts, which a trained ear can identify. You don't need to be a dog to hear the difference, just someone who does not assume that Nyquist theorem can be magically applied in the real world (and, ideally, someone who utilizes high quality converters with oversampling).
I don’t have great hearing, so I’m not sure I can really weigh in here (thanks punk concerts in my teens). I remember similar arguments around screens and 60Hz vs ‘the human eye’. I think a lot of people, myself included, can easily perceive the difference between 60Hz and something higher- given the right conditions. I would not be so quick to disregard claims of more sensitive hearing.
(I responded on this topic in this thread already) Look up articles on practical limitations of AD/DA converters and why a seemingly counter-intuitive claim that the difference between 44.1 kHz and above is noticeable is actually a completely accepted practical reality (aliasing, lowpass filters, etc)
I would. It’s really simple.
The human threshold-of-hearing curve intersects the threshold-of-pain curve at about 20 kHz.
Above that frequency (or thereabouts) the sound has to be so loud that it will literally instantly damage your hearing before you can hear it.
This has been replicated across many studies for more than 100 years.
Flicker threshold is completely different. You can’t damage your vision by increasing the FPS, and it has always been commercially desirable to use a lower frequency because that is cheaper.
Max representable frequency is half the sampling rate (nyquist-shannon theorem), which is still a bit above normal but IIRC the extra headroom has something to do with eliminating aliasing
Indeed. And what is the max frequency that a human can hear?
The artifacts produced by pure 44.1 kHz conversion are aliased back down to lower frequencies. It's not about a theoretical human ear, it's about the actual physics of AD/DA conversion.
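To make "aliased back down" concrete: any component above half the sample rate that reaches the sampler unfiltered shows up at a mirrored frequency below it. A minimal sketch of that folding arithmetic (the function name is made up here, and it models an ideal sampler with no anti-alias filter at all, which is not how real converters are built):

```python
def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled tone appears, folded into 0..sample_rate/2."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(alias_frequency(25_000, 44_100))  # 19100
print(alias_frequency(25_000, 96_000))  # 25000
```

A 25 kHz component lands at 19.1 kHz when sampled at 44.1 kHz but stays where it is at 96 kHz, which is why converters filter (and usually oversample) before settling on the final rate.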
Depends on the age of the listener. On average, 30 to 50 year olds hear a maximum frequency of 14 to 16 kHz.
Right. Which are quite below 1/2 of 44.1k!
Sure, but those are averages. I'm 30-ish, and my hearing doesn't cut out until somewhere in the 21kHz range. When I was younger, it was even higher. One of my roommates in college had one of those anti-rodent high-frequency noise generators, we almost came to blows over it.
You need at least twice the frequency range for sample rate in order to represent the original signal. That's slightly misleading though, that's from the Nyquist-Shannon sampling theory and it's a mathematical fact but that is true for exact numerical samples, once you add in quantization that muddies the water a bit. Taken at the extreme, it's straightforward to see why a 1 bit quantization per sample at 44.1 kHz would not capture a perfect representation of some analog signal even if there's only a 1 kHz frequency component to the signal. If we instead decide to sample at 10 MHz but still one bit quantization, now that 1 kHz frequency component can be much more accurately represented even though we're still using the worst quantization possible. Don't think of quantization like a square wave or a step pattern, think of it as "the signal is closer to here than any other discrete value".
Now in terms of realistic audio encoding, 16 bit at 44.1 kHz is designed to be a faithful representation as far as human hearing is concerned. Can someone with a trained ear potentially tell the difference between that and 24 bit at 192 kHz? In a studio environment it's possible. Most audiophile claims are dubious and a blind A/B test catches them out on most of it but the Nyquist-Shannon sampling theorem does not directly apply to quantized samples, it's about exact samples and with quantization, sampling rate is intertwined somewhat with the quantization depth.
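The 1-bit thought experiment above can be tried directly. One caveat: simply rounding each sample to one bit would not do it. The sketch below swaps in first-order delta-sigma modulation, which feeds the rounding error back so that it averages out; that technique is an assumption made for illustration, not something the comment names, and the helper names are hypothetical.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: turns values in [-1, 1] into a +1/-1 stream."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback  # accumulate the error between input and output so far
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits

def block_average(values, window):
    """Crude low-pass filter: mean of each consecutive block of `window` values."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

# 1 kHz sine sampled at 10 MHz, one bit per sample (rates taken from the comment).
rate, tone, n = 10_000_000, 1_000, 20_000
signal = [0.5 * math.sin(2 * math.pi * tone * i / rate) for i in range(n)]
recovered = block_average(delta_sigma_1bit(signal), window=500)
print([round(v, 2) for v in recovered[:5]])
```

The block averages follow the original sine closely even though every stored sample is just +1 or -1, which is the trade between sample rate and bit depth that the comment is pointing at.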
If you want to hear the difference between an audio file recorded at 44.1 and 88.2 kHz, then you need to slow the audio playback down. Otherwise, a trained ear cannot physically hear the difference.
44.1 is "enough" only in theory. This assumes a physically impossible steep filter. Realistically, frequencies around 20 kHz will create audible artifacts (aliasing). So yes, a trained ear can tell the difference between 44.1 and even 48 kHz. Like many other commenters in this thread, you are mixing up math theory with physical limitations of AD/DA converters. Oversampling is a common way to address this limitation, but strictly speaking 44.1 kHz is not as obviously "enough" as it seems.
A treated room would be the most impactful, DACs the least.
The most impactful for noticing the difference? Again, I would argue it's the trained ear. If you have plenty of mixing experience then all these details add up, and a treated room becomes the most critical - agree with that.
The DAC is pretty impactful if it's outright incapable of outputting anything beyond the usual 48kHz :)
huh...
So I guess the programmer equivalent is distributing .pdb's (or, symbols)
Pretty good analogy. Thing is though, the person who receives the 16-bit, 44.1khz music file can always upsample it to 192khz and not lose anything in the process (heck, lots of audio stuff oversamples internally to this level or beyond, for extra aliasing headroom!). I'm not sure about expansion from 16bit to 24bit though, downward expansion isn't necessarily perfect.
You’d be adding 150khz and 8bits of nothing.
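On the 16-to-24-bit question: if the widening is a plain zero-pad (shift left by 8 bits), it is exactly reversible, so nothing is lost or gained. A quick check over every 16-bit value, assuming signed integer PCM samples; the helper names are illustrative.

```python
def widen_16_to_24(sample: int) -> int:
    """Pad a signed 16-bit sample with 8 zero bits at the bottom."""
    return sample << 8

def narrow_24_to_16(sample: int) -> int:
    """Drop the low 8 bits again (Python's >> keeps the sign)."""
    return sample >> 8

# Exhaustively confirm the round trip for the whole signed 16-bit range.
round_trip_ok = all(
    narrow_24_to_16(widen_16_to_24(s)) == s
    for s in range(-32768, 32768)
)
print(round_trip_ok)  # True
```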
The whole audiophile industry is built on stuff which doesn't make any sense
My favourite: "audiophile-grade" audio players which allocate a single continuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented audio" causes audible "jitter".
Of course, they don't know that what looks like continuous memory to user-code is probably discontinuous in kernel/physical RAM.
Didn't check in many years, I wonder if they created kernel level players to account for that, to have "true continuous memory"
Don't forget: "most players use malloc to get memory while new is the c++ method and sounds better."[1]
[1] https://www.audioasylum.com/messages/pcaudio/119979/
> My favourite: "audiophile-grade" audio players which allocate a single contiguous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented memory" causes audible "jitter".
Thanks for the laugh... this is absolutely bonkers. In case anyone is wondering, before sound hits our ears it has to go through a digital to analog conversion, which takes place on hardware independent of the CPU, operating with its own clock and buffers etc.
An Am486DX/100 was enough to decode and listen to an MP3 at 22 kHz (and maybe mono?) and was more than enough for 44/16/2 PCM. It's 31 y.o. today.
In addition to that, while it is possible to hit a delay and run out of buffer because memory access is slow (the most obvious case would be the input getting swapped to disk at an inopportune moment), the audible effect is really obvious. This isn't some subtle "oh my music sounds ineffably worse" effect, it's "my computer is glitching and my music is unlistenable."
I can tell when my CPU usage spikes because it causes a hum through my speakers, so this does not seem that far-fetched.
It just means you have a shitty audio path with not enough shielding. Move to SPDIF/TOSLINK.
The latter is probably true, but the former does actually happen, and it's easy to accidentally do--lossless or not.
If you try to use empiricism when it comes to certain groups of audiophiles, you are going to be sorely reminded that it's basically the equivalent of healing crystals for a different type of person. 24/192 is useful for mixing/mastering, but completely unnecessary for the end product to distribute for listening.
24/192 is also great for digital synthesizers--if you're generating a waveform like a sawtooth that has theoretically instantaneous transitions, they can eat as much frequency as you can give them. Running at 44khz loses noticeable high-end content.
Most modern digital synths have already caught onto this and run internally at much higher sampling rates even if their output gets downsampled, but sometimes you run across a vintage plugin that runs at the host audio rate and working in a higher sampling rate is audible.
You can generate perfect band-limited sawtooth waves at 44.1khz, there are multiple techniques for doing this and most production digital synthesizers use them.
Oversampling gives you headroom for aliases for the rest of the synth that is more vulnerable to it.
Yeah, I was oversimplifying a blit, the raw waveforms are usually okay, but I distinctly remember old-school VSTs where you couldn't achieve a nice saw lead at 44.1.
It's tough to tell without specific names, but I imagine a lot of particularly old* VSTs were written to use naive sawtooths rather than perfect band-limited ones, which would have terrible aliasing at 44.1 khz. Oversampling those would help a lot!
* Some people are still making this mistake, despite information on the (many) ways to do it the right way being widely and freely available!
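As an illustration of the difference being discussed, here is a sketch of one band-limiting technique, additive synthesis: sum only the sawtooth's harmonics that fit below Nyquist. This is just one of the several approaches mentioned above, picked because it is the simplest to read, and the function names are made up.

```python
import math

def harmonics_below_nyquist(freq_hz: float, sample_rate_hz: float) -> int:
    """Number of harmonics of `freq_hz` that lie strictly below sample_rate / 2."""
    count = int((sample_rate_hz / 2) // freq_hz)
    if count * freq_hz >= sample_rate_hz / 2:
        count -= 1  # a harmonic sitting exactly on Nyquist does not count
    return max(count, 0)

def bandlimited_saw(freq_hz: float, sample_rate_hz: float, num_samples: int):
    """Sawtooth built from its Fourier series, truncated below Nyquist."""
    k_max = harmonics_below_nyquist(freq_hz, sample_rate_hz)
    out = []
    for n in range(num_samples):
        phase = 2 * math.pi * freq_hz * n / sample_rate_hz
        # Fourier series of a sawtooth: (2/pi) * sum((-1)^(k+1) * sin(k*phase) / k)
        value = sum((-1) ** (k + 1) * math.sin(k * phase) / k
                    for k in range(1, k_max + 1))
        out.append(2 / math.pi * value)
    return out

def naive_saw(freq_hz: float, sample_rate_hz: float, num_samples: int):
    """Literal ramp from -1 up to 1; its harmonics above Nyquist fold back as aliases."""
    return [2 * ((freq_hz * n / sample_rate_hz) % 1.0) - 1 for n in range(num_samples)]

print(harmonics_below_nyquist(440, 44_100))   # 50
print(harmonics_below_nyquist(440, 192_000))  # 218
```

The naive version implicitly contains every harmonic, so at 44.1 kHz everything past the 50th one for a 440 Hz note has nowhere to go but back into the audible band. Running the same naive oscillator at 192 kHz pushes most of that energy out of the way, which matches the "old VST sounds better at a higher host rate" experience.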
I wonder if there's also distortion or ring modulation stages where some of the energy above hearing range might spill into audible sidebands if they're not nyquist-limited first.
Yeah, that's the "rest of the synth" part that's more vulnerable to aliasing.
There's some ways to do band-limited distortion but...they aren't nearly as widespread, easy, or universal as band-limited oscillators.
Ring modulation is funny though because you'd ideally want the sidebands to modulate down by default rather than filter them out, that's why you're using it.
No synth generates sawtooths by literally drawing a saw tooth in PCM. The distortion you get if you do that is not subtle at all.
32-bits are great for recording too because they do an incredible job of capturing the dynamic range without having to be precise on the preamp settings. It removes an entire job from the recording workflow.
192 for mixing and mastering can be useful especially if you're doing a lot of effects, especially anything that pitch shifts. But I've seen low quality phone-microphone recordings make it to the master; if you capture lightning in a bottle, it hardly matters what the settings were, what the microphone was, or anything else.
Even with mixing/mastering 96khz is enough for persisting to files. But as another commenter said, 192 is useful, if you bend and stretch samples!
They literally sell actual crystals that you’re supposed to place on top of speakers and amplifiers to make them sound better.
We had a really nice crystal decoration that I happened to put on top of one of my TV speakers and, wouldn't you know it, it had this resonant frequency somewhere around specific human speech frequencies that drove us absolutely bonkers until I figured out the cause and moved it.
(2012)
I wonder how many people think that 24 bit audio encodes 50% “more”
It is 50% more headroom above the noise floor in logarithmic decibels.
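The arithmetic behind that, as a quick sketch. It uses the plain 20·log10(2^N) ratio for N-bit PCM and ignores dither and noise shaping, so treat the figures as ideal-case numbers.

```python
from math import log10

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in decibels."""
    return 20 * log10(2 ** bits)

dr16 = dynamic_range_db(16)
dr24 = dynamic_range_db(24)
print(round(dr16, 1), round(dr24, 1))  # 96.3 144.5
print(round(dr24 / dr16, 2))           # 1.5
```

In linear terms the number of steps grows by a factor of 256, not 1.5; the 50% only appears on the decibel scale.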
I completely accept that human audition has limits that are easy to determine by playing a pure sound. But is it the same with music, where multiple frequencies are played and interfere with each other? Aren't some harmonics or effects created by these "inaudible" frequencies?
To try to imagine something similar: the human eye is unable to see UV light, yet fluorescent paint has a visible quality of its own compared to "normal" pigments.
(2012)
Some previous discussions:
2023 https://news.ycombinator.com/item?id=34698427
2022 https://news.ycombinator.com/item?id=30138561
2019 https://news.ycombinator.com/item?id=19318898
2017 https://news.ycombinator.com/item?id=15127633
2015 https://news.ycombinator.com/item?id=10520639
2014 https://news.ycombinator.com/item?id=8689231
2012 https://news.ycombinator.com/item?id=3668310
Obligatory mention of https://xiph.org/video/ which clears up a lot of misconceptions.
24 bits is now ubiquitous and 32 bit is becoming the norm in recording studios.
32-bit float has become popular in filmmaking/field recording equipment lately because, with a microphone preamp that supports it, you can capture the entire dynamic range of the microphone--there's no accidental clipping if you drive the gain stage too hard.
It's a bit redundant for a skilled technician, they're already used to setting the gain staging, inbound compression, and feathering the mics to avoid this in 24-bit, but if you're handing a boom mic to a novice and have a scene where e.g. someone's whispering and another person's screaming, it can be nice to not have to worry about it.
That use case is literally addressed in the first sentence.
sheeesh, measly 24-bit/192kHz of course makes no sense, unless it is downloaded through low oxygen wire, which somehow and unfathomably must have been omitted or forgotten.
If it has been transmitted via hollow-core fibres it will obviously sound hollow.
For typical listening (though humans can perceive bone-conducted vibrations up to 100 kHz or even 120 kHz) 16-bit-fixed/44.1kHz is a high-fidelity transport format. As a DSP researcher, I prefer 32-bit-float/44.1kHz as a transport format. I often upsample to 32-bit-float/176.4kHz or even 32-bit-float/192kHz for signal processing applications such as high-fidelity reverberation via direct and FFT convolution. While the author advocates for the transport-to-ear use case, I would argue that 24-bit/192kHz provides greater fidelity and resolution for sound processing. I found the pedantic arrogance of the author to be annoying. But yes, the sampling theory is an important consideration -- but so is the quality of the actual digital filters used in the ADC->DAC pipeline. They are much more forgiving and less lossy at 192kHz.
The more the bits the better the music, easy as one two three
Don't forget to buy the new low oxygen platinum plated HDMI cables for the better experience!
/s