A Pragmatic Approach To Jitter In Digital Audio
October 27, 2011
A couple of weeks ago I went to an audiophile trade show (the Rocky Mountain Audio Fest). It was interesting to note that a lot of the exhibitors and participants there were clueless about a great many things. This article is about one of those things: Jitter. After talking with many people there I have come to the conclusion that the overwhelming majority of audiophiles have no idea what they are talking about when it comes to jitter. A Google search for “audiophile” and “jitter” bears that out: no two audiophiles agree on what jitter sounds like! Jitter is difficult to measure, difficult to quantify, and very technical. It should be no great surprise that this is fertile ground for misinformation, misunderstanding, and fraud. This article will try to put a dose of reality and perspective on the issue of jitter.
Jitter: Where it comes from and what matters
Jitter is simply a timing error of a digital clock. Not a clock like what you have on a wall. The real world analogy would be something like a metronome. A clock is a regular beat that is output from an oscillator. Jitter is when that regular beat comes a little early or a little late.
The only time that jitter matters is when an analog to digital or digital to analog converter is used (ADC or DAC). With any audio converter, there are three clocks that go into the converter: MCLK, LRCLK, and SCLK. LRCLK is the left/right clock, also called the sample clock, and is usually at something like 48 or 96 kHz. SCLK is the serial data clock, and is usually at 64 times LRCLK, or in the range of 3 to 7 MHz. MCLK is the master clock, and runs at 128 to 512 times LRCLK, or 11 to 25 MHz.
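As a rough illustration of how those three clocks relate, here is a minimal Python sketch. The function name converter_clocks is made up for this example, and the 256x MCLK ratio is just one assumed value from the 128 to 512 range mentioned above.

```python
def converter_clocks(lrclk_hz, mclk_ratio=256, sclk_ratio=64):
    """Return (LRCLK, SCLK, MCLK) in Hz for a given sample rate.

    mclk_ratio=256 is an assumption; real converters use 128x to 512x.
    """
    return lrclk_hz, sclk_ratio * lrclk_hz, mclk_ratio * lrclk_hz


for rate in (48_000, 96_000):
    lrclk, sclk, mclk = converter_clocks(rate)
    print(f"LRCLK {lrclk / 1e3:.0f} kHz, "
          f"SCLK {sclk / 1e6:.3f} MHz, "
          f"MCLK {mclk / 1e6:.3f} MHz")
```

With these assumed ratios, both sample rates land inside the 3 to 7 MHz and 11 to 25 MHz ranges quoted above.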
MCLK is the only thing that matters when it comes to jitter. Jitter on LRCLK and SCLK has absolutely no bearing on anything you might hear. The reason is simple: LRCLK and SCLK are re-synchronized to MCLK inside the ADC or DAC.
In addition to the clocks, DACs require digital audio data. Like LRCLK and SCLK, the data itself has no influence on jitter. The data is re-synchronized to MCLK inside the DAC. If, for some reason, the audio data is causing something unintended in the audio output then this is caused by a data error, not jitter. Data errors are not subtle. They are not something that only the “golden ears” hear. A data error will cause everybody in the room to take notice and say, “what the hell was that?”
Audiophiles blame lots of things for causing jitter. Here are some things that absolutely do not cause jitter: Wi-Fi, optical drives, hard drives, SDRAM, Flash memory, Error-Correcting Codes, Ethernet, where the audio is stored in the memory, and the GOP. None of these things cause jitter. The GOP might make me antsy, but not jittery.
What does cause jitter then? Well, anything that could affect MCLK. MCLK is generated by either a quartz crystal or a PLL. Any reasonably designed system can keep jitter from PLLs or crystals to below 2 ns (nanoseconds). By “reasonably designed”, I am talking about things like most professional or consumer audio equipment. With standard crystal oscillators it is actually hard to exceed 1 ns of jitter unless the designer was just being stupid. PLLs are more difficult to design, but 2 ns is reasonable and less than 1 ns is common.
Other things that cause jitter are signal integrity issues (which, if the designer did his job right, shouldn’t be an issue at all) and power supplies. Again, the power to a PLL or crystal oscillator should be filtered on the PCB, and any designer worth anything has already done that. For the purpose of this article, we’re going to assume that the designer of the circuit did an average job and filtered the power supply and properly terminated all signals.
How much jitter is too much?
In audiophile circles, this question will cause much debate. But before we get into this too much, let’s go over our terminology. A femtosecond, or fs, is one millionth of a billionth of a second, or 0.000000000000001 of a second. 1000 fs equals one picosecond, or ps. 1000 ps equals one nanosecond, ns. A nanosecond is one billionth of a second, or 0.000000001 of a second. Got that?
Mark Porzilli, designer of The Memory Player, says that “jitter in the femtosecond range is audible”. Exactly what he was saying is a little unclear, but we are going to interpret it as saying “jitter of less than a picosecond is audible”. Technically, anything from 1 to 999 fs is “in the femtosecond range”. Also, rounding up to 1 ps gives his claim the largest chance of being correct. In short, we’re giving him the benefit of the doubt.
For our analysis we will use 1 ps of jitter as our lower bound. For our upper bound we will use 2 ns of jitter. 2 ns because that is what is easily attainable without doing anything special in the design of the circuit. For those following along at home, 2 ns equals 2,000 ps.
What we will now do is look at what 1 ps and 2 ns of jitter “looks like” in the real world. Basically, taking a pragmatic view of it. We’ll do this by looking at jitter in a variety of ways.
Jitter: Error in amplitude
One way to look at jitter is by how it causes errors in audio amplitude. There is a paper from the AES (Audio Engineering Society) that asks the question: “What is the smallest amount of jitter that will cause a measurable error in the audio?” Consider this: You have a sine wave feeding an ADC, and the ADC is spitting out the digital data. How much jitter can you have on MCLK before you can notice something different in the digital data? This is super easy to calculate.
We start by having a “full scale” sine wave. The higher the frequency the easier it will be to detect the jitter. For the moment, we will use 20 kHz. We’re looking for a 1 LSB error, where the Least Significant Bit of the digital audio is “wrong”. The formula is:
max_jitter = (asin(2/(2^n_bits)) * (1/f)) / (2*PI)
I put a lot of parentheses in that formula to make it absolutely clear, if a little unreadable. max_jitter is in seconds, asin is the arc-sine function. n_bits is the number of bits of the ADC. f is the frequency of the sine wave in Hz. When you go through this for a 24 bit converter and a 20 kHz sine wave you get a max_jitter of just a little bit less than 1 ps.
So, based only on the math, it appears that 1 ps of jitter is audible! But is that “real”? For starters, remember that this is a 24-bit converter and that 1 LSB of noise is at about -144 dB. Can you hear that? No. Most systems that have 24 bit audio converters have worse than 20 bit audio performance, or about -120 dB. If we redo the calculations for 20 bit converters then max_jitter = about 15 ps. For a 16 bit converter, max_jitter = 243 ps.
What can we conclude from that? We can conclude that 1 ps of jitter, while technically measurable, is not audible in any practical sense. We can also conclude that max_jitter for a 16 bit converter is still a lot smaller than the 2 ns figure that we’re using for the upper end of our analysis. More analysis is needed!
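For anyone who wants to check the numbers, the max_jitter formula above translates almost directly into Python. This is only a sketch of the article's own formula; the function name and the choice of bit depths to print are mine.

```python
import math


def max_jitter(n_bits, f_hz):
    """Smallest timing error (seconds) that gives a 1 LSB error on a
    full-scale sine wave of frequency f_hz, per the formula above."""
    return math.asin(2 / 2**n_bits) * (1 / f_hz) / (2 * math.pi)


for bits in (24, 20, 16):
    jitter_ps = max_jitter(bits, 20_000) * 1e12  # seconds -> picoseconds
    print(f"{bits} bit converter, 20 kHz sine: {jitter_ps:.2f} ps")
```

The results come out just under 1 ps for 24 bits, about 15 ps for 20 bits, and about 243 ps for 16 bits, matching the figures quoted above.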
Jitter: Variation in Distance
Another way to look at jitter is as a change in distance from the speaker to the listener. If that distance changes, because the listener is moving, then that is effectively the same as jitter. One audio sample takes N seconds to travel from the speaker to your head, and the next audio sample takes M seconds. The distance that corresponds to the difference (N-M) is easy to calculate. It is simply the speed of sound times the jitter.
If we have 1 ps of jitter, times the speed of sound (340.29 meters/second), we get 340 picometers (pm, a trillionth of a meter). Let’s put this into perspective. A single atom of gold has a radius of roughly 135 pm, so it is about 270 pm across. If you move your head by little more than the width of a single gold atom then you are introducing 1 ps of jitter!
Looking at it slightly differently, if your audio sample rate is 96 kHz, and you move 340 pm for each sample then you get a speed of 0.033 mm/second. So, if you move your head back and forth at a speed of 1/30th of a millimeter per second then you are causing 1 ps worth of jitter.
At 2 ns of jitter, that distance is 0.68 um (micro-meters). While 2000 times larger than 340 pm, this is still super small. A red blood cell averages 7 um, so this distance is 1/10th of that. Your ear drum is 30 to 120 um thick. The wavelength of red light is about 0.65 um.
What this means is that if your sound system was absolutely perfect, your own body and its tiny movements would cause the equivalent of 1 ps to 2 ns worth of jitter (and probably a lot more). When you put your finger on your neck you can feel pulsing through your arteries and veins. There are cameras that can actually detect this and determine your pulse rate from a distance. That same force will cause your eardrum to move, ever so slightly, with each beat of your heart. That movement is much larger than 0.68 um, and way more than 340 pm. That’s why, in a quiet room, you can hear your own heart beat. The audio jitter effects caused by your heart are greater than what 2 ns of clock jitter will cause.
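The distance figures in this section are simple multiplications, sketched below in Python. The function names are invented for this example, and the speed of sound is the 340.29 meters/second value used above.

```python
SPEED_OF_SOUND = 340.29  # meters/second, the value used in this article


def jitter_to_distance(jitter_s):
    """Change in speaker-to-listener distance (meters) equivalent to
    a timing error of jitter_s seconds."""
    return jitter_s * SPEED_OF_SOUND


def head_speed(jitter_s, sample_rate_hz):
    """Speed (meters/second) at which the listener must move to shift by
    the equivalent distance once per audio sample."""
    return jitter_to_distance(jitter_s) * sample_rate_hz


print(f"1 ps -> {jitter_to_distance(1e-12) * 1e12:.0f} pm")
print(f"2 ns -> {jitter_to_distance(2e-9) * 1e6:.2f} um")
print(f"1 ps at 96 kHz -> {head_speed(1e-12, 96_000) * 1e3:.3f} mm/second")
```

The same two functions work for any other jitter value or sample rate you care to try.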
Jitter: The winds of change
We now know that 2 ns of jitter is the same as changing the distance by 0.68 um, and that our body will do that naturally. But what else can cause that? Wind, for one. Sound waves can be blown around by the wind, especially the high frequencies. This is commonly seen in outdoor concerts where the high frequencies can actually be blown “away” from the audience. If you’re sitting on the edges of the audience, at the back, you might hear this as the high frequencies disappearing and then reappearing.
We’re listening in our living room, not at an outdoor concert. So what is the minimum air speed that could cause the equivalent of 2 ns of jitter? For this experiment, let’s assume that the listener is 2 meters away from the speaker. Going through the calculations, we can see that it is an air speed of 0.116 mm/second. This is a tiny fraction of the speed of air coming out of your HVAC system. And a tiny fraction of what you cause by breathing. Even just the convection currents in the room, caused by your warm body, will be more than 0.116 mm/second!
Of course, these numbers were for 2 ns of jitter. For 1 ps, divide the numbers by 2000.
I’ll admit I’m somewhat glossing over the details here. To really have an effect the air should be turbulent, with some parts moving much faster than 0.116 mm/second, and other parts moving slower or in the opposite direction. Reality is that this does happen, and 0.116 mm/second is so slow that just about anything will make air move faster. It’s just that the math gets really nasty and I didn’t want to go through that labor. The important point is that the air speed required to “sound like jitter” is not just achievable, you can’t prevent it!
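The calculation behind the 0.116 mm/second figure isn’t spelled out above. One way to arrive at it, assuming a steady breeze blowing straight along the line from speaker to listener, is to compare the travel time d/c in still air with d/(c+v) in moving air. For a very small v the difference is approximately d*v/c^2, which can be solved for v. The Python sketch below uses that approximation; the function name is made up for this example.

```python
SPEED_OF_SOUND = 340.29  # meters/second, the value used in this article


def wind_speed_for_jitter(jitter_s, distance_m):
    """Approximate air speed (meters/second) along the speaker-to-listener
    axis that changes the sound's arrival time by jitter_s seconds.

    Assumes a uniform breeze and v much smaller than the speed of sound,
    so that d/c - d/(c+v) is approximately d*v/c**2.
    """
    return jitter_s * SPEED_OF_SOUND**2 / distance_m


for label, jitter in (("2 ns", 2e-9), ("1 ps", 1e-12)):
    v_mm = wind_speed_for_jitter(jitter, 2.0) * 1e3  # m/s -> mm/s
    print(f"{label} at 2 meters: {v_mm:.3g} mm/second")
```

Under these assumptions the 2 ns case comes out at about 0.116 mm/second, and the 1 ps case is 2000 times smaller.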
Jitter: Taking an excursion
Your speaker doesn’t sit still. If it did, it wouldn’t be called a speaker. It moves. It moves much faster and farther than anything we’ve discussed so far. A typical 6 inch driver at moderate volume levels is moving about 2 mm (peak to peak). Even your tweeter is moving much more than 0.68 um.
Look at it this way: let’s say that your one-way speaker is outputting a 1 kHz sine wave and a 20 kHz sine wave. At 1 kHz, that speaker cone is moving in and out. On top of that is the 20 kHz sine. The movement of the cone at 1 kHz is causing a variation in distance to the listener for the 20 kHz sine.
In a nice listening room, your speaker is probably the biggest contributor of jitter!
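To put a rough number on that, the Python sketch below converts cone excursion into the equivalent timing variation using the same distance-equals-speed-times-time relationship as before. Using the 2 mm peak-to-peak figure for the low-frequency movement is an assumption for illustration only; actual excursion depends on the driver, the volume, and the frequency. The function name is invented for this example.

```python
SPEED_OF_SOUND = 340.29  # meters/second, the value used in this article


def excursion_to_timing(excursion_m):
    """Peak-to-peak timing variation (seconds) seen by a high-frequency
    tone riding on a cone that moves excursion_m meters peak to peak."""
    return excursion_m / SPEED_OF_SOUND


timing = excursion_to_timing(2e-3)  # assumed 2 mm peak-to-peak excursion
print(f"2 mm of cone travel -> {timing * 1e6:.2f} microseconds")
print(f"That is about {timing / 2e-9:.0f} times our 2 ns upper bound")
```

Even an excursion a thousand times smaller than this would still be comparable to the 2 ns upper bound.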
Saying that “jitter in the femtosecond range is audible” is laughable at best. Jitter of 1 ps is barely measurable in the audio data. Jitter of 2 ns is easy to achieve but most equipment is better than that.
Before worrying about 2 ns of jitter, you need to remove the other sources of “jitter” in your listening room. To do this: Stop breathing, stop your heart, cool your body down to room temperature, seal the room from any outside air, turn off that hot Class-A amp, and disconnect your speakers. That’s all!
In short: anybody who claims to be able to hear 2 ns of jitter is either mistaken, a liar, or a zombie. I’ll assume the best by assuming that they are a zombie!
Due to space, I intentionally didn’t talk about some interesting things. For example, I didn’t talk about the spectrum of the jitter. In my honest opinion, the spectrum isn’t very important given the magnitude of the other things that could affect the audio. I also didn’t cover the design of modern delta-sigma audio converters and how they minimize the effects of jitter, in some cases reducing the effects to 1/256th of what was talked about in this article. As this wasn’t a scientific article, some things just had to be left out.