X-curve compensation re-EQ

3ll3d00d · December 29, 2017

2 hours ago, SME said:

As a slight alternative to the approach you took, you may want to try doing a +/- 30 degree (listening window) spatial average of the anechoic measurements and EQing that to be flat instead. It's hard for me to tell by looking at your plots how much the high frequency response shape varies with angle. With my SEOS-15, I see up to 1 dB of variation in the response shapes, up to 10 kHz or so. (The variation is a bit higher above 10 kHz.) Perhaps the SEOS-10 is better behaved in that respect.

I preferred the sound from flattening the listening window average response instead of the on-axis response. I also noticed that Harman appears to calibrate their active speakers for a flat listening window response instead of flat on-axis response.

One crucial detail I neglected to mention or take into account here is that I generally avoid evaluating in-room response without taking averages of measurements in multiple locations.

In the case of the 500-900 Hz response, if you took measurements across a +/- 30 degree listening window and weighted them for relative distance before averaging, you might see more slope there after all. I just took a look at my own crop of measurements and noticed that in areas of changing directivity, both on-axis and MLP measurements stay somewhat flatter than the listening window average measurements.

As for the bass in the 125-250 Hz range, your ears are definitely the final arbiter on any calibration decision. It's possible that the bloat you are hearing is caused by just one bad resonance. Such resonances can be difficult or impossible to see in a measurement at a single location because they are hard to distinguish from peaks and dips caused by interference. However, if you take measurements in multiple locations and spatially average them, problematic resonances are more likely to stand out. If you can suppress the bad resonance, you may be able to boost the output in that frequency range a lot more without causing bloat. If it is a room problem, then the screen wall may not fix the problem but it probably can't hurt.

Thanks for the comments, it's interesting to hear how other people approach this.

IIRC the correction was based on an average of 0-45 degrees (though looking at the data again, I might be misremembering that...). I'm still not entirely happy with the 100-400Hz range yet but haven't had the time (or inclination) to properly revisit it. My current setup is really quite time consuming to work through the various filters and somewhat prone to manual error so I've left it as good enough for a while now. I would like to work out a better strategy for that mid bass area at some point though.

FWIW I posted some more graphs in http://www.avsforum.com/forum/155-diy-speakers-subs/2188265-attempting-3way-seos10.html#post55396722 which show the directivity & power response (with 0 and 15 degree reference angles) of that quasi anechoic data with the correction filter applied. The main thing that stands out to me from this data is that a 1-1.5dB hump develops from 1.5-4.5kHz as you move from 0-30 degrees so perhaps the filter should give a bit more weight to taking that down.

Ricci · December 29, 2017

8 hours ago, SME said:

There is no formal standard for homes or home mastering rooms. However, a very good informal reference is a system using speakers that have a flat anechoic frequency response and consistent off-axis frequency response, and are placed at least a few feet away from walls. Research by Floyd Toole and Harman on speaker preference among blinded music listeners suggests that speakers with this kind of response are preferred over many others tested. The reference is not perfect and does not precisely define what bass response should look like after accounting for the influence of the room. However for mid and high frequencies, where evidence strongly suggests the speaker sound dominates perception, I believe this reference is consistent and reliable.

Cinemas do rely on formal standards, but unfortunately they are not based on measurement of perceptually relevant quantities. The informal reference I describe above is defined with respect to speaker anechoic response or what we might call in-room direct sound response. The X-curve standard on the other hand relies on measurement of continuous pink noise and does not adequately distinguish between sound produced by the speaker and sound reflected back by the room at later moments. As stated before, perception of mid and high frequencies depends primarily on the sound produced by the speaker, but the reflected sound field depends substantially on other variables such as speaker radiation pattern vs. frequency, room size, listener distance, and properties of the room boundaries. This means that the same speaker, calibrated to X-curve standards in different rooms, will be EQed differently in each room to satisfy the standard, which implies that the sound will also differ between those rooms, despite the use of the same speaker and the same "calibration". Furthermore the power-averaged pink noise response of a flat speaker when measured in a typical cinema room will have no resemblance at all to the X-curve target. Forcing the measured response to conform to the X-curve target typically requires attenuating both high and low frequencies, leading to a speaker sound that will tend to sound heavy in the mid-range when used to play typical music content.

As I stated above, a flat speaker measured in a cinema exhibits a power-averaged pink noise response that looks totally different from the X-curve. So not only is the calibration failing to compensate for perceptually relevant differences between rooms (which would be expected to only appreciably affect bass frequencies anyway), but the calibration is causing the sound in cinemas to deviate substantially from that of other audio systems. I would guess that X-curve calibrated cinemas also sound worlds apart from PA/music systems used in similarly large rooms. Compensation for this abnormal response caused by X-curve calibration is almost certainly baked into the mixes, even if not intentionally.

I agree with your general assessment of EQ here, and I agree that it's usually best to limit its use to broad adjustments. Despite knowing better, I made this mistake in the "v1" config I published for "Wonder Woman". The narrow-band positive-gain peaking filter near 2 kHz, intended to sharpen the knee at the transition point of the X-curve target shape definitely caused more harm than good. In hindsight, that filter would only have been beneficial if: the dub-stage used had been calibrated with a similarly sharp knee, and the mixers introduced a negative-gain peaking filter to counter-act the resulting resonance. My guess is that experienced system calibrators purposely avoid sharpening the knee there because of the resonance it causes.

It's interesting to hear that you avoid using EQ on your subs. As you know, I've taken kind of the opposite approach, using aggressive EQ and specialized software to smooth response at multiple measurement locations. I am able to obtain very smooth frequency response at those locations, but I cannot make the response precisely flat at each of those locations. The absolute frequency response varies with location a lot more for mid bass than for deep bass frequencies, and the mid bass is quite a bit hotter closer to the center seat.

All this is well and good, but it led to another question. How should I calibrate my response in a broad sense? Should I calibrate for a flat response at the seat I most often sit in? Should I calibrate so that the response at every seat is either flat or tilted up toward the low end? Or perhaps I should calibrate so that the spatially averaged response across all the seats on the sofa is flat?

I decided to try to answer the question by listening and adjusting a low shelf to adjust the balance of deep bass to mid bass. I purposely sought out passages of music that had significant deep bass content. I have some electronic music with leads whose fundamentals play well down into the 30s (Hz). What I found was quite remarkable. First of all, there was a fairly narrow range of values for which the shelf sounded good. Any more than +/- 0.5 dB or so from ideal, and the bass quality was obviously inferior. Second of all, the perceived quality of the bass was actually fairly consistent from seat-to-seat. It does seem a bit stronger in the center seat, but the balance is mostly the same. This is the part that is most fascinating because the variation in mid bass response between seats is quite a bit more than +/- 0.5 dB. Third, the best bass heard in any seat was with a flat spatially averaged bass response.

To emphasize just how unintuitive these results are, I'll note that the measured response in my center seat has the mid-bass maybe 3 dB higher than the deep bass. Yet, if I use the low shelf to add +1 dB more to the deep bass while sitting in the center seat, the bass falls apart. The deep bass lead turns to mud, and the kick becomes boomy and loses most of its tactile punch.

I wish I could conclude from this experiment that "the correct way to calibrate the sub bass" is to make the spatially averaged response flat, but it's not that simple. The shape of the spatial average depends on which measurements I include in the average. If I included measurements at locations beyond the ends of my sofa, I would end up boosting the mid-bass even more to make the spatial average flat, and this would probably not sound as good. Furthermore, the bass definitely sounds inferior in many areas of the room outside of the seating area. Unlike the sound above 250 Hz or so which is subjectively uniform throughout the room (except in the front corners of the rooms at extreme angles to the horns), the bass definitely sounds different in different parts of the room.
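To make the dependence on the chosen positions concrete, here is a minimal sketch of a spatial average. It assumes power (energy) averaging of dB levels, which is one common convention and not necessarily the one used for the measurements discussed here. The function name and all seat values are made up for illustration.

```python
import math

def power_average_db(levels_db):
    """Average dB levels on a power basis (an assumed convention)."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))

# Hypothetical mid-bass levels (dB relative to deep bass) at three sofa seats,
# with the center seat running hotter than the outer seats.
sofa_seats = [1.0, 3.0, 1.0]
# Hypothetical positions beyond the ends of the sofa, with weaker mid-bass.
beyond_sofa = [-2.0, -2.0]

sofa_avg = power_average_db(sofa_seats)
wide_avg = power_average_db(sofa_seats + beyond_sofa)
print(f"sofa only: {sofa_avg:+.2f} dB, with extra positions: {wide_avg:+.2f} dB")
```

With these made-up numbers the wider average reads lower in the mid-bass, so flattening it would call for more mid-bass boost than flattening the sofa-only average.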

All of this begs for more investigation because it would be very helpful to have a better understanding of bass perception, especially for those situations in which there is not enough time or skill to optimize bass response by ear.

Cinemas often sound very midrange forward and rolled off up top to me which is what would be expected. Some of them also sound very loud and harsh to me mostly in the upper midrange I'd guess as 1-4kHz. Just an observation.

Do we know that the mixes are usually created on large cinema type systems employing an X-curve? Thus baking in boosted HF and LF content.

The X-curve shape vs air absorption of HF might be worth pondering. I admittedly have not done much research on the X-curve in quite some time. Is the P/N calibration usually made at a few spots centered in the theater seating or many averaged positions, or in worst case at only a single position? The back wall seats of a large cinema could be >100ft from the speakers and a "good" center seat in the theater would probably be around 40-50ft for a typical commercial cinema. Those are significant distances which reduce the HF energy. Below 1000Hz there would be little effect, assuming typical indoor atmospheric conditions and a distance of 40-100ft, but at 5kHz and above it becomes a major factor. There could be a loss of 5dB near 15kHz quite easily. I'm assuming the mixing desk in a full size cinema screening room would also be a good 30-50ft back from the speakers. This may take care of the lion's share of the top end response of the X-curve, depending on the room and where it is measured.

I wouldn't say I avoid using EQ on my subs at home, it's just that my response is what I would consider to be pretty good below 100Hz natively, both the main LP measurement and averaged across many seats. Relative to the raw measurements I've seen from most. A combo of luck and a bit of thoughtful planning + research through measurements. It's been a while since I looked at it but I think it was something like +/-6dB from 8-105Hz with no EQ and no smoothing. There aren't any big nulls over that range and only one high Q peak at about 45Hz (center room seating) and a bit of a shelf up near 70-100Hz. I was able to get it to +/-2dB with 4 simple filters. The major issue I tried to address was the 45Hz peak. I had it EQ'd flat for a while, but something always seemed off with kick drums and other sharp bass events centered near 40-50Hz. It took a lot of cut to get it back to flat and that range just seemed to lose any sense of power or clarity at the main seats and was MIA anywhere else in the room. That peak primarily comes from the right 2 subs from what I recall. I spent some time messing with different amounts of cut to that area but it just never sounded right. I think that cutting it so much had other noticeable effects besides reducing the direct sound energy at a relatively small area in the room as measured by the mic capsule.

Are you saying that minor bass shelf adjustments of only 0.5dB wildly affect your perception of the bass sound quality on your system? Your ears must be better than mine my friend. I don't think I would notice unless someone told me it was done and then only very subtly if at all.

SME · December 30, 2017

On 12/29/2017 at 11:22 AM, Ricci said:

Cinemas often sound very midrange forward and rolled off up top to me which is what would be expected. Some of them also sound very loud and harsh to me mostly in the upper midrange I'd guess as 1-4kHz. Just an observation.

Do we know that the mixes are usually created on large cinema type systems employing an X-curve? Thus baking in boosted HF and LF content.

Yes, absolutely. Pretty much everything except low-budget indie films is mixed on an X-curve-calibrated dub-stage. I believe adherence to the X-curve calibration standard is required for industry certification.

On 12/29/2017 at 11:22 AM, Ricci said:

The X-curve shape vs air absorption of HF might be worth pondering. I admittedly have not done much research on the X-curve in quite some time. Is the P/N calibration usually made at a few spots centered in the theater seating or many averaged positions, or in worst case at only a single position? The back wall seats of a large cinema could be >100ft from the speakers and a "good" center seat in the theater would probably be around 40-50ft for a typical commercial cinema. Those are significant distances which reduce the HF energy. Below 1000Hz there would be little effect, assuming typical indoor atmospheric conditions and a distance of 40-100ft, but at 5kHz and above it becomes a major factor. There could be a loss of 5dB near 15kHz quite easily. I'm assuming the mixing desk in a full size cinema screening room would also be a good 30-50ft back from the speakers. This may take care of the lion's share of the top end response of the X-curve, depending on the room and where it is measured.

At a minimum, response is measured at the reference position in the middle and 2/3s back or at the mix position in a dub-stage. I believe measurements may sometimes also be taken at other locations a few feet away, but measurements too far in front or behind are probably avoided, not just because of distance effects but also because the positions may be vertically off-axis of the horns and/or crossovers.

Distance losses definitely do impact sound quite a lot in cinemas. The X-curve shape above 10 kHz is fairly similar to distance roll-off at a distance of around 10 meters, which is probably pretty typical for a mix position distance in a medium size dub stage. Below 10 kHz however, the distance roll-off has nowhere near as much slope as the X-curve does. FWIW, here is a link with some specs for various mix stages at Sony. It's interesting that in addition to dub stages, they can also do mixing in a large-size cinema with seats. I have no idea how their capabilities compare to the rest of the industry.

As for distance roll-off, this is something I've looked into myself for a variety of purposes. The amount and frequency dependence varies with both temperature and relative humidity. It should be safe to ignore the temperature dependence being that the temperature will be maintained within a narrow range within every indoor environment. Humidity dependence has a greater effect, especially when the relative humidity gets very low, but the overall trends are still fairly similar. Here is a plot of distance loss vs. frequency at several different distances at room temperature and 40% relative humidity:

The roll-off occurs predominantly in the top octave. As humidity increases (not shown), the roll-off occurs more rapidly in the top octave and less rapidly in the lower octaves, and vice-versa. Things get a bit weird when relative humidity gets very low, which may affect sound in the winter in cold climates if the HVAC does not have humidification.
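For anyone who wants to reproduce curves like these, here is a minimal sketch of the air absorption part of distance loss. It assumes the pure-tone formulas from ISO 9613-1 at standard atmospheric pressure, written out from that standard rather than taken from anything in this thread, and it leaves out level loss from spreading. The function name and defaults are my own choices.

```python
import math

def air_absorption_db_per_m(freq_hz, temp_c=20.0, rel_humidity_pct=40.0):
    """Pure-tone atmospheric absorption in dB/m, following the ISO 9613-1
    formulas at standard atmospheric pressure (assumed)."""
    t_kelvin = temp_c + 273.15
    t_rel = t_kelvin / 293.15   # temperature relative to the 20 C reference
    t_triple = 273.16           # triple point of water, K

    # Molar concentration of water vapour (%) from relative humidity.
    p_sat_rel = 10.0 ** (-6.8346 * (t_triple / t_kelvin) ** 1.261 + 4.6151)
    h = rel_humidity_pct * p_sat_rel

    # Relaxation frequencies of oxygen and nitrogen (Hz).
    fr_o = 24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h)
    fr_n = (t_rel ** -0.5) * (
        9.0 + 280.0 * h * math.exp(-4.170 * (t_rel ** (-1.0 / 3.0) - 1.0))
    )

    f2 = freq_hz ** 2
    classical = 1.84e-11 * math.sqrt(t_rel)
    oxygen = 0.01275 * math.exp(-2239.1 / t_kelvin) / (fr_o + f2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / t_kelvin) / (fr_n + f2 / fr_n)
    return 8.686 * f2 * (classical + (t_rel ** -2.5) * (oxygen + nitrogen))

# Absorption loss at a few distances, room temperature and 40% RH.
for distance_m in (10, 20, 30):
    losses = [distance_m * air_absorption_db_per_m(f)
              for f in (1000, 5000, 10000, 20000)]
    print(f"{distance_m} m:", ", ".join(f"{loss:.1f} dB" for loss in losses))
```

Changing `rel_humidity_pct` is a quick way to see how the humidity dependence shifts the curve.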

I have developed a high shelf filter using the curves above to simulate the distance loss effect very accurately up into the ultrasonic frequencies. If you are curious and have EQ capability, I highly recommend giving the shelf a try with varying gains to hear the effect. As you can see in the curves above, the loss at 20 kHz is approximately -6 dB for every 10 meters of distance:

High Shelf Filter
  f0 = 13900.0
  Q = 0.4675
  gain = 1.167 * LOSS_AT_20KHZ
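If your EQ wants biquad coefficients rather than f0/Q/gain, the parameters above can be turned into a filter along these lines. This is only a sketch: it assumes the widely used RBJ "Audio EQ Cookbook" high-shelf form with a Q parameter and a 48 kHz sample rate, neither of which is stated above, and a different shelf definition or sample rate will give a somewhat different curve. The helper names are hypothetical.

```python
import cmath
import math

def high_shelf_coeffs(f0_hz, q, gain_db, fs_hz=48000.0):
    """RBJ-cookbook high shelf (assumed form); returns normalized (b, a)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    k = 2.0 * math.sqrt(a_lin) * alpha
    b = [a_lin * ((a_lin + 1) + (a_lin - 1) * cos_w0 + k),
         -2.0 * a_lin * ((a_lin - 1) + (a_lin + 1) * cos_w0),
         a_lin * ((a_lin + 1) + (a_lin - 1) * cos_w0 - k)]
    a = [(a_lin + 1) - (a_lin - 1) * cos_w0 + k,
         2.0 * ((a_lin - 1) - (a_lin + 1) * cos_w0),
         (a_lin + 1) - (a_lin - 1) * cos_w0 - k]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def gain_db_at(b, a, freq_hz, fs_hz=48000.0):
    """Magnitude response of the biquad in dB at one frequency."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs_hz)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))

def distance_shelf(distance_m, loss_db_per_10m=-6.0):
    """Shelf for a given distance, using the ~-6 dB per 10 m figure at 20 kHz."""
    loss_at_20khz = loss_db_per_10m * distance_m / 10.0
    return high_shelf_coeffs(13900.0, 0.4675, 1.167 * loss_at_20khz)

b, a = distance_shelf(10.0)
for f in (1000, 5000, 10000, 20000):
    print(f"{f} Hz: {gain_db_at(b, a, f):+.2f} dB")
```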

I think you'll be surprised by how subtle the change is until the distances get very high. It makes me wonder if our brains actually anticipate distance loss effects, analyze the spectral shape to estimate distance, and compensate our tonal balance perception to undo the effects of the distance shift on the spectrum. This would mean that our brains could use this information to perceive the distance of acoustic sources, both direct sound and reflections that are unfiltered. These possibilities are definitely intriguing and probably deserving of serious psychoacoustic research.

On 12/29/2017 at 11:22 AM, Ricci said:

I wouldn't say I avoid using EQ on my subs at home, it's just that my response is what I would consider to be pretty good below 100Hz natively, both the main LP measurement and averaged across many seats. Relative to the raw measurements I've seen from most. A combo of luck and a bit of thoughtful planning + research through measurements. It's been a while since I looked at it but I think it was something like +/-6dB from 8-105Hz with no Eq and no smoothing. There aren't any big nulls over that range and only one high Q peak at about 45Hz (center room seating) and a bit of a shelf up near 70-100Hz. I was able to get it to +/2dB with 4 simple filters. The major issue I tried to address was the 45Hz peak. I had it EQ'd flatfor a while, but something always seemed off with kick drums and other sharp bass events centered near 40-50Hz. It took a lot of cut to get it back to flat and that range just seemed to lose any sense of power or clarity at the main seats and was MIA anywhere else in the room. That peak primarily comes from the right 2 subs from what I recall. I spent sometime messing with different amounts of cut to that area but it just never sounded right. I think that cutting it so much had other noticeable effects besides reducing the direct sound energy at a relatively small area in the room as measured by the mic capsule.

Are you saying that minor bass shelf adjustments of only 0.5dB wildly affect your perception of the bass sound quality on your system? Your ears must be better than mine my friend. I don't think I would notice unless someone told me it was done and then only very subtly if at all.

Yes I am, and I don't think it really has to do with the quality of my ears. I'm fairly certain that if you were sitting here while I performed the experiment, you would find the difference to be fairly obvious. The reason these changes are so audible is that (1) they impact a broad range of frequencies; and (2) the baseline response is very close to being ideally balanced.

This fact may seem counterintuitive. For a given level deviation (dB), broader, lower Q response features are *more* audible than narrower, higher Q response features. For your modal peak at 45 Hz, changes of +/- 0.5 dB are pretty insignificant, but +/- 0.5 dB changes to 25-50 Hz vs. 50-100 Hz are much more likely to be heard. The reason for this has to do with how the system that is described by the frequency response actually affects the signal that is playing through it. Most musical content is dynamic in nature. It is constantly changing in level and/or frequency vs. time. The more rapidly things change in time, the less frequency specific their action is. This means that very fast events also necessarily involve higher frequencies.

A kick drum is actually an impulsive sound in which most of the initial energy is spread out across the bass range well above the sub range. When you say a kick is "centered at 40-50 Hz", what's really going on is that there is an initial wide-band impulse followed by a slower ringing decay around 40-50 Hz that gives it most of its tone. In fact, the sharper the kick, the wider band the initial impulse must be, and the more important higher frequencies are to accurately rendering that event.

So far as frequency response features are concerned, the broader features affect a lot more of the sound than the narrower features. The more narrow the feature, the more specific and longer duration the signal has to be to substantially activate it. The opposite of an impulse is basically a sine wave, and if you play a sine wave for long enough, right on top of a high Q resonance, it will amplify the signal like crazy and continue to ring for a long time after the signal stops, but other signals won't appreciably activate it like a low Q resonance.

If you don't believe me and you have EQ capability, try an experiment. Enter an EQ peaking filter on the left and right channels: f0 = 500.0, Q = 10.0, gain = +3.0. Try listening with and without the filter and note how much of a difference it makes. The difference probably will be subtle if you hear it at all with most content, though a few sustained notes that focus on that frequency may get your attention. OK. Now try changing the Q to 1.0. The change should be very *obvious*. In fact, you can try dialing down the gain, to see just how little gain it takes for the filter to taint the sound. Even 0.5 dB will probably be audible most of the time, especially if your system response was quite clean to begin with.
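For a rough numerical feel for the same experiment, the sketch below compares how wide a band each filter touches. It assumes the RBJ "Audio EQ Cookbook" peaking form and a 48 kHz sample rate (your EQ may define Q differently), and the 0.5 dB threshold is just the figure used in this discussion. Function names are hypothetical.

```python
import cmath
import math

FS = 48000.0  # assumed sample rate

def peaking_gain_db(freq_hz, f0_hz=500.0, q=10.0, gain_db=3.0):
    """Magnitude in dB of an RBJ-cookbook peaking filter (assumed form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / FS
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / FS)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))

def octaves_above(threshold_db, q, steps_per_octave=96):
    """Width in octaves of the band where the boost exceeds threshold_db."""
    count = 0
    # Scan five octaves either side of 500 Hz on a log-spaced grid.
    for i in range(-5 * steps_per_octave, 5 * steps_per_octave + 1):
        freq = 500.0 * 2.0 ** (i / steps_per_octave)
        if peaking_gain_db(freq, q=q) > threshold_db:
            count += 1
    return count / steps_per_octave

for q in (10.0, 1.0):
    print(f"Q = {q}: boost exceeds 0.5 dB over about "
          f"{octaves_above(0.5, q):.2f} octaves")
```

Both filters peak at the same +3 dB, but the Q = 1.0 version lifts a far wider slice of the spectrum, which fits with it being the more audible of the two.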

The tricky thing about interpreting frequency response measurements is that higher Q features tend to catch the eye a lot more easily than low Q features. A 5 Hz wide 1 dB bump at 40-45 Hz looks like a big deal even though it mostly isn't. But a 1 dB hump covering 30-60 Hz may be barely visible even though it has a far greater effect on the sound. A logical workaround to this problem is to use smoothing to reduce the emphasis of narrow features. This is the right idea, but unfortunately the standard magnitude-smoothing algorithms in REW and other programs have weird side-effects with regard to the time domain. (I can't explain it any better than this without doing some kind of math analysis that I'm not even certain how best to approach.)

A much better choice is to use a time-domain window. The window explicitly limits the analysis to the part of the impulse that is more relevant to rapidly changing signals. That's what you really want. Smoothing of the frequency response data appears as a direct consequence of limiting the measurement in time. A frequency-dependent window (FDW) is an even better choice, because it varies the window length vs. frequency to account for the logarithmic nature of our hearing of frequency / time effects. FDW is mathematically equivalent to a type of smoothing in the frequency domain in which both the magnitude and phase data are smoothed together instead of just the magnitude data.
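Here is a toy illustration of the idea, not a description of how REW or any other program implements it. It assumes a synthetic impulse response (a direct spike plus one reflection 5 ms later), a right-hand half-Hann taper, and a window whose length is a fixed number of cycles at each analysis frequency. All names and values are made up for the sketch.

```python
import cmath
import math

FS = 48000  # assumed sample rate

def make_ir(delay_ms=5.0, reflection_gain=0.5, length=4096):
    """Direct sound at t=0 plus a single delayed reflection (toy data)."""
    ir = [0.0] * length
    ir[0] = 1.0
    ir[int(round(delay_ms * 1e-3 * FS))] = reflection_gain
    return ir

def windowed_mag_db(ir, freq_hz, window_samples):
    """Single-frequency DFT of the IR after a half-Hann taper of given length."""
    acc = 0j
    for n, x in enumerate(ir[:window_samples]):
        w = 0.5 * (1.0 + math.cos(math.pi * n / window_samples))
        acc += x * w * cmath.exp(-2j * math.pi * freq_hz * n / FS)
    return 20.0 * math.log10(abs(acc))

def fdw_mag_db(ir, freq_hz, cycles=6.0):
    """Frequency-dependent window: length is `cycles` periods of freq_hz."""
    window_samples = max(1, int(cycles * FS / freq_hz))
    return windowed_mag_db(ir, freq_hz, min(window_samples, len(ir)))

def raw_mag_db(ir, freq_hz):
    """Unwindowed single-frequency DFT over the whole IR."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, x in enumerate(ir))
    return 20.0 * math.log10(abs(acc))

ir = make_ir()
for f in (100, 200, 4000, 4100):
    print(f"{f} Hz: raw {raw_mag_db(ir, f):+.2f} dB, "
          f"FDW {fdw_mag_db(ir, f):+.2f} dB")
```

At low frequencies the window is long enough to keep the reflection, so the comb-filter ripple survives. At a few kHz the window has closed before the reflection arrives, so the ripple disappears, which is the smoothing effect appearing as a consequence of limiting the measurement in time.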

The FDW as an analytical tool has one major weakness, and that is that it doesn't account for group delay. The shorter the FDW, meaning the coarser the frequency response smoothing, the more group delay causes problems. This is very unfortunate because group delay is very difficult to avoid, even on systems with very flat responses, if crossovers are involved. Because I use FDW as a fundamental tool for my calibration, I'm actually using linear phase crossovers for the express purpose of combating the group delay problem. Even then, it doesn't solve the problem for crossovers to subs and other situations involving multiple subs where response often has at least some excess group delay.

Note that the group delay problem I speak of here has little to do with how things sound. Human listeners are very insensitive to group delay, as long as the amount is not excessive and there isn't any pre-ringing involved. The problem is almost entirely to do with the analytical methods rather than perception. My long term plan is to eventually develop a phase-adaptive FDW algorithm, possibly something that would work in conjunction with an in-room speaker optimization algorithm. Such functionality would probably also be more generally useful for analyzing audio impulse response data for perceptually relevant information.

Sorry to regurgitate all this technical info at you and then leave you without a straightforward solution to follow. Unfortunately as I stated before, I'm tweaking my sub response by ear these days because I don't currently have a better way to do it. More than once, I've gone through the process of smoothing out the sub response at multiple seats, which is great for generating beautiful frequency response plots that I can upload and brag about on forums, but when I've listened to the result, it sounds like "meh". So then I just start messing around with the broad response shape while listening to music until it sounds better. With practice, I've gotten better at hearing what needs to change.

It's easy to overestimate the importance of the sub. Much of the "sub" sound and sensation lives at 100 Hz and above. Response all the way up to 200 Hz or so matters quite a lot for bass, and any response problems in the mids or above can also negatively interfere with bass perception. I found this to be much more true after I upgraded my subs. I suspect a lot of the characteristic sound of a subwoofer comes from distortion (or lack thereof) that reaches beyond the sub frequencies.

At 160-250 Hz is where a lot of punch sound and sensation lives. In my recent work, I've come to think of that region as a kind of "hinge point" between the bass and mid range. If that range is deficient, the kick has no punch and the bass sounds weak and lacks in the tactile dimension. Adding more sub often just makes it more muddy and bloated. OTOH, too high response there causes substantial masking of higher frequency content, i.e. muddiness. The level for ideal response is pretty narrow, as with the rest of the response... something like +/- 0.5 dB, again. YMMV.

At the end of the road is excellent sounding bass and punch. I love music BDs with good kick. It kinda knocks the wind outta ya.

Kvalsvoll · January 12, 2018

Continuing from my last post, some general thoughts.

The problem with EQ on playback is that you can only EQ everything in a channel at once. This means that EQ on the center to fix dialogue issues, such as cutting the HF, will also affect other sound effects, and that may not be desirable.

Some movies sound better. But it is the odd one with the strange sound that we notice. And it gets worse with louder playback level. It is quite clear that many movies are not suitable for 0dB master for a pleasant and natural sound - especially dialogue gets way too loud.

So, why not just turn it down?

Turning the master down destroys dynamics and impact. At -10dB you have lost 10dB of dynamics, the experience of low frequency sound effects is compromised, and the tactile experience across the whole frequency range is lost.

Tonal balance on dialogue is one thing. I believe distortion and noise caused by pushing the dialogue level too loud is even worse, and this is impossible to fix. You can hear this on many movies - voices are too loud, they sound hard and harsh, you can easily hear the noise when the dialogue is gated.

On a decent system dialogue is easily heard and intelligible at -30dB master, on any movie. On most movies the dialogue gets louder than natural at levels beyond -10dB. If the full dynamic range was utilized, the sound would be much more pleasant and at the same time would have much more impact and realism. When the overall level is reduced, the contrast will be larger, so that transients will be perceived as more powerful, and it is not necessary to clip everything. It would sound much better.

SME · January 13, 2018

4 hours ago, Kvalsvoll said:

Continuing from my last post, some general thoughts.
The problem with EQ on playback is that you can only EQ everything in a channel at once. This means that EQ on the center to fix dialogue issues, such as cutting the HF, will also affect other sound effects, and that may not be desirable.

Or maybe it is desirable. How do you think dialog gets like that in the first place? Do you think the mixers just suck at mixing and EQing dialog? No, of course not. Thousands of hours go into developing the soundtrack of a feature film. The majority of films are mixed very well, but home viewers are very rarely able to experience the fruits of that labor.

The problem is the system response that's being mixed to. This affects all aspects of the soundtrack: the dialog, the fx, the score and music, and the ambiance. All of this is being made to sound great on one or more dub-stages whose response deviates far from neutral and is completely different from that of non-cinema environments. The reason I talk about dialog is because that's where the flaws are most obvious. Conversation is what our ears are most attuned to hearing, and so it is what we are most likely to detect flaws in. That's also why I mainly rely on dialog listening to develop the re-EQ.

4 hours ago, Kvalsvoll said:

Some movies sound better. But it is the odd one with the strange sound that we notice. And it gets worse with louder playback level. It is quite clear that many movies are not suitable for 0dB master for a pleasant and natural sound - especially dialogue gets way too loud.

Playback at 0 dB is likely to be loud in any small room environment because of the quirks of the calibration method. If we calibrated using some kind of measure of direct sound or transient SPL, we'd probably use nominal levels in small rooms that are more similar to those in large cinemas. A more typical small room playback level is -5 to -6, which is right around where I land after performing any needed re-EQ.

4 hours ago, Kvalsvoll said:

Turning the master down destroys dynamics and impact. At -10dB you have lost 10dB of dynamics, the experience of low frequency sound effects is compromised, and the tactile experience across the whole frequency range is lost.

Tonal balance on dialogue is one thing. I believe distortion and noise caused by pushing the dialogue level too loud is even worse, and this is impossible to fix. You can hear this on many movies - voices are too loud, they sound hard and harsh, you can easily hear the noise when the dialogue is gated.

On a decent system dialogue is easily heard and intelligible at -30dB master, on any movie. On most movies the dialogue gets louder than natural at levels beyond -10dB. If the full dynamic range was utilized, the sound would be much more pleasant and at the same time would have much more impact and realism. When the overall level is reduced, the contrast will be larger, so that transients will be perceived as more powerful, and it is not necessary to clip everything. It would sound much better.

For cinema tracks with more serious EQ balance problems, I also land closer to -10 dB, but it really varies with the movie. After re-EQ, which I primarily apply below 500 Hz and above 2 kHz, I'm usually playing right around -6 dB (relative to calibration using 500-2kHz pink noise). With high quality re-EQed home mixes (i.e., most recent stuff done by Disney or done at Skywalker Sound Studios, presumably in their dedicated room), I sometimes go a dB or two higher than that because they sound so clean and my system sounds so good.

Realize that your complaints about the dialog being harsh and distorted are most likely also due to tonal balance issues. The mixers were not hearing that stuff. If they did, they would fix it or re-record the tracks. I'll admit that noise does sometimes enter into dialog recording, but it's usually very minor. The real problem is that the tonal balance reproduction at home is not consistent with what the mixers heard. Even where the dialog track is clipped or distorted, the sound is far more agreeable when the tonal balance is not skewed hot at 2 kHz and above.

Edited January 13, 2018 by SME
clarify the relevance of dialog sound vs. rest of the track

SME · January 13, 2018

While I'm here, I want to update to note that I still intend to work on and publish these. I'm just busy with other things at the moment. I have been making additional improvements to my own system's response, aiming for a true reference sound. (It never ends.) These changes are relatively minor in the big swing of things and aren't likely to substantially alter the corrections I develop, but I do prefer to do this work with as close to a good reference as I possibly can. One of the issues is that I am finding what I'd describe as "hidden resonances" ("hidden" because they don't appear in the FR, even when spatially averaged), many of which are in the 1-3 kHz range. I know that range is critical and needs some of the most careful attention for getting a re-EQ "right".

With that said, I have been applying re-EQ to a variety of other movies as even a crude re-EQ makes a big improvement on net. I find it remarkable that around -2 dB/octave on the high frequencies seems to work very well for a wide variety of movies. It makes me wonder if this reveals something about how soundtracks are made.
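As a rough illustration of what a -2 dB/octave high-frequency tilt means in numbers, here is a minimal sketch. It assumes an idealized hard knee at 2 kHz and no change below it; the function name and the hard knee are illustrative assumptions, not the actual filters used for any published correction (a real re-EQ would use shelf filters with a softer knee).

```python
import math

def hf_reeq_gain_db(freq_hz, knee_hz=2000.0, slope_db_per_octave=-2.0):
    """Gain in dB of an idealized hard-knee high-frequency tilt.

    Assumption: flat at and below the knee, constant dB/octave slope above it.
    """
    if freq_hz <= knee_hz:
        return 0.0
    octaves_above_knee = math.log2(freq_hz / knee_hz)
    return slope_db_per_octave * octaves_above_knee

# Print the target attenuation at a few octave-spaced frequencies.
for f in (1000, 2000, 4000, 8000, 16000):
    print(f"{f:>6} Hz: {hf_reeq_gain_db(f):+.1f} dB")
```

With these assumptions the tilt is 0 dB up to 2 kHz, then -2 dB at 4 kHz, -4 dB at 8 kHz, and -6 dB at 16 kHz.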

I must confess that I am mostly ignorant as to the process that soundtracks undergo, but here is my hypothesis. If someone has more insight than I do on this, please feel free to correct me: Much of the preliminary work on movie sound effects is done in smaller "dub stages", and this is where most of the actual EQ happens. Presumably the different elements are laid down into stems or tracks that consist of sounds of a common type, i.e. dialog, score and music, ambiance, and fx. The key feature of these dub stages is that they use X-curve calibration, but because they are relatively small rooms, they rely on less slope than large rooms, say -2 dB/octave. (This is called out in the standard based on room volume.) After all this work, the final re-recording mix is done in a much larger dub-stage, one that approximates a full-size cinema instead of a mini-cinema. Even though it uses -3 dB/octave in the highs and probably has even more bass attenuation, the tracks still sound decent in here without additional corrections.

Perhaps what @FilmMixer was explaining via @Infrasonic is that no re-EQ is done in this final process. While I reckon it could be beneficial to EQ the stems a bit to help them meld together better when they are laid down into the full track, perhaps this is not done after all, and most of the balancing is done using level adjustments. With music mixing, EQ is very often applied to the individual tracks to help them play along better in the mix, but this may be less necessary in a cinema track in which a lot more dynamic range is retained in the final product. Again, this is just a hypothesis. There are other possible reasons why -2 dB/octave may work so well.

One other point of note. On my first publication of "Wonder Woman", I included a peaking filter near 2 kHz to sharpen the X-curve knee. I'm finding that not only does this filter not help but that I find a need to soften the knee further than the shelf filters do on their own. My guess is that calibrators purposely avoid creating a sharp knee at the transition because it causes strong resonance there with un-EQed content. Unfortunately, the standard says nothing about how sharp or soft the knee should be, and it is in and around 2 kHz where some of the most variation between X-curve calibrated systems can be anticipated. Likewise, while -2 dB/octave HF slope seems to work almost universally, the amount of attenuation at 2 kHz itself seems to vary a lot by soundtrack. This is unfortunate because 2 kHz is a critical frequency to get right. Too much leads to a very rough, hard, loud sound. Too little 2 kHz can make the sound overly soft, "polite", and wimpy. The ideal response here is within a very narrow range relative to the surroundings.

I'll be back here in time.

Kvalsvoll · January 13, 2018

3 hours ago, SME said:

Or maybe it is desirable. How do you think dialog gets like that in the first place? Do you think the mixers just suck at mixing and EQing dialog? No, of course not. Thousands of hours go into developing the soundtrack of a feature film. The majority of films are mixed very well, but home viewers are very rarely able to experience the fruits of their labor.

The problem is the system response that's being mixed to. This affects all aspects of the soundtrack: the dialog, the fx, the score and music, and the ambiance. All of this is being made to sound great on one or more dub-stages whose response deviates far from neutral and is completely different from that of non-cinema environments. The reason I talk about dialog is because that's where the flaws are most obvious. Conversation is what our ears are most attuned to hearing, and so it is what we are most likely to detect flaws in. That's also why I mainly rely on dialog listening to develop the re-EQ.

Playback at 0 dB is likely to be loud in any small room environment because of the quirks of the calibration method. If we calibrated using some kind of measure of direct sound or transient SPL, we'd probably use more similar nominal levels in small rooms as in large cinemas. A more typical small room playback level is -5 to -6 dB, which is right around where I land after performing any needed re-EQ.

For cinema tracks with more serious EQ balance problems, I also land closer to -10 dB, but it really varies with the movie. After re-EQ, which I primarily apply below 500 Hz and above 2 kHz, I'm usually playing right around -6 dB (relative to calibration using 500-2kHz pink noise). With high quality re-EQed home mixes (i.e., most recent stuff done by Disney or done at Skywalker Sound Studios, presumably in their dedicated room), I sometimes go a dB or two higher than that because they sound so clean and my system sounds so good.

Realize that your complaints about the dialog being harsh and distorted are most likely also due to tonal balance issues. The mixers were not hearing that stuff. If they did, they would fix it or re-record the tracks. I'll admit that noise does sometimes enter into dialog recording, but it's usually very minor. The real problem is that the tonal balance reproduction at home is not consistent with what the mixers heard. Even where the dialog track is clipped or distorted, the sound is far more agreeable when the tonal balance is not skewed hot at 2 kHz and above.

The response of the x-curve calibrated system may actually be a lot closer to neutral than the simple frequency response measurement indicates. The tonal balance depends on the direct sound and the decay, so both loudspeaker radiation pattern and room acoustics matter. Since the x-curve was made by comparing a typical cinema speaker with hf horn to a much closer typical "hifi"-speaker, the x-curve correction is supposed to fix exactly that. Though several later studies have shown the flaws of the x-curve calibration, so something obviously got lost somewhere in the process.

The small-room-is-louder is another myth originated from making wrong assumptions on why the small room often sounds louder. Room size is not a property of loudness. If the decay is similar, the loudness will be the same. It is all about acoustics and speaker radiation pattern.

Distortion is not a tonal issue, but radiation pattern and decay can make distortion more audible. I have at least two movies where I have different sound tracks available, where one sounds bad and the other is much better. Try to compare the first Gravity 5.1 release to the later atmos - the atmos sounds much better.

And the mixers may very well be aware of issues with the sound, but for a number of reasons choose to not do anything about it. There was one movie with very bad dialogue, the noise gating was very obvious and caused the dialogue to sound distorted. I showed it to a professional sound engineer and asked what his thoughts were and whether it could be fixed. Yes, he could have fixed that, he could remove all the audible noise, to make the dialogue sound clean and nice. BUT: It would cost time and money. His suggestion was that there simply was not time available to fix it, since the plug-ins and method required to do it were no secret or unknown mystery to any sound engineer.

For most people - even the sound enthusiasts - watching a movie is like climbing a mountain to ski it ONCE. Perfect conditions would be nice, but since you ski it only once, you take what is there, and make the best of it. If it isn't that good, you don't climb it once more to see if it got better. And you certainly can live with parts of the run being in bad condition, if other parts are excellent. We watch the movie ONCE, and if the sound was excellent, it is a plus, but if there was one scene where the dialogue sounded distorted and noisy, it doesn't destroy the film.

mojave · January 15, 2018

AES paper 1886 is an "Evaluation of the SMPTE X-curve Based on a Survey of Re-recording Mixers."

The 24 mixers who work on a SMPTE calibrated dub stage, were asked if they ever compensate for the Xcurve in their mixes and if so, how. Five of the mixers stated that they did not use any compensation. Of the 19 mixers who do use some type of compensation, this usually takes the form of high frequency boost most often for music and dialog. Three mixers specifically stated the need to boost starting around 5k hertz. Only 1 mixer stated the need to use low frequency compensation, primarily below 150 hertz to aid the music track from sounding too thin.

Here is an article on the sound design goals of Wonder Woman:

http://postperspective.com/sound-wonder-womans-superpower/

Jenkins was fascinated by the idea of sound playing a physical role as well as a narrative one, and that direction informed all of Mather’s sound editorial choices for Wonder Woman. “I was amazed by Patty’s intent, from the very beginning, to veer away from very high-end sounds. She did not want to have those featured heavily in the film. She didn’t want too much top-end sonically,” says Mather, who handled sound editorial at his Soundbyte Studios in West London.

Regarding the sheep and goats, Mather says, “We pitched them and manipulated them slightly so that they didn’t sound quite so ordinary, like a natural history film. It was very much a case of keeping the soundtrack relatively sparse. We did not use crickets or cicadas, although there were lots there while they were filming, because we wanted to stay away from the high-frequency sounds.”

mojave · January 15, 2018

Here is another article about Wonder Woman:

Sound Wonder Woman

Quote

Just because Wonder Woman isn’t in-your-face, like Dawn of Justice, doesn’t mean it lacks power. In fact, director Jenkins wanted a soundtrack with abundant low-frequency energy that would shake the room during action sequences. She called on supervising sound editor James Mather and composer Rupert Gregson-Williams to deliver material that could move the theater. “She didn’t want to hear high-end, harsh, shrill sounds, like metal clashes. There are no guns that sound too loud. The crowds are less punchy, with their screams and so on. And, the entire soundtrack is music-led,” notes Burdon. . . .

According to Burdon, when listening to music cues during the big action sequences, like the beach battle sequence in reel two, Jenkins consistently wanted to push the low-end of the music a great deal, to have a driving, rhythmic feeling with lots of energy happening in the low-frequency range. “Having the music lead, in some ways, simplified some of the action sequences. It wasn’t a typical big movie where you have directors, producers, creatives and the editor all pushing for music and multilayered effects and the articulation of everything. This was slightly easier because Patty’s starting point was that she didn’t want huge effects. So for the beach battle, it’s music-led. There was quite a lot of clarity for me, from music to crowd dialogue, in that battle sequence. Then we just embellished that with effects for the final,” Burdon explains. “One of the last things we did for the beach battle — after we watched it through, was to add more energy to the low-end. Patty felt like it needed more low-end.”

SME · January 15, 2018

On 1/12/2018 at 10:25 PM, Kvalsvoll said:

The response of the x-curve calibrated system may actually be a lot closer to neutral than the simple frequency response measurement indicates. The tonal balance depends on the direct sound and the decay, so both loudspeaker radiation pattern and room acoustics matter. Since the x-curve was made by comparing a typical cinema speaker with hf horn to a much closer typical "hifi"-speaker, the x-curve correction is supposed to fix exactly that. Though several later studies have shown the flaws of the x-curve calibration, so something obviously got lost somewhere in the process.

Here is a recent paper by Floyd Toole, inspired in part by very recent impulse response measurements of cinemas. It suggests target curves for neutral response in cinemas, which not coincidentally don't look too much different from the curves recommended for home systems:

http://www.aes.org/e-lib/browse.cfm?elib=17839

The main differences in the recommended cinema vs. home curves have to do with speaker directivity differences. The idea is that when direct sound is flat, the in-room FR curve shape should qualitatively track the speaker directivity curve, at least until one reaches the bass frequencies where boundary gain effects come into play.

I do have some concern regarding the analytical methods applied to those measurements, however. Recently, I discovered that magnitude-smoothing has unexpected quantitative side-effects. When viewing frequency response data derived from impulse response measurements, applying smoothing and/or windowing is necessary because the raw FR data looks like a scribbled mess without it. However, not all smoothing methods are equal. Most people apply magnitude-smoothing to the data without giving it a second thought, but doing so effectively omits some of the impulse response energy, particularly in the decaying tail. If power response smoothing (PSR) is used instead of magnitude response smoothing (MSR), then the energy in the decaying tail is included. Note that X-curve calibration is done using pink-noise signals and RTA measurements, which is comparable to PSR rather than MSR analysis.

I have observed this discrepancy in my own room. Using 1/48th octave smoothing, which most people would consider to be innocuous (and many even report as "un-smoothed"), I see a +1 dB difference between MSR and PSR in and around, e.g. 8 kHz. That discrepancy involves 20% of the overall IR energy in my case. This is despite the fact that: (1) my horns have a 90x60 degree pattern there; (2) my room RT60 is about 225 ms there; much of the first arrival of sound is absorbed by the sofa and a wide rear wall absorber; and (3) I sit close relative to my room volume. I have a hunch that the discrepancy will be quite a bit larger in most cinemas.
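The link between a 1 dB level difference and roughly 20% of the energy can be checked with a few lines of arithmetic. The sketch below also shows why the two smoothing styles disagree on ragged data, under the assumption that magnitude smoothing averages linear magnitudes across a band while power smoothing averages squared magnitudes; the bin values are made up for illustration and the function names are hypothetical.

```python
import math

def energy_fraction_missing(db_difference):
    """Fraction of the power-smoothed energy that is absent from a level
    that reads db_difference dB lower."""
    return 1.0 - 10.0 ** (-db_difference / 10.0)

def magnitude_average_db(magnitudes):
    """Band level from averaging linear magnitudes (assumed MSR-style)."""
    mean_magnitude = sum(magnitudes) / len(magnitudes)
    return 20.0 * math.log10(mean_magnitude)

def power_average_db(magnitudes):
    """Band level from averaging squared magnitudes (assumed PSR-style)."""
    mean_power = sum(m * m for m in magnitudes) / len(magnitudes)
    return 10.0 * math.log10(mean_power)

# A 1 dB gap corresponds to about 20.6% of the total energy.
print(round(energy_fraction_missing(1.0), 3))

# Made-up, ragged bin magnitudes within one smoothing band.
ragged_bins = [1.0, 0.2, 1.4, 0.5, 0.9]
gap_db = power_average_db(ragged_bins) - magnitude_average_db(ragged_bins)
print(f"power average exceeds magnitude average by {gap_db:.2f} dB")
```

The power average can never come out lower than the magnitude average, and the gap grows as the data within the band gets more ragged, which is what a long decaying tail tends to produce.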

With that said, I don't think this analytical error will have much impact on what qualitative target curve shape will sound neutral. In many (probably most) cinemas, RT60 tends to be fairly similar from 500 Hz all the way up to 8 kHz, as evidenced in the recent SMPTE B-chain measurements study. The main impact of this error would be regarding how steep the MSR or PSR slope(s) should be where speaker directivity is actually changing, but the in-room curve should probably be flat where speaker directivity is not changing. Either way, there is absolutely zero scientific justification for the X-curve shape and calibration method and clear evidence that X-curve calibration substantially alters the direct sound response in cinemas compared to other types of venues.

On 1/12/2018 at 10:25 PM, Kvalsvoll said:

The small-room-is-louder is another myth originated from making wrong assumptions on why the small room often sounds louder. Room size is not a property of loudness. If the decay is similar, the loudness will be the same. It is all about acoustics and speaker radiation pattern.

I agree that "small-room-is-louder" is *mostly* a myth, in the sense that psychology or listener expectations play into it. However, empirically there are substantial differences in loudness between cinemas and home theaters, even when the latter are acoustically well treated. This is also not at all the same thing as the tendency of "typical home viewers" to choose a lower volume to avoid disturbing neighbors, overloading their equipment, or simply because they desire a more casual viewing experience. The loudness gap between cinemas and small room monitoring systems is commonly regarded to be around 6 dB. This appears to be true even after compensating for tonal balance inconsistency, at least if one is using a test signal band-limited to the 500-2kHz mid-range.

So what could account for these differences? It's complicated, as evidenced by my extended elaboration below.

I would start by assuming that the loudest sounds in movies are usually brief and transient rather than continuous. Even though dialog is a continuous sound, it contains substantial SPL peaks (easily +20 dB over average) unless it is heavily compressed as it often is for Internet or TV broadcast. Second, I would note that SPL calibration methods rely on SPL measurement of a continuous pink noise signal which includes *all reflected sound energy*. The difference in SPL between the dialog peaks and the continuous pink noise signal could vary wildly between rooms, even though it is likely the SPL of the former that is most important for loudness perception.

It'd be nice to be able to talk about this in terms of specific quantities. It would certainly provide more confidence in this explanation. The tricky part is that we have to take into account that: (1) The reflected sound energy arrives at different times relative to the direct sound energy, and the time-of-arrival of reflected sound energy has substantial impact on how much different signals (dialog, pink noise, or other) are amplified. (2) Loudness perception does not relate directly to SPL but also relates to the duration of sound near that SPL.

With that said, we can note a few important trends: (1) The earlier the arrival of reflected sound energy, the more it will augment the SPL of shorter, transient sounds. (2) The earlier the arrival of reflected sound energy, the more likely the augmented SPL will be perceived as increased loudness. (3) Pink noise SPL measurement is completely time blind and emphasizes direct sound, early arriving sound, and late arriving sound energy equally.

We can summarize these trends further by noting that when calibrating using pink noise, late arriving sound will contribute much less to loudness than early arriving sound, even though both contribute to pink noise SPL equally. Furthermore under some circumstances, early arriving sound may actually boost loudness relative to its contribution to pink noise SPL by extending the duration that a high SPL transient is presented to the ear. (I'm not certain that this is an issue most of the time, but it might be.) The tricky part is figuring out precisely how to weight reflected sound energy loudness vs. time when attempting to accurately calculate the loudness of a calibration. I believe this weighting will be somewhat content dependent, and can only be resolved using subjective listening tests with real-world content. Furthermore, the tests would only be reliable if tonal balance differences were adequately corrected for as well.
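To get a feel for how much reverberant energy counts toward pink noise SPL but arrives "late", here is a minimal sketch assuming an ideal exponential decay. The 50 ms early/late split and the 0.6 s RT60 are purely illustrative assumptions (the proper weighting is the open question described above); the 0.225 s figure is the small-room RT60 quoted earlier.

```python
def late_energy_fraction(rt60_s, split_s):
    """Fraction of an ideal exponentially decaying tail's energy that
    arrives after split_s seconds.

    Energy falls by 60 dB over RT60, so the energy remaining after time t
    is proportional to 10 ** (-6 * t / RT60).
    """
    return 10.0 ** (-6.0 * split_s / rt60_s)

SPLIT_S = 0.050  # hypothetical early/late boundary, for illustration only

for label, rt60 in (("small room, RT60 0.225 s", 0.225),
                    ("hypothetical large room, RT60 0.6 s", 0.6)):
    fraction = late_energy_fraction(rt60, SPLIT_S)
    print(f"{label}: {100.0 * fraction:.1f}% of tail energy after 50 ms")
```

This only describes the diffuse tail, not the direct sound or discrete reflections, but it shows how a longer decay shifts more of the measured pink noise energy into the late portion.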

===

Out of curiosity, I decided to do a kind of scaling or dimensional analysis of cinemas of different room sizes. (I ignore home theaters for what follows.) The results were remarkably insightful. First off, I chose a reference length scale, the room depth, and assumed a box-shaped room. The reference location was assumed to be 2/3s back. At the reference location, the screen width spans about +/- 22.5 degrees. (This is consistent with the design of most dub-stages I could find specs for.) The screen height is set by the screen aspect ratio. I presume a 2.31:1 cinema-scope ratio and that the width and height of the room are equal to the screen dimensions. This is not precisely true of course, but it is still a useful assumption to make when analyzing trends. With these assumptions in place, the other room dimensions essentially scale proportionally to the depth.

For the first analysis, I ignore discrete reflections completely and assume an exponentially decaying, diffuse reflected sound field. I also assumed a uniform absorption coefficient for the boundaries between rooms of different sizes. I'll spare everyone the math unless someone requests it. What I found was that under these assumptions, the level of the reverberant sound field relative to the direct sound is actually completely insensitive to room size. As the room size grows, reverb time increases, and as the reference distance increases, the speaker sound power output increases (by the distance squared) in order to maintain the same SPL at distance. However, the room volume also grows, providing more space for the sound to fill. These factors all cancel out.
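For anyone who wants to check the cancellation numerically, here is a minimal sketch. It is not the math behind the post, which was not shown; it assumes the classical diffuse-field (Hopkins-Stryker) relation for the reverberant-to-direct ratio and Sabine's formula for RT60, with the room geometry described above. The absorption coefficient and directivity factor Q are arbitrary illustrative values.

```python
import math

SCREEN_HALF_ANGLE_DEG = 22.5  # screen spans +/- 22.5 degrees at the seat
ASPECT_RATIO = 2.31           # screen (and room) width / height
SEAT_FRACTION = 2.0 / 3.0     # reference seat is 2/3 of the way back

def room(depth_m):
    """Return (listener distance, volume, surface area) of the box room."""
    distance = SEAT_FRACTION * depth_m
    width = 2.0 * distance * math.tan(math.radians(SCREEN_HALF_ANGLE_DEG))
    height = width / ASPECT_RATIO
    volume = depth_m * width * height
    surface = 2.0 * (depth_m * width + depth_m * height + width * height)
    return distance, volume, surface

def reverb_to_direct_db(depth_m, absorption=0.3, directivity_q=10.0):
    """Reverberant level relative to direct sound (diffuse-field assumption)."""
    distance, _, surface = room(depth_m)
    room_constant = surface * absorption / (1.0 - absorption)
    ratio = 16.0 * math.pi * distance ** 2 / (directivity_q * room_constant)
    return 10.0 * math.log10(ratio)

def sabine_rt60(depth_m, absorption=0.3):
    """Sabine reverberation time in seconds (metric units)."""
    _, volume, surface = room(depth_m)
    return 0.161 * volume / (surface * absorption)

for depth in (10.0, 20.0, 40.0):
    print(f"depth {depth:4.0f} m: reverb/direct {reverb_to_direct_db(depth):+.2f} dB, "
          f"RT60 {sabine_rt60(depth):.2f} s")
```

Under these assumptions the reverberant-to-direct figure is identical for every depth while RT60 grows in proportion to it, because both the surface area and the squared listener distance scale with the depth squared.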

Among parameters in this analysis that do matter with regard to relative reverberant field level are: speaker directivity and boundary absorption coefficient. Higher speaker directivity and higher boundary absorption coefficient will diminish the reverberant field level and lead to louder sound at the same calibrated SPL. (Reminder: this is ignoring the contributions of discrete reflections, which could alter the situation.)

Let's continue to ignore discrete reflections and ask what happens if we violate some of the other assumptions. First of all, the front wall in a real cinema will always have some "dead space" around the screen, and it's reasonable to assume that the percentage of dead space will be greater in smaller cinemas. This means that smaller cinemas will have somewhat larger volume and slightly higher reverb time than the above analysis suggests. The overall consequence will be a lower reverberant field level and louder sound at the same calibrated SPL. Another ignored factor is room shape. A room with stadium seating has a substantial floor slope, which will reduce room volume (and reverb time) relative to listener distance, leading to softer sound at the same calibrated SPL. Last but not least, some cinemas now use screens with IMAX or 16:9 aspect ratios. These will increase the room volume (and reverb time) relative to listener distance, leading to lower pink noise level and greater loudness.

Now let's talk about discrete reflections in cinemas. The SMPTE study linked above reveals that a substantial amount of reflected sound energy arrives in the form of discrete reflections. In terms of scaling, I determined that the relative level of the discrete reflections is insensitive to room size. That's because the relative level of reflections depends on the total distance traveled by the reflected sound divided by the distance traveled by the direct sound. These will both tend to scale proportionally with the reference length. However, the time of arrival of discrete reflections does increase with the reference length. As noted above, the later the arrival of reflected sound energy, the less it's likely to contribute to perceived loudness, even though it will make an equal contribution to pink noise SPL.
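The same scaling argument can be sketched for a single first-order side-wall reflection using an image source. This assumes the box-shaped room described earlier, a center speaker at the middle of the screen, a listener on the center line, a perfectly reflecting wall, and an omnidirectional source; those simplifications and the function name are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0

def side_wall_reflection(depth_m):
    """Return (level relative to direct sound in dB, extra delay in ms)
    for the first side-wall reflection of a center speaker."""
    distance = (2.0 / 3.0) * depth_m                      # seat is 2/3 back
    width = 2.0 * distance * math.tan(math.radians(22.5))  # room = screen width
    # Mirroring the source in a side wall places the image one full room
    # width to the side of the real source.
    reflected_path = math.hypot(width, distance)
    level_db = 20.0 * math.log10(distance / reflected_path)
    delay_ms = 1000.0 * (reflected_path - distance) / SPEED_OF_SOUND_M_S
    return level_db, delay_ms

for depth in (10.0, 20.0, 30.0):
    level_db, delay_ms = side_wall_reflection(depth)
    print(f"depth {depth:4.0f} m: reflection {level_db:+.2f} dB re direct, "
          f"arrives {delay_ms:.1f} ms later")
```

The relative level comes out the same for every room depth because both path lengths scale together, while the delay grows in direct proportion to the depth.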

So in summary, we can expect that smaller cinemas will sound louder than larger cinemas, when calibrated to the same SPL, because of a greater proportion of dead space around the screen (and the consequences on relative room volume) and (perhaps most importantly) because their discrete reflections arrive earlier. All other factors related to room-size are not directly relevant.

In the interest of bringing this discussion back around to differences between cinemas and home theaters, I'll note that home theaters vary a lot more than cinemas in terms of design parameters. The shape, the distance to the reference position, the overall room volume, and the acoustic properties of the boundaries are likely to vary considerably. However, a few characteristics are likely to be consistent.

First of all, a much higher proportion of overall sound energy is likely to arrive early enough to contribute to loudness than in a larger room, regardless of how the room is treated. If the room is treated heavily to the point that it is dead, then the sound field will be essentially near-field, and nearly 100% of the energy will be direct sound and will contribute 100% to perceived loudness. The same is approximately true if the listener is sitting very close, even if the room is relatively untreated. If the room is completely untreated, the discrete reflections will arrive very early (relative to a large room) and the majority of that energy will also contribute to perceived loudness. The early reflected energy may even increase the loudness more than it increases the pink noise SPL, for psychoacoustic reasons noted above. However, in a reflective small room, at least some of the energy will arrive late enough to boost the pink noise SPL without contributing as much to loudness. Note also that a lower directivity cone-and-dome style speaker will increase reflected sound levels all across the board, accentuating these effects further. In other words, a reflective room could be either louder or softer at the same calibration level than a dead room. It depends a lot on specifics.

Either way, I suspect that it is very unlikely that one would encounter a home theater which does not exhibit greater loudness for content when using the same calibration level as a cinema. For this to occur, both of the following conditions would have to be met: (1) the discrete reflections would have to arrive late enough to not substantially increase loudness; (2) the energy contained in those reflections and any late arriving diffuse field would have to be strong enough to be comparable to a cinema room. The only way this could be achieved is if the room acoustics at the various reflection points were strategically designed to reflect or otherwise redirect away from the listening position as much sound-energy as possible AND if the listening position is sufficiently far away, relative to the room volume. This redirection could be done using reflectors at primary (and possibly secondary, tertiary, etc.) reflection points. Use of diffusion at those points might also work reasonably well, but diffusion would have to be avoided at other room locations because otherwise the sound would reflect from those points to the listener.

One other factor that is quite important for loudness is related to tonal balance reproduction accuracy. It is speaker / room-EQ quality. Any tonal balance problems including peaky resonances can substantially increase subjective loudness. Even if a speaker has good on-axis sound, or the response is EQed for a smooth response at the listening position, resonances can manifest in the off-axis sound and be made audible when they arrive as part of later reflections. The vast majority of speakers including probably most pro monitors have significant off-axis consistency problems and are either designed for smooth on-axis response or calibrated that way using room EQ. These will exhibit resonances in the off-axis sound that will be audible if the room is reflective. My recent experience indicates that off-axis resonances can be audible under some circumstances (noise and ambiance sounds) even if their sound does not arrive until several tens of milliseconds later.

To return to the example of my room. I sit about 2.5-3 m away in a room with about 3650 cuft total volume. My RT60 times measure around 200-250 ms, but early decay is much more rapid. I think the tail decay is actually a bit longer (maybe 300-350 ms) because there is much more treatment in the vicinity of the listening area. The space is not carpeted save for a rug in the listening area. In terms of early reflections, my ceiling contributes a slight amount of energy (1-1.5 dB?) at around 1.5-3 kHz and below 700 Hz. My floor contributes below 700 Hz. The far side-wall reflections arrive at ~15 ms and are quite weak, somewhat diffuse, and just barely cross the "spaciousness" threshold, based on Toole's (IIRC) research into the effects of early reflections. So in all, early reflections are quite minimal for my system. My seat is also quite close, relative to room volume, when compared to cinemas. As noted above, I have about +1 dB of late arriving diffuse sound energy in the upper mid and high frequencies, relative to the direct sound. I find "-6" and sometimes a bit higher to be comfortable, if tonal balance issues are corrected. I wonder if I'll be listening closer to "-4" or "-5" most of the time, once I get my "hidden resonances" tamed?

I just wish I could get the raw IR data for the SMPTE study measurements because then I could assess more quantitatively just how much energy is late arriving. The plots of the IRs aren't any good because the X-axis resolution is not high enough for any meaningful conclusions to be drawn.

On 1/12/2018 at 10:25 PM, Kvalsvoll said:

Distortion is not a tonal issue, but radiation pattern and decay can make distortion more audible. I have at least two movies where I have different sound tracks available, where one sounds bad and the other is much better. Try to compare the first Gravity 5.1 release to the later atmos - the atmos sounds much better.

I wish I didn't hate the movie so much because I want to try the comparison.

I definitely remember the original 5.1 having a harsh soundtrack. I played it at -10, and many scenes and even the score were still cringe inducing at times. That was before I was doing any kind of re-EQ, and I was also running an EQ that was even more top heavy via Audyssey. I bet that track would clean up quite a lot with re-EQ, but I wouldn't be surprised if the Atmos re-release still sounded better. I doubt I can find a rental copy or else I'd try it out. Heck, I might have even liked the movie more if it had sounded better to me.

I tried to find out where the Atmos remix was done but had no luck. If the mix sounds substantially different from the 5.1, then it almost certainly was a home-remix in a dedicated room. That makes it much more likely that it had re-EQ done. I wonder if they had it done at Skywalker Sound? IMO, that room and their staff put out the best quality mixes these days. I think they sound perfect without re-EQ and serve as a good reference for how things should sound after re-EQ. Another thing is that if they did a home remix and applied re-EQ to attenuate the highs and lows, they likely freed up a lot more headroom, which could have reduced the amount of compression and clipping in the mix.

To your point about distortion versus tonal issues, I agree they are distinct problems but they go hand-in-hand. Let me explain.

First of all, the high frequency part of certain sounds can sound like distortion when it is boosted too much relative to low frequencies. For example, a lot of content in voices in the 5-20 kHz range consists of breath and throat sounds. If this content is heard in isolation, it sounds a lot like noise, noise that may be modulated by the sound of the voice. If that content is too hot relative to the rest of the voice, it will likely sound distorted even though it's not distorted, except for the tonal balance.

Second, any distortion that actually is present in a recording is heavily affected by masking. If enough low frequency content is simultaneously present, the distortion may not be audible or may not be objectionable if it is heard. However, if the tonal balance is skewed to the high frequencies, the distortion will stand out more. A strong high frequency tilt in the reproduction will tend to make clipping sound much worse. And if the speaker has any significant high frequency resonances (most do), the ringing at those resonances will make it sound worse still.

Note that I didn't say anything about radiation pattern or decay. Those factors may affect the reproduction of sound a little bit, but they are probably pretty minor issues. The real issue with radiation pattern and decay is that they are nuisance variables in the calibration process. When employing the usual methods (e.g. MSR or PSR fit to a target curve), they cause systems to be calibrated inconsistently.

On 1/12/2018 at 10:25 PM, Kvalsvoll said:

And the mixers may very well be aware of issues with the sound, but for a number of reasons choose to not do anything about it. There was one movie with very bad dialogue, the noise gating was very obvious and caused the dialogue to sound distorted. I showed it to a professional sound engineer and asked what his thoughts were and whether it could be fixed. Yes, he could have fixed that, he could remove all the audible noise, to make the dialogue sound clean and nice. BUT: It would cost time and money. His suggestion was that there simply was not time available to fix it, since the plug-ins and method required to do it were no secret or unknown mystery to any sound engineer.

Sure. There will always be examples of less-than-stellar sound quality, in which mixers had to deal with "garbage" and make it work. I'm not talking about the exceptional cases here. Nor am I talking about the odd flaw here or there that works its way into every mix. This isn't about recovering the last 0.1% or something like that. These issues affect the entire soundtrack in almost every feature film that does not get a high quality home remix.

On 1/12/2018 at 10:25 PM, Kvalsvoll said:

For most people - even the sound enthusiasts - watching a movie is like climbing a mountain to ski it ONCE. Perfect conditions would be nice, but since you ski it only once, you take what is there, and make the best of it. If it isn't that good, you don't climb it once more to see if it got better. And you certainly can live with parts of the run being in bad condition, if other parts are excellent. We watch the movie ONCE, and if the sound was excellent, it is a plus, but if there was one scene where the dialogue sounded distorted and noisy, it doesn't destroy the film.

And you are right that most dialog on most cinema tracks is at least intelligible on good systems, but what about the 99% of systems that aren't so good? People complain all the time about "the music and fx overpowering the dialog". And for me and my "good" system, I still struggle with a lot of dialog. On paper, my hearing is very good, but my ears are anything but golden when it comes to discerning what people are saying. Something I've noticed is that I often don't notice when I fail to hear dialog. It's like that with other sounds too. If it's unintelligible, I simply fail to become conscious of it. Even if I am conscious of it, I'm having to exert a lot more effort to comprehend it. The experience is not as natural. It's similar to how having to read subtitles diverts one's attention from what's happening on screen. (Though I still usually watch foreign films subbed because the dubbed voice actors are usually so terrible.)

As far as dialog is concerned, these issues are fundamental to the art.

SME · January 16, 2018

9 hours ago, mojave said:

AES paper 1886 is an "Evaluation of the SMPTE X-curve Based on a Survey of Re-recording Mixers."

Quote

The 24 mixers who work on a SMPTE calibrated dub stage, were asked if they ever compensate for the Xcurve in their mixes and if so, how. Five of the mixers stated that they did not use any compensation. Of the 19 mixers who do use some type of compensation, this usually takes the form of high frequency boost most often for music and dialog. Three mixers specifically stated the need to boost starting around 5k hertz. Only 1 mixer stated the need to use low frequency compensation, primarily below 150 hertz to aid the music track from sounding too thin.

I wish I could download and read that paper in full. Maybe I should join AES.

I also wish I had a better understanding of the whole process of soundtrack creation. As I understand it, separate tracks (or "stems") are created for dialog, foley, fx, music, score, ambiance before they are assembled into a final mix, usually on a cinema-size dub stage. My understanding is also that practically recorded dialog gets subject to EQ at some point, in order to correct for mic pattern effects, distance / boundary gain effects, reduce unwanted noise, de-ess, and remove plosives. I presume the score usually gets EQed as well, as part of its production. Most FX probably get EQ during the design process.

A big question is, what kinds of systems and rooms are used for all this earlier production work? Is this work done in small rooms on near-field studio monitors? Or is it done on smaller dub-stages? I noticed that the specs for the rooms at Sony's post production studios indicate that they have several small dub-stages, named numerically, in addition to larger cinema size dub stages that are given proper names like "Cary-Grant Theater". These smaller dub-stages are probably calibrated using X-curve but using a reduced slope. When this work is done in smaller rooms, how are those speakers calibrated? Or are they used as-is?

The above questions are very important because all the content almost certainly has already undergone a lot of EQ before the re-recording mixer takes over on the big stage for the pre-mix and final mix stages of production. What kind of EQ is done depends on how that content sounded within those spaces. If the stems are prepared on smaller dub-stages, then a lot of X-curve compensating EQ may have already been applied. Even if done in small rooms, if the monitors are calibrated for flat in-room response instead of a target that looks like the Harman curve, then the low frequency content on the track will probably already be a lot hotter than it is on typical music masters by the time it reaches the dub stage.

Anyway, I think it's interesting that 5 kHz was mentioned as a high frequency boost point. First of all, a deficiency in the 2-4k range is probably less likely to be compensated for than at most other frequencies because our ears are quite sensitive to that range and we don't miss those frequencies quite as much. It's also interesting in the context of the "Wonder Woman" track in particular because the trend in the PvA data suggested more boost was applied to that track below 5 kHz. I compensated my re-EQ accordingly, and preferred the new result.

I also think the part in bold is interesting. I believe music tracks are mastered with the implied expectation that a floor boundary gain of +4-6 dB will be introduced somewhere roughly around 150 Hz. It's rough because there is no standard, and the floor boundary gain experienced by a particular anechoic flat speaker will depend on the driver arrangement. However, because there is no standard, music is purposely mastered to sound consistent with preceding music. The gain at 150 Hz seems to work very well, and in fact, I came up with that number by studying the Revel Salon 2, one of the Harman brand's flagship speakers. Furthermore, because music tracks are likely already mastered for playback on music systems, they probably don't see any additional EQ until they are mixed on the big dub-stage. It's not surprising then that they would need some boost to the highs and benefit from boost to the lows below 150 Hz.

Anyway, I'd like to make a couple additional points as far as boosting the bass (or not doing so) is concerned. It's possible that little to no bass boost is applied to cinema sound tracks at any point in the process because the recording already sounds good without it. Instead it may be music tracks that consistently have their bass cut to compensate for the floor boundary gain effect mentioned above. This would not surprise me in the least. However, that doesn't change the fact that a cinema track is likely to have too much low frequency emphasis when played back on a system optimized for music, i.e. using anechoic flat speakers placed near a floor or speakers EQed to the Harman target.

The other point is that when it comes to dialog, too much bass is almost always more offensive than too little bass. Thin dialog may not sound as "warm" or flatter the speaker as much, but it is still completely intelligible. Overly full dialog, however, suffers greatly in terms of intelligibility. If dialog is prepared to sound "perfect" in either a small dub-stage or a small-room that is calibrated for flat in-room response, then the slight thinness that might present on the final mix stage may not warrant any additional low frequency boost.

With that said, floor boundary gain is only part of the reason for an anechoic flat speaker to exhibit a rise in the bass. The other reason is mostly to do with directivity effects. The biggest directivity change is at the baffle step frequency, where the directivity of a speaker drops off considerably. In order to maintain a flat *anechoic* response, the voltage to the speaker must usually be boosted +6 dB. Note that +3 dB of the boost is needed for the efficiency loss due to decreased acoustic impedance. The other +3 dB accounts for the widening radiation pattern. Thus, the baffle step loss involves an increase of +3 dB in *power* in order to maintain the same direct sound SPL. This increase in power increases the amount of reflected sound energy and boosts the in-room response below that frequency, consistent with the more gradual directivity loss at higher frequencies. The floor boundary gain actually has the opposite effect of the baffle step loss on the speaker, and it both increases efficiency +3 dB and reduces radiation from 4pi to 2pi space again. However, unlike the baffle step effect, anechoic flat speakers don't compensate for the floor boundary gain effect at all. The acoustic impedance boost from the floor boundary increases power response by another +3 dB, and the radiation pattern narrows from 4pi to 2pi again, which boosts direct sound by a total of +6 dB. The net effect of both of these changes is a +6 dB boost in direct sound SPL and a +6 dB boost in sound power. Thus, the measured *in-room response* will (on average) exhibit a +6 dB gain through the BSC and floor boundary gain regions, and this gain will occur in addition to the gain that occurs gradually, when descending from frequencies above, as directivity gradually decreases. Calibrating a system for flat in-room response suppresses all of those gains, leading to much less bass in a wide variety of rooms.
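To make the dB bookkeeping above easier to follow, here is a minimal Python sketch that simply tallies the figures from the paragraph. The `stage` helper and its parameter names are hypothetical, made up for illustration, and the model behind it (radiated power = drive + efficiency, direct SPL = radiated power + directivity change) is an idealized assumption rather than a measurement of any real speaker.

```python
def stage(drive_db, efficiency_db, directivity_db):
    """Return (direct_spl_change_db, sound_power_change_db) for one effect.

    Idealized assumption: radiated power follows drive level plus efficiency,
    and direct SPL follows radiated power plus the change in directivity.
    """
    power_db = drive_db + efficiency_db
    direct_db = power_db + directivity_db
    return direct_db, power_db

# Baffle step with anechoic-flat compensation: +6 dB drive, -3 dB efficiency,
# -3 dB directivity (radiation widens from 2pi to 4pi).
baffle = stage(drive_db=6.0, efficiency_db=-3.0, directivity_db=-3.0)

# Floor boundary, uncompensated: no extra drive, +3 dB efficiency,
# +3 dB directivity (radiation narrows from 4pi back to 2pi).
floor = stage(drive_db=0.0, efficiency_db=3.0, directivity_db=3.0)

net_direct_db = baffle[0] + floor[0]
net_power_db = baffle[1] + floor[1]
print(f"direct: {net_direct_db:+.0f} dB, power: {net_power_db:+.0f} dB")
```

Under these assumptions the compensated baffle step leaves direct sound unchanged while adding 3 dB of power, the floor boundary adds 6 dB of direct sound and 3 dB of power, and both totals come out at +6 dB.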

An interesting hypothetical question is whether the cinema people are "doing it right" and the music people are "doing it wrong". If unaltered recordings have more natural bass levels without the extra +6 dB bass boost, then perhaps music mastering practices should be changed to not assume such a boost is present. I think that's a reasonable case to make. However, I'd argue in favor of sticking to how it's done in music, and perhaps even codifying 150 Hz as the center frequency for the boundary gain effect, for 2 reasons: (1) cinema audio production already relies on standards, standards which can be changed and will still be adhered to; music is and probably always will be The Wild West; (2) assuming the presence of that extra +6 dB bass boost in the playback system effectively increases the headroom available in the track by a lot! I'm sure this would help immensely with creating high impact cinema tracks that don't clip all over the place. Of course it means many cinemas would find themselves under-equipped to accurately reproduce soundtracks. I don't think this is that big of a problem though. I usually find the bass quite thin at most cinemas anyway, and even if it's thin, the dialog still works. Targeting a system with +6 dB extra bass will simply allow the cinemas with the better speakers and subs to distinguish themselves accordingly.

9 hours ago, mojave said:

Here is an article on the sound design goals of Wonder Woman:

http://postperspective.com/sound-wonder-womans-superpower/

9 hours ago, mojave said:

Here is another article about Wonder Woman:

Sound Wonder Woman

Big thanks for posting these articles! I love to learn what the director and sound people were thinking when they made these tracks. It's worth mentioning that just because the track does not emphasize high frequency sounds does not mean that it has less high frequency content. Practically every sound has some treble component. Any sound that changes, including the human voice as it is modulated, contains treble frequencies. Likewise, they probably didn't shelve down the highs globally so they could have a track that "emphasizes low frequency sounds more", or if they did, it was applied with a light touch. Too much treble suppression just sounds bad no matter what aesthetic you're aiming for. I don't doubt that the sub bass was boosted some, but it usually is for most movies, and it can usually be done without harming the fidelity if applied judiciously and skillfully. Most of this aesthetic was probably achieved by sound design choices. Like they said, "She didn’t want to hear high-end, harsh, shrill sounds, like metal clashes. There are no guns that sound too loud. The crowds are less punchy, with their screams and so on."

Hopefully I'll get time to revisit this master soon. Now if I could just find a way to get the director to visit me and hear it re-mastered on my system with "physical bass" that extends to 5 Hz.

Edit: This is some great info for me to keep in mind when finishing my re-master, from the second article:

Quote

In keeping with the rich low-end directive, Burdon was conscious of not thinning out Wonder Woman’s dialogue with EQ (which can be a useful tactic for helping dialogue cut through a mix). “Patty was keen to make sure that Gal Gadot sounded strong and had confidence and strength. My style has always been to try and keep the low-end in the dialogue and that is exactly what Patty wanted for Wonder Woman. She wanted Gadot’s voice to sound strong and full,” says Burdon. Doing so wasn’t too challenging for this mix. He notes that, “the dialogue played quite solid from the get-go because there wasn’t so much competing in that frequency range.”

I would say that the challenging thing about low frequencies in dialog is the relative lack of consistency in different rooms and systems. If one is leaning toward a very full sound when creating a track, there is the risk that it will become unintelligible on some playback systems.

As per my previous notes on the movie, I had a tough time getting the bass level "right", between keeping the dialog clean and retaining the bass impact. It was very apparent that they intended for the dialog to sound quite full, and that fullness was retained (if not still slightly excessive) when I played the movie with my last (unpublished) re-EQ iteration. Unfortunately however, bass levels are not likely to be reproduced consistently between systems, whether calibrated or not. Most of the time, this isn't a big deal because mixers do usually thin out any bass that's likely to cause intelligibility problems. This mix is a clear exception, one in which full dialog was 100% intended by the director.

If anything, this example reveals that there's only so much that can be achieved by a re-master like I'm doing here. I don't have the director here to tell me what works best for her vision. I can certainly try to understand that vision, based on the mix itself, and I hope I've achieved as much here. However, for best results, a proper re-mix should be done using a dedicated small room with state-of-the-art equipment calibrated to a standard or target appropriate for home systems (i.e. in-room response more like Harman target than Audyssey target).

Err, in my opinion, of course.

Edited January 16, 2018 by SME

Kvalsvoll · January 18, 2018

On 1/16/2018 at 12:55 AM, SME said:

...

Second, any distortion that actually is present in a recording is heavily affected by masking. If enough low frequency content is simultaneously present, the distortion may not be audible or may not be objectionable if it is heard.

...

And you are right that most dialog on most cinema tracks is at least intelligible on good systems, but what about the 99% of systems that aren't so good? People complain all the time about "the music and fx overpowering the dialog".

...

Too much information, but I appreciate your dedication to elaborate and explain your thoughts, and I did read it all.

I have selected a small subset from your post to comment on, realizing we can not cover all aspects of sound reproduction in a few posts in this thread.

Distortion and masking:

Distortion components at higher frequencies are not masked by low frequency content, unless this content also has higher frequency harmonics sufficiently loud in level. Masking occurs around the fundamental tone, and when the difference in frequency is large enough there will be no masking at all. For low harmonics the masking is high, so that for the 2nd harmonic the detection level is around 2%. For higher harmonics, or any other content at much higher frequency, the detection level approaches the audibility limit for a tone at that higher frequency, as if the low freq tone was not present at all. I know this because I have performed a controlled experiment just for the purpose of investigating detection level and masking for harmonic distortion.
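For readers who think in dB rather than percent, a small Python sketch of the conversion follows. It assumes the 2% figure is an amplitude ratio relative to the fundamental (the usual convention for harmonic distortion percentages), which the post does not state explicitly; `percent_to_db` is a hypothetical helper name.

```python
import math

def percent_to_db(percent):
    """Convert an amplitude ratio in percent to dB relative to the fundamental."""
    return 20.0 * math.log10(percent / 100.0)

# Assumed: 2% is an amplitude ratio, not a power ratio.
print(round(percent_to_db(2.0), 1))
```

On that assumption, a 2% detection level corresponds to a harmonic roughly 34 dB below the fundamental.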

Dialogue:

Yes, some dialogue can be more difficult to understand, and whether you experience intelligibility to be sufficient or lacking will always be a subjective evaluation. And some systems will be better and some worse. The typical home system with a center speaker close to the floor and a table between this center and the listening position is not a good starting point. In such a system, it is likely that the frequency response is very compromised through the midrange, and there will be severe early reflections. This causes poor intelligibility. A simple measurement will reveal that freq resp is far from flat, and early decay is poor. On top of that, I suspect that many center speakers have problems with both on-axis frequency response and poor off-axis linearity. In many cases room acoustics is not suitable for sound reproduction because the room is not sufficiently damped, causing problems with overall decay and early reflection level. This is the reason for all those "can't hear the dialogue" comments. The only solution to this is to improve the sound system and room acoustics.

SME · January 18, 2018

14 hours ago, Kvalsvoll said:

Too much information, but I appreciate your dedication to elaborate and explain your thoughts, and I did read it all.

I have selected a small subset from your post to comment on, realizing we can not cover all aspects of sound reproduction in a few posts in this thread.

Distortion and masking:

Distortion components at higher frequencies are not masked by low frequency content, unless this content also has higher frequency harmonics sufficiently loud in level. Masking occurs around the fundamental tone, and when the difference in frequency is large enough there will be no masking at all. For low harmonics the masking is high, so that for the 2nd harmonic the detection level is around 2%. For higher harmonics, or any other content at much higher frequency, the detection level approaches the audibility limit for a tone at that higher frequency, as if the low freq tone was not present at all. I know this because I have performed a controlled experiment just for the purpose of investigating detection level and masking for harmonic distortion.

While your own experiments with masking in the context of harmonic distortion are interesting, tests involving threshold perception of a harmonically related continuous pure tone in the presence of another pure tone are nowhere near sufficient to characterize masking phenomena in general. Masking is dependent on frequency, level, duration, and bandwidth of the signals. Because masking is very non-linear, it's also not possible to predict the masking effects of 3 or more signals from experiments involving only 2.

When I said "low frequencies", I meant "lower frequencies". In the case of content mixed under X-curve conditions and then played on a flat system, it is primarily the strong tilt in the upper mid and high frequencies that may allow distortion to become audible or allow mildly audible distortion to become objectionable. However, I have also had the seemingly paradoxical experience of hearing a reduction of brightness upon attenuating low frequencies. I believe in such cases, the over-abundance of low frequencies was suppressing mid frequencies that would have otherwise suppressed high frequencies. Optimizing EQ balance by ear, whether in the process of mastering or voicing speakers, is a very challenging process. Fortunately in the case of voicing speakers for optimal music playback, we have scientifically validated criteria. Unfortunately, the cinema industry relies on its own standard for system calibration that lacks scientific validation and is inconsistent with the criteria for optimal music playback.

15 hours ago, Kvalsvoll said:

Yes, some dialogue can be more difficult to understand, and whether you experience intelligibility to be sufficient or lacking will always be a subjective evaluation. And some systems will be better and some worse. The typical home system with a center speaker close to the floor and a table between this center and the listening position is not a good starting point. In such a system, it is likely that the frequency response is very compromised through the midrange, and there will be severe early reflections. This causes poor intelligibility. A simple measurement will reveal that freq resp is far from flat, and early decay is poor. On top of that, I suspect that many center speakers have problems with both on-axis frequency response and poor off-axis linearity. In many cases room acoustics is not suitable for sound reproduction because the room is not sufficiently damped, causing problems with overall decay and early reflection level. This is the reason for all those "can't hear the dialogue" comments. The only solution to this is to improve the sound system and room acoustics.

Wait a sec. You're telling me that people struggle to understand dialog in movies because of strong early reflections, poor decay time, and non flat frequency response, yet in reality, people have zero difficulty having live conversations in these same rooms. This is despite the "deficiencies" that you identify here concerning early reflections, decay, and "poor" in-room frequency response in the mid-range, and non-flat frequency response in general, which affect live voice just the same as voice reproduced through a speaker.

Logically, the acoustics are only a minor factor. The remaining issue(s) to consider are the speaker and the soundtrack. Yes, most speakers are not very good, and this is a real problem. On-axis neutrality and off-axis consistency are crucial. However, the *average* response of a wide variety of speakers is approximately flat, when measured anechoically. But whereas music tracks are produced mostly on systems that are very close to anechoic flat, cinema tracks are produced mostly for cinema systems that have very skewed responses, due to the X-curve calibration process.

So there are two major problems regarding dialog intelligibility: non-neutral speakers and non-neutral soundtracks. If either of these deficiencies improves, the overall result improves, and that may be enough to make dialog intelligible at least. For high quality reproduction, of course, it's essential that the monitoring and playback speaker responses (anechoic, not in-room) are consistent.

Kvalsvoll · January 19, 2018

4 hours ago, SME said:

...

So there are two major problems regarding dialog intelligibility: non-neutral speakers and non-neutral soundtracks.

...

This is not true. Acoustics determines the ratio between early direct sound and later reflected sound. The ratio between early direct sound and early reflected energy defines clarity. This is the case both for reproduction and when speaking in a room, and there are established standards for this.

Those standards make it possible to predict intelligibility in classrooms and auditoriums, and adjust acoustics according to intended use to make the rooms perform well.

In REW there is now a Clarity graph, which shows the performance of the measured system in regards of those parameters:

"Clarity C50

The early to late energy ratio in dB, using sound energy in the first 50 ms as the 'early' part. C50 is most often used as an indicator of speech clarity."

The soundtrack is of course a crucial part here, but acoustics determine how well this soundtrack is reproduced. With several sounds going on simultaneously, it will be more difficult to discern the different parts of the sound when there is more and louder late energy, because this late energy will then mask parts of the transient sounds in the early arrival sound.

SME · January 19, 2018

4 hours ago, Kvalsvoll said:

This is not true. Acoustics determines the ratio between early direct sound and later reflected sound. The ratio between early direct sound and early reflected energy defines clarity. This is the case both for reproduction and when speaking in a room, and there are established standards for this.

Those standards make it possible to predict intelligibility in classrooms and auditoriums, and adjust acoustics according to intended use to make the rooms perform well.

In REW there is now a Clarity graph, which shows the performance of the measured system in regards of those parameters:

"Clarity C50

The early to late energy ratio in dB, using sound energy in the first 50 ms as the 'early' part. C50 is most often used as an indicator of speech clarity."

The C50 is not a ratio between direct and early reflected energy. It's really a ratio between direct + early reflected and late reflected energy. Beyond 50 ms is generally considered to be late reflected sound. The C50 is rarely important in small rooms because most of the reflected sound arrives quite early. In large rooms in which reverb times can get quite high without treatment, the C50 is much more important. And if you try having a live conversation in a large live room with enough distance between you such that C50 is poor, intelligibility will noticeably suffer.

3ll3d00d · January 19, 2018

On 15/01/2018 at 11:55 PM, SME said:

And for me and my "good" system, I still struggle with a lot of dialog. On paper, my hearing is very good, but my ears are anything but golden when it comes to discerning what people are saying. Something I've noticed is that I often don't notice when I fail to hear dialog. It's like that with other sounds too. If it's unintelligible, I simply fail to become conscious of it. Even if I am conscious of it, I'm having to exert a lot more effort to comprehend it. The experience is not as natural. It's similar to how having to read subtitles diverts one's attention from what's happening on screen. (Though I still usually watch foreign films subbed because the dubbed voice actors are usually so terrible.)

Do you find this in other systems or just your own system?

mojave · January 19, 2018

Kvalsvoll is right and the Clarity score is only based on reflections as defined by ISO 3382. A Clarity score is based off the room impulse response (RIR). On the RIR the direct sound is set to be relative time zero so it is not factored in the equation. The Clarity score takes the RIR and divides it at a certain time point. The energy before the divider (early reflections) is summed and the energy after the divider (late reflections) is summed. It then does an early to late energy ratio expressed in dB. I use SMAART and get a C10, C35, C50, and C80 score for each octave and 1/3rd octave band and an overall score. C50 is used to score a performance space for speech and C80 is used to score a performance space for orchestral music. However, the lower Clarity figures are helpful for small room acoustics.

Gerald Marshall provided a "Poor, Fair, Good" chart in his 1996 AES article called An Analysis Procedure for Room Acoustics and Sound Amplification Systems Based on the Early-to-Late Sound Energy Ratio. He used a weighted average octave band score with 15% for 500 Hz, 25% for 1 kHz, 35% for 2 kHz, and 25% for 4 kHz. My preference is for speakers that avoid a crossover in any of these bands. My JTR 215RT's have a crossover at 350 Hz and 6500 Hz.
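As a rough illustration of that weighting, here is a minimal Python sketch. The band weights are the ones quoted above; everything else is an assumption for illustration: the `weighted_clarity` name is hypothetical, the per-band values are made-up example numbers, and the weights are applied directly to per-band dB values, which may not be exactly how the article combines them.

```python
# Octave-band weights quoted above (centre frequency in Hz -> weight).
WEIGHTS = {500: 0.15, 1000: 0.25, 2000: 0.35, 4000: 0.25}

def weighted_clarity(band_db):
    """Weighted average of per-octave-band clarity values given in dB.

    Assumption: the weights are applied to the dB values themselves.
    """
    return sum(WEIGHTS[f] * band_db[f] for f in WEIGHTS)

# Made-up example values, not measurements from any real room.
example_bands = {500: 4.0, 1000: 6.0, 2000: 8.0, 4000: 10.0}
print(round(weighted_clarity(example_bands), 2))
```

Note how the 2 kHz band carries the most weight, so a problem there moves the overall score more than the same problem at 500 Hz.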

SME · January 20, 2018

On 1/15/2018 at 4:55 PM, SME said:

And for me and my "good" system, I still struggle with a lot of dialog. On paper, my hearing is very good, but my ears are anything but golden when it comes to discerning what people are saying. Something I've noticed is that I often don't notice when I fail to hear dialog. It's like that with other sounds too. If it's unintelligible, I simply fail to become conscious of it. Even if I am conscious of it, I'm having to exert a lot more effort to comprehend it. The experience is not as natural. It's similar to how having to read subtitles diverts one's attention from what's happening on screen.

15 hours ago, 3ll3d00d said:

Do you find this in other systems or just your own system?

Oh yes, definitely.

Up until very recently, the vast majority of music I liked was music without lyrics. Since I can remember, I could never really hear the lyrics in most songs, so I never really got into them. Only in the last year or so, since I've gotten my own system to the level of performance that it is at now, have I been able to actually understand the words in most songs. It's been quite a breakthrough for me, actually.

And yes, dialog is often a problem for me on all kinds of systems including at cinemas. I might still hear 90-95% of stuff, but that extra 5% can be the difference between catching or missing a key plot point. Sometimes I have trouble understanding live spoken words. It's always a lot worse when accents get thick, especially if they are unfamiliar to me. I remember when I first moved to California, and I struggled to understand all the Asian accents. Then I moved to the east coast where I had a Greek supervisor, and I could barely understand a word he said until I'd been around him for a few months.

As I said, my hearing has tested as being nearly ideal, and in fact, I tend to be more of an auditory-oriented person than a visual-oriented person. I suspect that my brain just devoted a lot more gray matter to processing other kinds of sounds and just doesn't deal well with any unnatural reproduction of dialog. Or something.

SME · January 20, 2018

9 hours ago, mojave said:

Kvalsvoll is right and the Clarity score is only based on reflections as defined by ISO 3382. A Clarity score is based off the room impulse response (RIR). On the RIR the direct sound is set to be relative time zero so it is not factored in the equation. The Clarity score takes the RIR and divides it at a certain time point. The energy before the divider (early reflections) is summed and the energy after the divider (late reflections) is summed. It then does an early to late energy ratio expressed in dB. I use SMAART and get a C10, C35, C50, and C80 score for each octave and 1/3rd octave band and an overall score. C50 is used to score a performance space for speech and C80 is used to score a performance space for orchestral music. However, the lower Clarity figures are helpful for small room acoustics.

I think we're getting our words mixed up here. It appeared to me that @Kvalsvoll was describing C50 as a ratio between direct sound and early reflected sound, without regard for late reflected sound:

On 1/18/2018 at 6:42 PM, Kvalsvoll said:

This is not true. Acoustics determines the ratio between early direct sound and later reflected sound. The ratio between early direct sound and early reflected energy defines clarity. This is the case both for reproduction and when speaking in a room, and there are established standards for this.

OK. I admit he described it as a ratio of "early direct sound" to "later reflected sound" in one sentence, and then described it as a ratio of "early direct sound" to "early reflected energy" in the latter sentence. Now I'm not exactly certain what he meant, but I agree with the definition @mojave posted. Though, I'm not exactly certain what he means when he writes: "On the RIR the direct sound is set to be relative time zero so it is not factored in the equation." Is he saying that direct sound energy is not part of the early energy sum? If so, that'd be quite absurd. It's far more likely the case that the "early sound" interval is meant to begin right at the start of the RIR and include the initial arrival.
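To make the definition concrete, here is a minimal Python sketch of an early-to-late energy ratio computed from a sampled RIR. It assumes the direct sound is the largest-magnitude sample and counts it in the early sum, which is my reading above. The function name `clarity_db` and the peak-picking step are my own illustration, not how SMAART or ISO 3382 specify the calculation.

```python
import math


def clarity_db(rir, sample_rate, split_ms):
    """Early-to-late energy ratio in dB (C50 when split_ms=50, C80 when 80).

    Assumption: the direct arrival is the largest-magnitude sample, and it is
    treated as time zero AND included in the early energy sum.
    """
    # Index of the direct sound (first occurrence of the peak magnitude).
    onset = max(range(len(rir)), key=lambda i: abs(rir[i]))
    # Sample index where "early" ends and "late" begins.
    split = onset + int(round(split_ms * sample_rate / 1000.0))
    early = sum(x * x for x in rir[onset:split])
    late = sum(x * x for x in rir[split:])
    if late == 0.0:
        return math.inf  # no late energy at all
    return 10.0 * math.log10(early / late)


# Toy RIR at 1 kHz sample rate: direct sound at t=0, one reflection at 100 ms.
rir = [0.0] * 200
rir[0] = 1.0
rir[100] = 0.1
c50 = clarity_db(rir, 1000, 50)
```

With the direct arrival counted as early energy, a room with weak late reflections scores high, which matches the "higher is better for intelligibility" reading.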

Anyway, we are hopefully in agreement that a higher C50, meaning more early arriving sound than late arriving sound, is preferable for intelligibility. Just to repeat: early reflections (before 30-40 ms) not only don't harm dialog intelligibility, they actually *improve* it. This is discussed in Floyd Toole's "Sound Reproduction" 3rd ed., pgs 200-201 and is supported by multiple experiments. Toole introduces the section by pointing out:

Quote

it is popular to claim that reflected sounds within small listening rooms contribute to degraded dialog intelligibility. The concept has an instinctive 'rightness' to it. However, as with several perceptual phenomena, when they are rigorously examined, the results are not quite as expected. This is another such case.

He also supports my point about C50 being unimportant for typical small rooms:

Quote

These intervals [30-40 ms] embrace all consequential early reflections in domestic rooms or control rooms, although in larger rooms longer delays can be problematic.

So do acoustics matter for dialog intelligibility in *small* (domestic) rooms? No, not so much.

9 hours ago, mojave said:

Gerald Marshall provided a "Poor, Fair, Good" chart in his 1996 AES article called "An Analysis Procedure for Room Acoustics and Sound Amplification Systems Based on the Early-to-Late Sound Energy Ratio". He used a weighted average octave band score with 15% for 500 Hz, 25% for 1 kHz, 35% for 2 kHz, and 25% for 4 kHz. My preference is for speakers that avoid a crossover in any of these bands. My JTR 215RT's have a crossover at 350 Hz and 6500 Hz.

The article is pay-walled like most. In any case, the weighting seems reasonable with regard to the relative importance of those frequencies for speech. As for speaker crossover frequency preferences, I tend to think the implementation details matter much more, and that the anechoic linear response as a whole, on and off-axis, is the most important quality. If I had to accept a badly implemented crossover though, well, I guess I'd rather it not be in the mid-range. Though, I'd rather just have a good XO. FWIW, my speakers are currently 2-way, linear-phase 8th-order acoustic at 850 Hz. I can't "hear" adverse effects of the XO at all, even when standing right next to the speaker.
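For anyone who wants to try the weighting on their own per-band numbers, a rough Python sketch follows. The percentages are the ones quoted above. Since I haven't read the article, applying the weights directly to per-band dB values is an assumption, and the function name is hypothetical.

```python
# Octave-band weights as quoted above (band centre in Hz -> weight).
BAND_WEIGHTS = {500: 0.15, 1000: 0.25, 2000: 0.35, 4000: 0.25}


def weighted_clarity(band_scores_db):
    """Weighted average of per-octave clarity scores.

    Assumption: the weights are applied to the dB values as-is.
    band_scores_db maps band centre frequency (Hz) to a clarity score in dB.
    """
    missing = [band for band in BAND_WEIGHTS if band not in band_scores_db]
    if missing:
        raise ValueError(f"missing octave bands: {missing}")
    return sum(BAND_WEIGHTS[band] * band_scores_db[band] for band in BAND_WEIGHTS)


# Made-up example scores, only to show the call.
overall = weighted_clarity({500: 2.0, 1000: 4.0, 2000: 6.0, 4000: 8.0})
```

The 2 kHz band carries the most weight, so a problem there moves the overall figure more than the same problem at 500 Hz.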

SME · January 27, 2018

I mentioned this in the main "Low Frequency Content" thread. As of now, I suspect that Skywalker Sound Studios is the only shop that's been applying substantial re-EQ to their home mixes. That does cover quite a lot of content though.

I'm going to guess that this practice started somewhere between 2008 and 2010. Comparing Pixar movies, "Wall-E" (2008) benefits from re-EQ, but "Toy Story 3" (2010) sounds right without it. "How to Train Your Dragon" (2010) also sounds like a re-EQed home mix. I'm thinking I may have to pop in "9" (2009), to hear whether it was re-EQed or not. Maybe I can also try "Up" (2009), but I don't have a copy around to try.

Anyway, I'm still working on my latest EQ configurations to address the "hidden resonances" problem. It's a slow process because my new approach is still under development. For now, the process is kind of fuzzy. I'm not exactly sure how best to weigh the data I'm using. Just last night, I started a trial of a new configuration for my left and right channels, and it is unambiguously superior. I must be on the right track. I think the improvements to the center and surrounds will be even more substantial.

SME · February 24, 2018

I plan to resume this work soon. It's taken me a long time to develop new EQ configs for my speakers after becoming aware of the "hidden resonances" I described. The effort has been very well worth it, and my sound quality is substantially improved from before, which was very good already. I finally finished updating the surrounds early today, but I want to take some time to watch a lot of movies and develop full confidence in this new configuration before attempting to do any critical work. From what I've heard so far, I'm not inclined to revise anything I've argued in this thread so far. With these new EQ configs, the broad tonal imbalances found on typical cinema tracks are somewhat less objectionable to me than they were before, particularly with regard to the low frequencies. However, the imbalances are just as apparent if not more so.

On the other hand, fixing the hidden resonances will help immensely with getting my judgments right. The center had some significant issues in the 200-300 Hz range, which I was aware of and did my best to work around. This range is crucial because it is where anechoic flat speakers tend to see a lot of gain from baffle step loss and/or reverb time increase, and so it is a region where alterations as part of the X-curve calibration procedure are likely to be more substantial. It is the range where I most often apply center shelving filters to deal with low frequency excess.

I also had significant "hidden" resonances around another crucial area, between 2-3 kHz or so. In hindsight, this explains why "getting the knee right" seemed to be such an unforgiving exercise. Though, I imagine this region will still continue to be difficult simply because the X-curve "knee" resides at 2 kHz. As I previously noted, the soundtracks I've worked on sound quite bad with a sharp knee at 2 kHz. A more gradual transition from "flatish" to "sloped" seems to be required for the best sound, but getting the shape of the transition right is crucial and is different for every track. The area around 2 kHz has a big impact on speech and the upper-mid "punch" in many sounds.
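As a rough illustration of "sharp" versus "gradual", here is a Python sketch of two target curves. It assumes the nominal X-curve shape between the knee and about 10 kHz (flat to 2 kHz, then roughly -3 dB/octave) and ignores the steeper roll-off above that. The soft-knee formula and its `width` parameter are my own hypothetical smoothing, not a standard and not the exact shapes I use per track.

```python
import math


def sharp_knee_db(freq_hz, knee_hz=2000.0, slope_db_per_oct=3.0):
    """Flat below the knee, straight slope above it (assumed nominal shape)."""
    if freq_hz <= knee_hz:
        return 0.0
    return -slope_db_per_oct * math.log2(freq_hz / knee_hz)


def soft_knee_db(freq_hz, knee_hz=2000.0, slope_db_per_oct=3.0, width=1.0):
    """Hypothetical smooth blend from flat to sloped.

    Larger width spreads the transition over more octaves. Far above the knee
    the curve approaches the same slope as sharp_knee_db.
    """
    ratio = freq_hz / knee_hz
    return -slope_db_per_oct * width * math.log2(1.0 + ratio ** (1.0 / width))


# Compare the two shapes at a few frequencies around the knee.
for f in (500, 1000, 2000, 4000, 8000):
    print(f, round(sharp_knee_db(f), 2), round(soft_knee_db(f), 2))
```

The soft version already starts to droop below 2 kHz and has no corner at the knee, which is the kind of shape difference that has to be tuned by ear for each track.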

As an aside, I own a copy of Floyd Toole's most recent (3rd) edition of "Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms", which is a superb reference for all things having to do with high quality audio reproduction. I can't recommend the book enough! Anyway, Toole dedicates an entire chapter to cinema sound and the X-curve with a lot of great info. He has a lot to say about the knee at 2 kHz as well:

Quote

The knee at 2 kHz is perceived as a low-Q spectral feature. I have heard such coloration in theaters, including high profile "reference" rooms. It is made worse when calibrators take pride in replicating the knee, not smoothing it. In some places, one pays extra for this attention to detail. Suitable equalization during mixing could alleviate these issues, but this does not always happen. It is not, as has been commonly thought, a matter of boosting the high frequencies. That alone restores the original problem: excess brightness. Significant sound spectrum rebalancing is needed, including boosting the bass and smoothing or eliminating the 2 kHz knee.

Anyway, there are many other details in that chapter that are worth discussing here at some point in the future. I'll just say that my arguments here are reasonably well supported by Toole's opinions as well. The X-curve standard degrades sound quality, both during the production process and during the reproduction in typical cinemas.

SME · February 24, 2018

I want to make some other notes about recently watched movies. After finishing re-EQ of my front stage, I watched "Finding Nemo 3D" on BD (2010), IIRC released in 2012. Being that the original was mixed in Skywalker Sound Studios, I was curious if the re-release might have a quality re-EQed home mix. It would seem not. It's possible that they didn't bother with a remix at all, being that the major change was in the video (3D conversion). Or if they did remix it, they did so in a Disney studio instead of at Skywalker Sound. A -2 dB/octave HF slope and -2 dB bass shelf cleaned it up very nicely, as is typical for other cinema mixes out of Skywalker Sound like Wall-E. Apart from being tainted by cinema EQ, the mix is very high quality as with most Pixar stuff. It deserves mention that the dialog is noticeably hot on this mix. It makes me wonder if Disney did a "lazy" home mix and just punched the dialog up 3 dB or whatever.

Tonight I watched "Zootopia". This is a very recent Disney release (2016) with audio post done in Skywalker Sound Studios, so I expected a re-EQed mix. However, the mix was obviously shelved in the 200-300 Hz range (the usual spot for that sort of thing). I could not judge the highs because there was obvious low-end masking, but perhaps this movie did not get a home mix, was not done in Skywalker Sound's dedicated room, or did not get re-EQed for some other reason. Who knows?

As a curiosity though, the "Zootopia" BD contained several "Deleted Scenes", and the spectral balance of the audio in those scenes was totally different. It sounded much more natural and balanced, if even a bit thin (in the low frequencies). The more balanced rendition revealed a lot more mid-range detail and nuance in the actors' and actresses' voices. This was even more obvious at a few points in which the directors replayed snippets from the actual movie in order to discuss the (deleted) scene that was about to be presented. As such, the difference in spectral balance probably did not arise because of work a mixer did on the audio for the special feature itself. My guess is that the audio for the deleted scenes was presented as it was recorded, albeit with whatever EQ was applied to clean up the signal from the mics. Clearly these scenes were cut during an earlier phase of development, as some consisted only of hand-drawn stills while others were CGI rendered at lower quality. The implication here is that the dialog audio presented in those scenes was from an earlier phase of production, perhaps before it had ever seen an X-curve calibrated dub-stage, at which point bass boost was applied.

SME · February 25, 2018

I watched "9" tonight. My verdict? It sounds *excellent without EQ correction*. The dialog on the film sounds very natural and well-balanced with excellent mid-range clarity and just the right amount of fullness. It also goes without saying that the bass effects on this soundtrack are tremendous. The large hits are very wide bandwidth, combining both slam and weight.

The "9" soundtrack is a reference for home theater audio systems in every respect.

maxmercy · February 27, 2018

Agreed. 9 is terrific; it merits its 5-star rating.

JSS

SME · April 16, 2018

It's going to be a while before I do any more work here as I am *still* working on perfecting my system. My new calibration approach is much superior to the old, but I don't have a well-defined systematic approach yet. Different modifications lead to different sounds, and I've listened to enough variations and developed enough intuitive awareness to know that I'm still not quite there. I'm getting very close though. It's getting better with each iteration. The detail I'm hearing in music is just crazy, to the point that many albums sound almost unfamiliar despite having listened to them on a variety of systems for years.

I'm also leaning more toward waiting until I can make on-the-fly EQ adjustments before trying to do this. It's just very hard to evaluate changes with 30 second gaps in between and very time consuming to iterate repeatedly.

I do want to make an interesting note. A couple weeks ago, I watched the BD re-release of "Monsters Inc". We started with 3D, which was fun until the PS3 choked (sadly, it has issues reading some discs), and because the PS3 doesn't deliver a TrueHD soundtrack and 3D together, I listened to the included 5.1 track. After switching to the other BD player (no 3D) to play the rest of the movie, I opted to continue the film with the 7.1 mix instead.

The 7.1 mix was very different, and not just in terms of surround use. It was louder, and had quite a bit more mid-bass and treble (too much in both instances, IMO), whereas the 5.1 had a bit of upper mid push. I'm fairly certain the 5.1 was a home mix with re-EQ. I'm not sure if the 7.1 was re-EQed from cinema or not. In any case, both my wife and I thought the 5.1 mix sounded better. The one benefit of the 7.1 mix was a bit more top end extension, but treble was definitely too hot. The 5.1 sounded more dynamic, had much better slam (by virtue of the mid-bass being better balanced with the low mids), and rendered ambiance much more faithfully. (FWIW, my experience is that the quality of ambiance rendering is a *major* clue as to overall tonal balance and sound quality!) Considering that ambiance is a *big* part of the appeal of surround, the 7.1 mix was quite a letdown, IMO. By pulling back the upper mid just a hair, the 5.1 likely would have been nearly perfect.

So if anyone wonders whether EQ is applied during home remixing, the answer is yes, at least sometimes. If anyone gets bored and wants to compare the 5.1 and 7.1 on "Monsters Inc", I'd be curious about others' impressions, but it's not a big deal.
