lowerFE

The ultimate small speaker - final design peer review thread


I'm long overdue for an update. Lately I've been too busy simply enjoying these speakers or giving people demos of them! But there have been a number of big improvements that made night-and-day differences to their sound quality. I will talk about them over a number of posts. This post is about my discoveries in sound signature preferences.

The Universal Sound Signature Preference

I'll start with a story. I brought these to an audio enthusiasts' get-together a few weeks ago, as a bunch of people wanted to hear the speakers I've built. I set them up in my friend's room, but I ran into strange setup problems that I'd never encountered before, and they took some time to fix. Since there were about a dozen people waiting and eager to listen, for the sake of time I only did a rough setup that resulted in poor placement (speakers right against the side wall and above ear level), with only a rough room correction to compensate. However, once it was set up, nobody wanted to leave. This was a room where at least half the people owned five-figure sound systems at home. Many had traditional speakers, some had tubes, some had huge horn speakers, and they all sat there, continuously adding songs to the queue, and listened for over 4 hours, apart from a little break here and there to talk. The fact that they listened for over 4 hours tells me that everyone truly loved the speakers. Otherwise they would have simply left after a few songs and gone to talk to other people outside.

Contrary to the traditional view that people have different preferences in how a speaker sounds (some like bright, some dark, some warm, etc.), I believe there is a universal preference (with one exception), and now I have strong anecdotal evidence that supports this. This is going to be difficult to believe, but once you hear it, I think most of you will agree. I believe the universally preferred sound signature is one that is subjectively flat. I've tried this with over 50 people at this point, and it is clear that this is an appealing sound signature regardless of what the listener's original preference is. I've had people tell me this is very different from what they're used to hearing, and that it changed their view of what is "good sounding".

What is a "subjectively flat" sound?

Now this is tricky. This is not flat like a speaker with a flat frequency response, but subjectively flat. Our ears hear differently at different volumes. At normal volumes, our ears are less sensitive to bass and, to a lesser extent, to treble. As the volume goes up, our ears become increasingly sensitive to bass and treble relative to the midrange. This means that a speaker that measures flat will sound thin at normal volumes, since the subjective frequency response that our ears hear is a "semi-circle" shape where the bass and treble are rolled off. This is why many speakers sound better at louder volumes: as the volume goes up, our sensitivity to bass and treble gradually increases, so the speaker sounds more and more subjectively flat. We like a flat sound, which is why we like the speaker played louder. The same reasoning explains why bright speakers sound nice at normal volumes but become annoying at high volumes. At normal volumes, the bright sound compensates for our lack of sensitivity to treble, so the top end sounds subjectively flatter than on a neutral speaker. But when you turn the volume up, the ear's sensitivity to treble increases, and now the ear hears a sound with too much treble, and we don't like it because we like a flat sound.

What does this mean? What we perceive as sounding "flat" varies dramatically depending on the volume. In order to achieve the universal preference of subjectively flat, we need a speaker that changes its frequency response depending on the volume it is played at. This is not possible with any passive speaker.

So for a speaker to sound subjectively flat, there must be a bass boost and a treble boost. It is not a straight boost either, but a continuous, slowly rising response starting from the low midrange (around 400Hz) and extending into the very deepest bass. A similar but much smaller rising response is needed for treble, starting around 5000Hz. Isn't that just a V shaped frequency response? You must be shaking your head in disgust! V shaped??? Blasphemy!!! However, the response I describe here is almost impossible to do on a passive speaker, since the boost required in the bass would require the speaker to lose over 10dB of sensitivity, and the inductor needed to start cutting at 30Hz is just impractical. This is why most "V" shaped speakers don't sound great: they are not hitting the target curve needed to sound correct. Even if they do hit it through the use of an external equalizer, the amount of bass and treble boost needs to differ depending on the volume so that it always results in a flat subjective response to our ears. The equalizer provides a static change in frequency response that doesn't change with volume, so it'll sound bad at higher volumes, creating fatigue (too much treble) and boominess (too much bass). With my speakers, after a calibration the system knows exactly what the SPL is at the listening position, so it can automatically adjust the bass and treble depending on the listening volume. This is the key to getting the speaker to sound subjectively flat to our ears, and if done right, it sounds downright amazing, and just sounds right.
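To make the idea of a volume-dependent target curve concrete, here is a minimal sketch in Python. Only the two corner frequencies (around 400 Hz and 5000 Hz) come from the description above. The reference level, the boost slopes, and the function name `compensation_db` are hypothetical placeholders chosen for illustration, not the values or the method actually used in these speakers.

```python
import math

# Corner frequencies taken from the post; everything else is an assumption.
BASS_CORNER_HZ = 400.0
TREBLE_CORNER_HZ = 5000.0
REFERENCE_SPL_DB = 85.0              # hypothetical: level needing no compensation
BASS_SLOPE_PER_DB = 0.10             # hypothetical: dB/octave of boost per dB below reference
TREBLE_SLOPE_PER_DB = 0.03           # hypothetical: much smaller than the bass slope
LOWEST_HZ, HIGHEST_HZ = 20.0, 20000.0


def compensation_db(freq_hz: float, listening_spl_db: float) -> float:
    """Boost in dB at freq_hz for a given SPL at the listening position."""
    # How far below the reference we are listening. Clamped at zero, so this
    # sketch never cuts anything when playing louder than the reference.
    deficit_db = max(0.0, REFERENCE_SPL_DB - listening_spl_db)
    f = min(max(freq_hz, LOWEST_HZ), HIGHEST_HZ)
    if f < BASS_CORNER_HZ:
        # Continuously rising boost below the bass corner, in octaves.
        return deficit_db * BASS_SLOPE_PER_DB * math.log2(BASS_CORNER_HZ / f)
    if f > TREBLE_CORNER_HZ:
        # Similar but smaller rise above the treble corner.
        return deficit_db * TREBLE_SLOPE_PER_DB * math.log2(f / TREBLE_CORNER_HZ)
    return 0.0  # midrange is left untouched


# Print the curve at a few frequencies for three listening levels.
for spl in (60.0, 75.0, 90.0):
    curve = [round(compensation_db(f, spl), 1) for f in (30, 100, 400, 1000, 5000, 15000)]
    print(spl, curve)
```

The point of the sketch is only the shape: the quieter the playback, the steeper both rises become, and at or above the assumed reference level the curve collapses to flat. A real implementation would need properly derived slopes and probably a treble cut at high levels, which this sketch leaves out.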


The Exception

(*) What is the exception? I've found that this is not true for people with substantial hearing loss, i.e. a lot of old people. This is the group that heavily favours a very rolled off treble sound. For some reason I don't yet know, these people seem to think treble is the devil. I would think that with hearing loss, they would want MORE treble to compensate for their reduced high frequency hearing. However, it seems like people with hearing loss genuinely hate treble because for some reason it greatly irritates them. I brought these to Burning Amp, where most of the attendees are well over 50. I ran a long 20 second frequency response sweep, which meant there were 10 seconds or so where the sweep was in the treble region. I noticed several people covering their ears during the sweep, and some looked like they were in pain. I got a much less positive reception there, which is understandable because most of the speakers that were presented had, in my opinion, essentially no treble. And of course, these "treble-less" speakers got hugely positive receptions, which is not surprising at all if hearing high frequencies causes these people to contort their faces.

Quote

What is the exception? I've found that this is not true for people with substantial hearing loss, i.e. a lot of old people. This is the group that heavily favours a very rolled off treble sound. For some reason I don't yet know, these people seem to think treble is the devil. I would think that with hearing loss, they would want MORE treble to compensate for their reduced high frequency hearing. However, it seems like people with hearing loss genuinely hate treble because for some reason it greatly irritates them. I brought these to Burning Amp, where most of the attendees are well over 50. I ran a long 20 second frequency response sweep, which meant there were 10 seconds or so where the sweep was in the treble region. I noticed several people covering their ears during the sweep, and some looked like they were in pain. I got a much less positive reception there, which is understandable because most of the speakers that were presented had, in my opinion, essentially no treble. And of course, these "treble-less" speakers got hugely positive receptions, which is not surprising at all if hearing high frequencies causes these people to contort their faces.

 

Somehow I don't quite agree with that summation.

True, some people like little or no treble.

But that is not really attributable to age. I have clients and colleagues that are long in the tooth but really appreciate an extended top end response.

I do agree that your idea of a generally accepted loudspeaker response is best described as subjectively flat. Most systems that I have listened to that were dead flat were very bright on the top end. My personal work aims for a combined response at the listening position that is down a few dB beyond 10 kHz: about 1.5 to 2 dB down in the last octave.


You say that a V-shaped response sounds better at "normal volume", but you don't specify what constitutes "normal".  That's because listening levels can vary substantially by situation and preference.  However, audio engineers aim to shape the tonal balance of the soundtrack so that it sounds subjectively flat when it is played (1) on speakers that measure flat-ish in an anechoic chamber (2) placed away from walls in a moderately large residential listening room (3) at a lively but not excessive listening level.  Note that in contrast to anechoic measurements, the in-room frequency response in this scenario will tend to have quite a bit of tilt down from the bottom to the top (i.e. 5-10 dB or more higher at 20 Hz vs. 20 kHz).  The in-room response shape is probably much less important than the anechoic response of the speaker as configured, and the "ideal" shape will most likely be speaker and room dependent.

Per Katz and others, such a listening level typically reproduces forte passages (e.g. the chorus in a typical pop or rock recording) at around 83 dBC RMS average. Obviously, the precise figure can vary a bit with the type of content, with bass-heavier content typically calling for higher dBC levels due to the relative lack of loudness contributed by bass. Modern studio practice dictates calibrating the monitors to a constant SPL using a standard pink noise source and then adjusting soundtrack levels by ear. For typical "loudness war" music masters, the appropriate master volume level will be around -14 to -17 dB relative to cinema reference level. For older releases and anything mastered with more peak headroom, -8 to -12 dB is reasonable. Some content is purposely mastered with lots of headroom, which justifies even higher master volume levels.

All this is fine and good if one is listening at such calibrated levels all the time, but that's rarely true in practice.  Casual home listening levels (what you describe here as "normal volume") are likely to be a bit less than what was used in the studio.  Listening levels at clubs and live events are often higher (e.g. 90 dBC RMS average) than was used in the studio.  Adjustments to shape are warranted in either case.  For low levels, boost to bass and treble is appropriate, although the ideal treble boost is usually pretty modest compared to the ideal bass boost.  For higher listening levels, it's probably more important to reduce treble, which can otherwise be very uncomfortable, than to reduce bass.  A potentially better solution for live performance is to do a different mix with more peak headroom and with some of the drums and bass instruments mixed hotter, which can add a lot of impact to the performance without making it excessively loud.

===================

What you describe, applying a V-shaped boost to the response that varies with master volume, is a lot like how Audyssey Dynamic EQ works, albeit without the "Dynamic" part. Varying the amount of V-shaped boost vs. master volume is definitely a lot better than applying the same boost regardless of master volume, but it's possible to do even better. The amount of boost should also vary with the content itself. See the Equal Loudness Contours. The ideal Dynamic EQ should compensate for subjective differences between the sound at the SPL at which it was reproduced in the studio (i.e., the reference level) and the sound at the SPL at which it is reproduced during playback. If you look at the curves, you can see that the relative spacing of contours changes with absolute SPL, so Audyssey DEQ continuously and "dynamically" analyzes the content and adjusts the amount of boost applied in real-time. IMO, it's the one technology of theirs that actually works pretty well, as long as you don't use it with surround speakers.

 


http://seanolive.blogspot.ca/2014/01/the-perception-and-measurement-of.html

 

http://read.uberflip.com/i/324330-lis-2014/22

 

http://seanolive.blogspot.ca/2008/12/loudspeaker-preferences-of-trained.html

 

A much larger pool of people than I have ever had the chance to work with shows that listener preferences are broadly shared regardless of age.

Good sound is, simply put, the right sound. Right in the sense that it accurately reproduces the sound of a live acoustic event. The caveat is that when you set up a loudspeaker system in this way, many recordings are exposed for a series of poor choices and a poorer outcome in reproduced sound.

 

One last note.  There should be no difference between accurate reproduction on a loudspeaker and accurate reproduction on a headphone.  That is the anchor of this series of research articles.  After all you are using the same set of ears for both types of listening aren't you?

Edited by mwmkravchenko
Missed a link. Dragged my knuckles for too long...

On 2017-11-19 at 4:53 PM, mwmkravchenko said:

Somehow I don't quite agree with that summation.

True, some people like little or no treble.

But that is not really attributable to age. I have clients and colleagues that are long in the tooth but really appreciate an extended top end response.

I do agree that your idea of a generally accepted loudspeaker response is best described as subjectively flat. Most systems that I have listened to that were dead flat were very bright on the top end. My personal work aims for a combined response at the listening position that is down a few dB beyond 10 kHz: about 1.5 to 2 dB down in the last octave.

I am not saying old people don't like extended treble response. I'm saying people with hearing damage don't like extended treble, and that is generally old people. That's like saying good speakers sound good, and they're generally expensive. Doesn't mean expensive speakers are good, just happens that expensive speakers generally sound good. 

There are lots of older people that love the sound of my speakers. Had a retired man who cranked metal music on my speakers, and cranked it LOUD, and he told me it had the best cymbal crashes he's ever heard!

8 hours ago, mwmkravchenko said:

http://seanolive.blogspot.ca/2014/01/the-perception-and-measurement-of.html

 

http://read.uberflip.com/i/324330-lis-2014/22

 

http://seanolive.blogspot.ca/2008/12/loudspeaker-preferences-of-trained.html

 

A much larger pool of people than I have ever had the chance to work with shows that listener preferences are broadly shared regardless of age.

Good sound is, simply put, the right sound. Right in the sense that it accurately reproduces the sound of a live acoustic event. The caveat is that when you set up a loudspeaker system in this way, many recordings are exposed for a series of poor choices and a poorer outcome in reproduced sound.

 

One last note.  There should be no difference between accurate reproduction on a loudspeaker and accurate reproduction on a headphone.  That is the anchor of this series of research articles.  After all you are using the same set of ears for both types of listening aren't you?

There is a WORLD of difference between accurate reproduction on a headphone vs on a loudspeaker. There is zero crosstalk on headphones, which dramatically changes how imaging is perceived. There is no concept of direct vs reflected sound on headphones since it is all direct sound. There is no concept of directivity on headphones either. The amount of bass is dependent on the seal and fit of the headphone. Correlations between headphones do not necessarily apply to speakers. 

On 2017-11-20 at 2:53 PM, SME said:

You say that a V-shaped response sounds better at "normal volume", but you don't specify what constitutes "normal".  That's because listening levels can vary substantially by situation and preference.  However, audio engineers aim to shape the tonal balance of the soundtrack so that it sounds subjectively flat when it is played (1) on speakers that measure flat-ish in an anechoic chamber (2) placed away from walls in a moderately large residential listening room (3) at a lively but not excessive listening level.  Note that in contrast to anechoic measurements, the in-room frequency response in this scenario will tend to have quite a bit of tilt down from the bottom to the top (i.e. 5-10 dB or more higher at 20 Hz vs. 20 kHz).  The in-room response shape is probably much less important than the anechoic response of the speaker as configured, and the "ideal" shape will most likely be speaker and room dependent.

Per Katz and others, such a listening level typically reproduces forte passages (e.g. the chorus in a typical pop or rock recording) at around 83 dBC RMS average. Obviously, the precise figure can vary a bit with the type of content, with bass-heavier content typically calling for higher dBC levels due to the relative lack of loudness contributed by bass. Modern studio practice dictates calibrating the monitors to a constant SPL using a standard pink noise source and then adjusting soundtrack levels by ear. For typical "loudness war" music masters, the appropriate master volume level will be around -14 to -17 dB relative to cinema reference level. For older releases and anything mastered with more peak headroom, -8 to -12 dB is reasonable. Some content is purposely mastered with lots of headroom, which justifies even higher master volume levels.

All this is fine and good if one is listening at such calibrated levels all the time, but that's rarely true in practice.  Casual home listening levels (what you describe here as "normal volume") are likely to be a bit less than what was used in the studio.  Listening levels at clubs and live events are often higher (e.g. 90 dBC RMS average) than was used in the studio.  Adjustments to shape are warranted in either case.  For low levels, boost to bass and treble is appropriate, although the ideal treble boost is usually pretty modest compared to the ideal bass boost.  For higher listening levels, it's probably more important to reduce treble, which can otherwise be very uncomfortable, than to reduce bass.  A potentially better solution for live performance is to do a different mix with more peak headroom and with some of the drums and bass instruments mixed hotter, which can add a lot of impact to the performance without making it excessively loud.

===================

What you describe, applying a V-shaped boost to the response that varies with master volume, is a lot like how Audyssey Dynamic EQ works, albeit without the "Dynamic" part. Varying the amount of V-shaped boost vs. master volume is definitely a lot better than applying the same boost regardless of master volume, but it's possible to do even better. The amount of boost should also vary with the content itself. See the Equal Loudness Contours. The ideal Dynamic EQ should compensate for subjective differences between the sound at the SPL at which it was reproduced in the studio (i.e., the reference level) and the sound at the SPL at which it is reproduced during playback. If you look at the curves, you can see that the relative spacing of contours changes with absolute SPL, so Audyssey DEQ continuously and "dynamically" analyzes the content and adjusts the amount of boost applied in real-time. IMO, it's the one technology of theirs that actually works pretty well, as long as you don't use it with surround speakers.

 

I was lazy and posted the same thing that I posted on other forums to this forum as well. The above is "dumbed down" so it is easier to digest for the average DIY'er. For this forum I should write something a lot more technical. 

But basically yes, it is just Audyssey Dynamic EQ, but a better version of it. Audyssey nailed the lower range, but not so much on the upper range. I still haven't gotten it as right as I could make it, but so far it has been a lot better with this than without it. 

You're right that the amount of boost that needs to be applied will be dependent on the source. My intention is not to get it right for everything because that's not possible. If you optimize for one song, it might introduce problems for other songs. Therefore, the goal is to apply a general broad stroke correction so it provides a positive benefit for ALL sources. 

6 hours ago, lowerFE said:

There is a WORLD of difference between accurate reproduction on a headphone vs on a loudspeaker. There is zero crosstalk on headphones, which dramatically changes how imaging is perceived. There is no concept of direct vs reflected sound on headphones since it is all direct sound. There is no concept of directivity on headphones either. The amount of bass is dependent on the seal and fit of the headphone. Correlations between headphones do not necessarily apply to speakers. 

The writers of those articles would have a very healthy argument that you are incorrect.

 

And so would I. I often use a high quality pair of headphones as a reference when working on loudspeakers. The entire basis of the articles I cited is the relationship between the sound of an accurate pair of speakers and an accurate pair of headphones. Don't let the headphones you happen to own sway your decision. There are some headphones that have been designed with this type of carefully crafted contour. The data for this type of EQ matching loudspeakers is widely available in many reports, and there are even standards in the works to define an accurate headphone response.

 

They all use a well setup loudspeaker as a reference. 

12 hours ago, mwmkravchenko said:

http://seanolive.blogspot.ca/2014/01/the-perception-and-measurement-of.html

 

http://read.uberflip.com/i/324330-lis-2014/22

 

http://seanolive.blogspot.ca/2008/12/loudspeaker-preferences-of-trained.html

 

A much larger pool of people than I have ever had the chance to work with shows that listener preferences are broadly shared regardless of age.

Good sound is, simply put, the right sound. Right in the sense that it accurately reproduces the sound of a live acoustic event. The caveat is that when you set up a loudspeaker system in this way, many recordings are exposed for a series of poor choices and a poorer outcome in reproduced sound.

 

One last note.  There should be no difference between accurate reproduction on a loudspeaker and accurate reproduction on a headphone.  That is the anchor of this series of research articles.  After all you are using the same set of ears for both types of listening aren't you?

I have a lot of respect for Toole et al. and Harman's work in these areas. I think their speaker research correlating anechoic measurements, on and off axis, to listener preferences was truly groundbreaking. The underlying hypothesis that motivated the aforementioned studies is that listeners effectively hear the anechoic sound of the speaker despite the room, and that the room imparts qualities to the sound independent of our perception of the source. While the research they performed did not prove this hypothesis conclusively, it offers strong evidence, which has motivated the design of some of the best sounding speakers in the world.

With that said, there are definite limitations and caveats to this hypothesis.  Both theory and evidence suggest that it does not apply for low frequencies, below some cut-off where room effects can no longer be fully distinguished from the anechoic sound of the speaker.  The researchers do acknowledge this limitation, and Harman generally recommends using room EQ of some kind below 500 Hz or so.  Unfortunately, they don't offer any specific guidance with regard to what such room EQ should do, even though they offer a product to do so, one that appears to work by fitting magnitude-smoothed in-room response to a target.  The fact of the matter is that in-room response from an anechoic flat speaker can vary substantially depending on speaker dispersion characteristics, listening distance, placement of speaker and seats relative to nearby boundaries, and other room acoustic effects.  Even though they may have derived their in-room target curve by measuring an actual anechoic flat speaker in their test room, I'm convinced that a one-size-fits-all target will not work reliably under a wide variety of conditions.  This should be obvious because the approach contradicts the original hypothesis that listeners hear the anechoic sound.

There is a larger issue with magnitude-smoothed measurements in general. Almost everyone who does room measurements uses them to try to understand what's going on with sound in a room, but practically no one knows what they actually mean. Mathematically speaking, the magnitude-smoothing process has side-effects that alter the information captured in the full impulse response in unexpected ways. This is a point that's a bit too technical to elaborate in detail here, and I have a kind of TODO for myself to properly write this up and publish it. One hint is that if you compare, say, a 1/3 octave magnitude-smoothed impulse response measurement with a 1/3 octave band power response measurement of pink noise, they will likely have different shapes. The pink noise power response shape could be reproduced from the impulse response data, but only if the smoothing is applied to the magnitude-squared response rather than merely to the magnitude response.
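The difference between the two kinds of smoothing can be sketched numerically. The Python below builds a purely synthetic "room" response (a direct sound plus a handful of random reflections), then smooths it twice over a 1/3 octave window: once on the magnitude, and once on the magnitude squared followed by a square root. The reflection amplitudes and delays, the rectangular window, the linear frequency grid, and the function name `fractional_octave_smooth` are all assumptions made for illustration. This is not how REW or any particular tool implements smoothing.

```python
import cmath
import math
import random


def fractional_octave_smooth(freqs, values, fraction=3):
    """Rectangular 1/fraction-octave moving average centred on each frequency."""
    half_width = 2.0 ** (1.0 / (2.0 * fraction))
    smoothed = []
    for fc in freqs:
        lo, hi = fc / half_width, fc * half_width
        window = [v for f, v in zip(freqs, values) if lo <= f <= hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Synthetic response: unit direct sound plus 20 made-up reflections,
# each given as (amplitude, delay in seconds).
random.seed(0)
reflections = [(random.uniform(0.05, 0.3), random.uniform(0.005, 0.05)) for _ in range(20)]
freqs = [100.0 + 5.0 * i for i in range(1000)]  # 100 Hz to about 5.1 kHz
magnitude = [
    abs(1.0 + sum(a * cmath.exp(-2j * math.pi * f * t) for a, t in reflections))
    for f in freqs
]

# Smoothing the magnitude directly.
mag_smoothed = fractional_octave_smooth(freqs, magnitude)
# Smoothing the power (magnitude squared), then converting back to magnitude.
power_smoothed = [math.sqrt(p) for p in fractional_octave_smooth(freqs, [m * m for m in magnitude])]

# Gap between the two smoothed curves in dB at each frequency.
difference_db = [20.0 * math.log10(p / m) for p, m in zip(power_smoothed, mag_smoothed)]
print(f"largest gap between the two curves: {max(difference_db):.2f} dB")
```

Because a root-mean-square is never smaller than a plain mean, the power-smoothed curve sits at or above the magnitude-smoothed one wherever the raw response ripples inside the window. The gap grows with the depth of the ripples, which is one way to see why the two curves can have different shapes.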

To pick on some specific issues with the study featured in the first link, the target curve they are promoting appears to be based entirely on starting with a flat response and allowing listeners to make adjustments to generic treble and bass controls. The corner frequencies for these controls are fixed, so users have very little flexibility in the adjustments they make, and the result is likely to be a long way off from what an ideal curve would be. For example, if just one aspect of the sound in the "treble" is offensive, such as the 3 kHz ear resonance, which ears may be more sensitive to for sound arriving directly from the sides, the listener is likely to adjust the "treble" down until that resonance is no longer offensive, which may involve cutting a lot of sound above that frequency that is not offensive. The same could be true for bass. If the bass control knee were a bit higher so the adjustment added a bit more upper bass, the user might push the bass a bit higher than otherwise. Frankly, I think this approach is largely a waste of time. A much better, albeit still imperfect, approach would be to have each Harman employee who is a trained listener create their own arbitrary EQ target by ear, and then have a blind "shoot out" of different targets among all the listeners to choose which one sounds best. The process could then be repeated, iteratively, using the latest "preferred curve" as a basis that gets refined further.

With that said, I have little doubt that Harman's approach to headphones and room EQ achieves better sound than other available options.  There are many wrong ways to calibrate speakers and headphones.  Some are a lot more wrong than others.  From a business perspective, Harman only really needs to beat their competitors to be able to offer a distinguished product.  From there, they can merely make incremental improvements to achieve a long-term forced obsolescence schedule, but those of us who want the best sound would probably rather not wait a few decades for a company like Harman to achieve the best that is possible.

14 hours ago, lowerFE said:

You're right that the amount of boost that needs to be applied will be dependent on the source. My intention is not to get it right for everything because that's not possible. If you optimize for one song, it might introduce problems for other songs. Therefore, the goal is to apply a general broad stroke correction so it provides a positive benefit for ALL sources. 

Compensation for differences between masters can be done with Audyssey Dynamic EQ using the Reference Offset feature.  However, I'm speaking of a different issue here.  Even within a single song, the correction needs to be different for higher SPL vs. lower SPL parts.  Audyssey Dynamic EQ actually analyzes the content and adjusts the amount of compensation in real-time.  This kind of thing may be a lot more important for symphonic music and movie soundtracks, which contain large macro-dynamic swings, than for pop music that tends to be consistent in level.  Obviously, this is not easy to accomplish without custom DSP capability, and even then it's not obvious what approach is best to take.
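As a rough illustration of what "analyzing the content and adjusting in real-time" could look like, here is a minimal Python sketch that tracks a short-term level per block of samples and derives a bass boost from it. It is not Audyssey's algorithm, whose internals are not described here. The calibration figure `full_scale_spl_db`, the reference level, the boost ratio, the smoothing factor, and the function names are hypothetical.

```python
import math


def block_level_dbfs(samples):
    """RMS level of one block in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def track_bass_boost(blocks, full_scale_spl_db, reference_spl_db=85.0,
                     boost_db_per_db=0.4, smoothing=0.2):
    """Return one bass boost value (dB) per block.

    full_scale_spl_db is an assumed calibration: the SPL at the listening
    position produced by a full-scale signal at the current master volume.
    """
    smoothed_spl = None
    boosts = []
    for block in blocks:
        spl = full_scale_spl_db + block_level_dbfs(block)
        # One-pole smoothing so the boost does not jump from block to block.
        if smoothed_spl is None:
            smoothed_spl = spl
        else:
            smoothed_spl += smoothing * (spl - smoothed_spl)
        # More boost the further the content sits below the reference level.
        boosts.append(max(0.0, (reference_spl_db - smoothed_spl) * boost_db_per_db))
    return boosts


def sine_block(amplitude, n=480, freq_hz=100.0, rate_hz=48000.0):
    """One 10 ms block of a sine wave, used here as stand-in content."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


# A loud passage followed by a quiet one, as in symphonic or movie content.
blocks = [sine_block(0.5)] * 5 + [sine_block(0.05)] * 20
boosts = track_bass_boost(blocks, full_scale_spl_db=100.0)
print([round(b, 2) for b in boosts])
```

With these made-up numbers, the boost stays at zero during the loud passage and ramps up gradually once the quiet passage begins. A compensation keyed only to master volume would apply the same boost to both passages.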

By the way, I can't help but wonder if the listeners at Burning Amp might have been spoiled by your sine sweep.  Sine sweeps can sound a bit harsh to my ears, even at pretty modest levels like 70 dB, particularly when they hit the 3 kHz ear resonance.  Those who are unfamiliar with sine sweep measurements may have gotten a bad first impression which colored their later judgments.


I'm guessing that this Audyssey EQ is a sliding dynamic loudness contour. That does have application when you are listening at levels below 70 to 75 dB averaged.

Simple comment on room measurements.  They are very dependent on the stimulus made to take the measurement and the type of math applied to perform the measurement.  The errors multiply rather rapidly depending on the choices made.

I know of no professional who works in acoustics that looks at averaged measurements and gives them any worth.  Raw measurements are where the real information lies.

Lastly, I may not have posted the links to all the Sean Olive papers, and I may be remembering what I was reading in the JAES papers that are freely available and also on the JAES website. But I know for a fact that the reference speakers were in a standardized listening room and that the panel of trained listeners preferred the sound characteristics associated with a standardized semi-reverberant room, not an anechoic reproduction, also called diffuse field EQ, that was the norm a few years back.

 

1 hour ago, mwmkravchenko said:

I'm guessing that this Audyssey EQ is a sliding dynamic loudness contour. That does have application when you are listening at levels below 70 to 75 dB averaged.

Yes.  I think that's one way you could describe it.

1 hour ago, mwmkravchenko said:

A simple comment on room measurements.  They are very dependent on the stimulus used to take the measurement and the type of math applied to perform the measurement.  The errors multiply rather rapidly depending on the choices made.

If we're talking about impulse response measurement specifically, then the end result should be fairly similar using different methods if they are done properly.  Where differences are likely to arise is at the tail end of the impulse response where signal-to-noise ratio becomes much more important.  This is probably much more important for studying acoustics than for speaker calibration.

1 hour ago, mwmkravchenko said:

I know of no professional who works in acoustics that looks at averaged measurements and gives them any worth.  Raw measurements are where the real information lies.

Lastly, I may not have posted the links to all the Sean Olive papers, and I may be remembering what I read in the JAES papers that are freely available, including on the JAES website.  But I know for a fact that the reference speakers were in a standardized listening room and that the panel of trained listeners preferred the sound characteristics associated with a standardized semi-reverberant room, not an anechoic reproduction (also called diffuse field EQ), which was the norm a few years back.

Are we talking about averaging or smoothing?  And which particular methods?  The raw impulse response measurement contains the "real information" yes, but for in-room measurements, the information must be extracted to be useful.  I can't even really post a usable picture of my raw in-room frequency response because there aren't enough pixels to illustrate all the narrow peaks and dips that arise from late arriving energy.  Practically every visualization of frequency response uses some kind of smoothing, and people almost always rely on magnitude-smoothing.  Most people just enable the "smoothing" option in REW or whatever program they use and assume it's improving the visual appearance of the data.  They don't understand that it's actually altering the data in a way that's not consistent with expectations.  Even at 1/48th octave resolution, magnitude smoothing can omit the contribution of a lot of late arriving energy which can be seen in a continuous RTA measurement of pink noise.

Smoothing methods that are likely to be more consistent with expectations include power response smoothing (magnitude squared) and complex smoothing (smoothing of magnitude and phase together, as they relate to the complex number plane).  The former is completely time blind.  The latter is completely time local.  Magnitude smoothing is some strange hybrid between the two which makes little sense for most purposes.  Yet, almost all room EQ systems I'm aware of, including Harman's, rely on magnitude smoothing and fitting to a target curve.  The cinema X-curve standard is the clear exception, as it relies on old-fashioned pink noise RTA measurements which are equivalent to power smoothing.  It's also seriously flawed because our hearing is, in fact, very sensitive to time-of-arrival.
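To make the difference between the three smoothing methods concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `octave_smooth` helper is hypothetical, it uses a plain rectangular 1/3 octave window, and the "room" is just a direct sound plus a single reflection at half amplitude arriving 20 ms later. It is not how REW or any commercial room EQ product does it.

```python
import numpy as np

def octave_smooth(freqs, H, fraction=3, mode="magnitude"):
    """Fractional-octave smoothing of a complex frequency response H(freqs).

    mode="magnitude": average |H| over the band
    mode="power":     average |H|^2 over the band, then take the square root
    mode="complex":   average H itself (real and imaginary parts), then take |.|
    A rectangular window of 1/fraction octave is used for simplicity.
    """
    freqs = np.asarray(freqs, dtype=float)
    H = np.asarray(H, dtype=complex)
    half = 2.0 ** (1.0 / (2.0 * fraction))  # half-bandwidth as a frequency ratio
    out = np.empty(len(freqs))
    for i, fc in enumerate(freqs):
        band = H[(freqs >= fc / half) & (freqs <= fc * half)]
        if mode == "magnitude":
            out[i] = np.mean(np.abs(band))
        elif mode == "power":
            out[i] = np.sqrt(np.mean(np.abs(band) ** 2))
        elif mode == "complex":
            out[i] = np.abs(np.mean(band))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out

# Toy response (assumed values): direct sound plus one reflection at 0.5x, 20 ms late.
freqs = np.linspace(200.0, 2000.0, 3601)
H = 1.0 + 0.5 * np.exp(-2j * np.pi * freqs * 0.020)

i1k = int(np.argmin(np.abs(freqs - 1000.0)))
mag = octave_smooth(freqs, H, mode="magnitude")[i1k]
pwr = octave_smooth(freqs, H, mode="power")[i1k]
cpx = octave_smooth(freqs, H, mode="complex")[i1k]
print(f"complex {cpx:.3f}  magnitude {mag:.3f}  power {pwr:.3f}")
```

In this toy case the complex-smoothed value stays close to the direct sound alone, the power-smoothed value includes all of the reflection's energy, and the magnitude-smoothed value lands somewhere in between.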

Anyway, I'm not arguing that listening with anechoic acoustics is preferred at all.  I'm saying that an anechoic flat speaker is preferred, regardless of acoustics (and with limitations and caveats that apply mostly at low frequencies).  This is what Harman has argued with evidence from their listening preference tests.  It's just that they then leap to the conclusion that the in-room magnitude-smoothed response they obtained by measuring such a speaker in their test room is the preferred target curve for *any* speaker in *any* residential-size listening room.  If that were true, it would contradict their original hypothesis.  The real issue is that they don't know of another / better way to analyze in-room measurements for room EQ purposes.

FWIW, I've heard a system calibrated to the Synthesis curve, and while the bass certainly benefited by the suppression of ugly room resonances, pretty much everything above 200 Hz sounded better without the room EQ.  The corrected version was much too rolled off in the upper mid and high frequencies, which suppressed a substantial amount of detail.  I'm also quite convinced that their curve would give inferior results compared to what I use on my system.  My speakers have rather different directivity vs. frequency than theirs, and they are placed against the front wall, so this should be expected.

2 hours ago, SME said:

FWIW, I've heard a system calibrated to the Synthesis curve, and while the bass certainly benefited by the suppression of ugly room resonances, pretty much everything above 200 Hz sounded better without the room EQ.  The corrected version was much too rolled off in the upper mid and high frequencies, which suppressed a substantial amount of detail.  I'm also quite convinced that their curve would give inferior results compared to what I use on my system.  My speakers have rather different directivity vs. frequency than theirs, and they are placed against the front wall, so this should be expected.

Agreed.

I'm very much not a fan of one curve fits all and auto generated room EQ.  All you have to do is move about the room a little bit and your wonderful EQ is out the window.

 

We are basically agreeing on the room timing measurements.  My unstated reasoning is that an accurate measurement cannot be made without a stimulus that actually excites all the room modes.  In most cases this precludes the MLS types of signals.  You need chirps and/or discrete gated sine wave stimuli to really excite what can happen below around 200 Hz.

Choosing the DFT gating correctly is also very important, as you have pointed out, and so is using the correct type of math on the DFT.  Choose wisely or suffer the consequences.

 

 

 


I respect Harman's research as well. Very thorough and well executed. 

However, there is a weakness. The research and all of its data are based on speakers that existed back in the day. Essentially, it simply finds the most preferred speaker that has already been made. It makes no attempt to find the theoretically optimal sound signature. Therefore, if there is a sound signature that is preferred over anything else on the market but does not currently exist in a speaker due to technological limitations, then it would not be found. A subjectively flat response is exactly the kind of curve (or rather set of curves, since it changes with volume) that a passive speaker cannot do. I argue that having equal loudness compensation is half of the equation in the ideal most preferred curve. The other half is full range constant directivity down to the Schroeder frequency.

Regarding whether the curve should be statically applied based on a volume level versus the instantaneous RMS level of the signal, I think both have merits. Currently I am on the former, but I'm thinking of moving to the latter. The reason is that I think a static equal loudness compensation based on a volume level is optimal when the reference volume is known, such as in surround sound formats. This is the closest to real life since the frequency response of individual sound sources does not change depending on the volume.

However, when we don't know the reference volume, I'm inclined to use a dynamically applied compensation. This is because for most music, we simply don't know what the reference volume is. The average level of songs can vary quite widely depending on content. Also, for most music, especially for any music that contains electronic instruments, there is no "reference volume" anyway. We have no idea what "real" really sounds like. Therefore, there is little purpose in trying to achieve "realism", and the goal should be "good sounding". When the reference volume can differ rather dramatically from song to song and genre to genre, a dynamically applied compensation will have a better chance of sounding correct since it is based on the actual SPL heard at the listening position. The obvious exceptions are classical music and any other music with large dynamic swings, since those swings should be left as they are instead of being modified, but I hardly listen to music with real instruments.

 

On 11/22/2017 at 6:00 PM, mwmkravchenko said:

Agreed. I'm very much not a fan of one curve fits all and auto generated room EQ.  All you have to do is move about the room a little bit and your wonderful EQ is out the window.

Many commercial room EQ systems require or recommend measurements at multiple locations, so in theory they should work throughout the room.  In fact, for the methods I'm applying / using, multiple measurement locations are necessary for the best sound even at a single sweet spot.  Theoretically, I should be able to optimize for a single location using, e.g. measurements at each ear location, but the required analysis is actually a lot harder that way than if I just measure a wider spread of locations.

But anyway, my criticism isn't so much about use of automated EQ systems but the assumption that fitting magnitude-smoothed response to some target curve will yield consistent results between speakers and/or rooms. More fundamentally, magnitude-smoothed frequency response, like lots of people post all over forums, doesn't give much insight into how a system actually sounds, even if the smoothing is of high resolution, e.g. 1/48th octave.

On 11/22/2017 at 6:00 PM, mwmkravchenko said:

We are basically agreeing on the room timing measurements.  My unstated reasoning is that an accurate measurement cannot be made without a stimulus that actually excites all the room modes.  In most cases this precludes the MLS types of signals.  You need chirps and/or discrete gated sine wave stimuli to really excite what can happen below around 200 Hz.

Choosing the DFT gating correctly is also very important, as you have pointed out, and so is using the correct type of math on the DFT.  Choose wisely or suffer the consequences.

I believe MLS signals effectively cover the entire spectrum.  The main drawback of MLS signals is that they give equal weight to each linear frequency, which leads to diminishing signal-to-noise ratio for low frequencies.  The same problem would occur if measurements were done with a linear sine sweep test signal instead of the log sweep that's normally used in, e.g. REW.
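A rough way to see the size of this effect is to compare how much stimulus energy lands in a low octave versus a high octave. The sketch below assumes idealized spectra: power density that is flat with frequency for MLS or a linear sweep, and proportional to 1/f for a log sweep. The `band_energy` helper and the two octave bands are chosen purely for illustration.

```python
import math

def band_energy(exponent, f_lo, f_hi):
    """Integral of f**exponent from f_lo to f_hi (energy in arbitrary units)."""
    if exponent == -1:
        return math.log(f_hi / f_lo)
    p = exponent + 1
    return (f_hi ** p - f_lo ** p) / p

def octave_gap_db(exponent, low_octave_start, high_octave_start):
    """Energy in the high octave relative to the low octave, in dB."""
    low = band_energy(exponent, low_octave_start, 2 * low_octave_start)
    high = band_energy(exponent, high_octave_start, 2 * high_octave_start)
    return 10.0 * math.log10(high / low)

# Compare the 31.25-62.5 Hz octave against the 4-8 kHz octave.
white_gap = octave_gap_db(0, 31.25, 4000.0)    # MLS or linear sweep (flat density)
pink_gap = octave_gap_db(-1, 31.25, 4000.0)    # log sweep (1/f density)
print(f"flat spectrum: {white_gap:.2f} dB, log sweep: {pink_gap:.2f} dB")
```

With a flat spectrum the low octave gets about 21 dB less stimulus energy than the 4-8 kHz octave (3 dB per octave over 7 octaves), while the log sweep puts the same energy in every octave. For the same background noise, that difference comes straight out of the low frequency signal-to-noise ratio.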

In any case, I generally agree with you as far as what it takes to capture an accurate impulse response of the speaker + room response.  However, I'm trying to emphasize that having an accurate impulse response is only the first step.

The real challenge is to extract psycho-acoustically relevant details from the impulse response.  If you view the raw frequency response from an impulse response measurement, it looks like someone just scribbled all over the place, even if the system sounds magnificent.  What most people naively do is apply smoothing to the data, which replaces the scribbles with a wavy line that intuitively appears to be more consistent with subjective response but actually isn't.  The exception to this may be for bass, below the true Schroeder frequency, but even that can be tricky to properly interpret.

3 hours ago, lowerFE said:

I respect Harman's research as well. Very thorough and well executed. However, there is a weakness. The research and all of its data are based on speakers that existed back in the day. Essentially, it simply finds the most preferred speaker that has already been made. It makes no attempt to find the theoretically optimal sound signature. Therefore, if there is a sound signature that is preferred over anything else on the market but does not currently exist in a speaker due to technological limitations, then it would not be found. A subjectively flat response is exactly the kind of curve (or rather set of curves, since it changes with volume) that a passive speaker cannot do. I argue that having equal loudness compensation is half of the equation in the ideal most preferred curve. The other half is full range constant directivity down to the Schroeder frequency.

I agree that the fact that the research was based on pre-existing speakers is a weakness.

However, there is another detail that must be taken into consideration.  Almost all music is produced and is subject to optimization as part of the production process.  The monitors that are used in a professional setting are very likely to exhibit a flat response or something close to it on-axis, when measured in an anechoic chamber.  Off-axis response on pro-monitors likely varies a bit more, but production rooms are often treated to remove early reflections and thus reduce the contribution of off-axis sound in those environments.

The crucial objectives of this optimization are: (1) To ensure audibility of each of the different parts of the music (with priority given to certain tracks like, say, the vocals and percussion) and (2) To ensure a subjectively flat frequency response is heard, when played at a lively reference level (i.e. ~83 dBC for chorus or forte passages) on as wide a variety of playback systems as possible.  The anechoic flat monitor, placed far away from walls, is the standard or reference to which other systems are assumed to attempt to conform.

The real value of the Harman research was to demonstrate that the sound we hear in the mid and high frequencies depends mostly on the sound of the speaker itself, albeit on and off axis.  The rest follows logically by considering the production process as described above.

I have no doubt that equal loudness curve (ELC) compensation can improve the listening experience when listening below the reference level, as is often the case for more casual listening.  However, that is completely separate from the response of the speaker itself.  When the content is produced and optimized, it is monitored at the reference level so that ELC issues don't come into play.

Likewise, when doing any kind of critical listening, whether to evaluate the quality of a mix or the spectral balance of a speaker, it is prudent to listen to the content at or near the reference level.  This is necessary not just to avoid ELC effects but also because many aspects of listening cannot be corrected by ELC compensation alone.  For example, ELC compensation does not completely compensate for tactile sensation differences.  Another thing is that masking profiles change a bit.  Many speaker problems including resonances are much harder to hear at lower levels.  If cranking up the speaker makes it sound harsh, it probably has significant linear response problems that remained hidden at lower levels.

I disagree that full-range constant directivity is necessary, although it probably won't hurt if it's doable.  In most cases, such a speaker would be completely impractical.  As noted above, the Schroeder frequency in my room is around 70 Hz, and maintaining a 90 degree pattern down to 70 Hz would be ludicrous.  Only a full-range omni speaker could do it, but that's hard to realize properly in the high frequencies, even with rear firing tweeters and the like.

What Harman *did* identify as essential in a speaker with good sound is that dispersion should be constant or widening with decreasing frequency.  Ideally, dispersion shouldn't ever narrow with decreasing frequency, except perhaps briefly in the vicinity of crossovers.  Note that many speakers are purposely designed to have flat power response *instead of* flat on-axis response.  Needless to say, they are invariably thin sounding and lacking in bass.

3 hours ago, lowerFE said:

Regarding whether the curve should be statically applied based on a volume level versus the instantaneous RMS level of the signal, I think both have merits. Currently I am on the former, but I'm thinking of moving to the latter. The reason is that I think a static equal loudness compensation based on a volume level is optimal when the reference volume is known, such as in surround sound formats. This is the closest to real life since the frequency response of individual sound sources does not change depending on the volume.

It should not be an either/or thing.  Both are required to do optimal correction.  Audyssey Dynamic EQ relies on both.  The ELCs ultimately relate loudness to SPL at various frequencies.  Consider the following simplified example.  The curves could be computed via a pair of functions:

    spl_to_loud(f, SPL) => loudness

    loud_to_spl(f, loudness) => SPL

So to compute the compensation for some frequency, f, you'd do the following:

    ref_level_loudness = spl_to_loud(f, signal_level_dbfs + fullscale_ref_spl)

    playback_spl = loud_to_spl(f, ref_level_loudness - (fullscale_ref_spl - fullscale_playback_spl))

The fullscale_ref_spl would be the SPL of a fullscale signal when played with the system at the reference level.  The signal_level_dbfs is the level of the signal, relative to a fullscale signal.  The fullscale_playback_spl is the SPL of a fullscale signal when played with the system at your chosen master volume setting.  Note that in this example, the master volume essentially adjusts loudness directly instead of SPL.
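For anyone who wants to play with the idea, here is a runnable Python sketch of the two steps above. The equal loudness model in it is a made-up toy and is NOT ISO 226 data or anything Audyssey uses: `offset_db` and the shape of `loud_to_spl` are hypothetical, chosen only so that the contours are flat at 1 kHz, rise toward the bass, and flatten out as loudness approaches 120 phon. The 105 dB and 85 dB fullscale levels are likewise just example numbers.

```python
import math

def offset_db(f):
    """HYPOTHETICAL low-frequency elevation of the contours relative to 1 kHz (toy data)."""
    return min(50.0, max(0.0, 10.0 * math.log2(1000.0 / f)))

def loud_to_spl(f, loudness):
    """Toy equal loudness contour: SPL needed at f for a loudness level in phon."""
    return loudness + offset_db(f) * (1.0 - loudness / 120.0) ** 2

def spl_to_loud(f, spl):
    """Invert loud_to_spl by bisection (the toy contour is monotonic on 0..120 phon)."""
    lo, hi = 0.0, 120.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if loud_to_spl(f, mid) < spl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compensation_db(f, signal_level_dbfs, fullscale_ref_spl, fullscale_playback_spl):
    """Boost (dB) at f so that playback loudness tracks the reference-level loudness."""
    ref_level_loudness = spl_to_loud(f, signal_level_dbfs + fullscale_ref_spl)
    playback_spl = loud_to_spl(
        f, ref_level_loudness - (fullscale_ref_spl - fullscale_playback_spl))
    # Difference between the SPL we want and the SPL we would get with no correction.
    return playback_spl - (signal_level_dbfs + fullscale_playback_spl)

# Example (assumed levels): master volume set 20 dB below reference.
loud_passage = compensation_db(50.0, -20.0, 105.0, 85.0)
quiet_passage = compensation_db(50.0, -40.0, 105.0, 85.0)
midrange = compensation_db(1000.0, -20.0, 105.0, 85.0)
print(f"50 Hz boost: loud {loud_passage:.1f} dB, quiet {quiet_passage:.1f} dB; "
      f"1 kHz: {midrange:.1f} dB")
```

Even with this crude model, the 50 Hz boost depends on both the master volume and the momentary signal level, with quieter passages getting more boost, while 1 kHz needs no correction at all. That is the sense in which both inputs are required.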

4 hours ago, lowerFE said:

However, when we don't know the reference volume, I'm inclined to use a dynamically applied compensation. This is because for most music, we simply don't know what the reference volume is. The average level of songs can vary quite widely depending on content. Also, for most music, especially for any music that contains electronic instruments, there is no "reference volume" anyways. We have no idea what "real" really sounds like. Therefore, there is little purpose in trying to achieve "realism", and the goal should be "good sounding". When the reference volume can differ rather dramatically between song to song and genre to genre, a dynamically applied compensation will have a better chance of sounding correct since it is based on the actual SPL heard at the listening position. The obvious exceptions are classical music, or any music with large dynamic swings since they should be left as it is instead of being modified, but I hardly listen to music with real instruments.

No, we don't know the reference volume, but that's a pretty minor problem.  For one thing, the range of typical reference levels is pretty narrow.  Probably 80-90% of music falls within a range of about 5 dB, which is pretty small.  Audyssey Dynamic EQ offers reference offsets of "0", "5", "10", and "15", and the difference between adjacent offsets is quite subtle.  The vast majority of music works well with "10" or "15".  If "calibrated" to cinema reference level using a standard pink noise signal, the reference level for the vast majority of music when played in a typical residential space will fall in the "-8" to "-17" range, with most recent loudness war stuff being "-14" or louder.
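As a trivial illustration of how coarse the choice is, this Python sketch picks the closest of the four offsets for a track whose reference level has been estimated relative to cinema reference. It assumes that an offset of N corresponds to content whose reference level sits N dB below cinema reference; the `nearest_reference_offset` helper is hypothetical and not part of any Audyssey interface.

```python
AVAILABLE_OFFSETS_DB = (0, 5, 10, 15)

def nearest_reference_offset(level_re_cinema_db):
    """Pick the available offset closest to how far below cinema reference the track sits.

    level_re_cinema_db is the track's estimated reference level relative to
    cinema reference, e.g. -14 for a track that sounds right 14 dB below it.
    """
    wanted = -level_re_cinema_db
    return min(AVAILABLE_OFFSETS_DB, key=lambda offset: abs(offset - wanted))

for level in (-8, -14, -17):
    print(level, "->", nearest_reference_offset(level))
```

Under that assumption, the whole "-8" to "-17" range lands on either "10" or "15".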

It's also not too hard to roughly ascertain the reference level of the music by simply turning it up to a lively level.  Depending on the spectral balance of the playback system, some variation will be observed for some tracks with sometimes minor changes in spectral balance leading to big changes in apparent loudness, but for the most part, subjective appraisal works fine.

To be clear, the goal of all of this is to achieve accurate, not realistic reproduction.  Almost all music releases including live acoustic recordings are altered from the raw recordings.  The engineers, often in consultation with the artists, put a lot of effort into making great sounding music.  For many albums, such as live acoustic recordings, realism may be the intent, but even these recordings are usually altered to make them sound better than the raw recordings.

By reproducing the music accurately, you have the best chance of hearing what the artists intended.  If they relied on a high quality monitoring system to make their judgments and your system is also very accurate, then you will likely hear something very close to what the artists heard, especially if you play it at its reference level.  Again, the point of the reference level in music is not to achieve a realistic reproduction volume.  Even for live performance, realistic volume depends entirely on how close you are to the band or orchestra.  Are you sitting front row or in the balcony?  That's not the point.  The purpose of reference level is to hear the music where your ears are working at their best and to ensure correct tonal balance / subjective frequency response without needing any compensation for ELCs.  Hence, reference level for a music track is approximately the same (i.e. ~83 dBC for chorus or forte passages) regardless of the type of music.
