Thanks for creating and posting these samples. Can you explain a bit more about the reasoning behind the particular filter response shape? Is it meant to represent the group delay of some kind of filter (e.g. a crossover), an in-room acoustic effect, or something else? FWIW, I definitely notice a difference between the 100 ms sample and the original, with a clear reduction of impact. Between the 20 ms sample and the original, I think I notice a very slight reduction in impact, but I'd want to do a blind ABX test to be confident.
Having said that, there is a big caveat with your study: GD applied electronically is very different from GD that arises from acoustic effects. That's why I asked my first question above. Time and time again, engineers try to treat room acoustics as a "linear transform along a wire", when this is not the case at all. The ears and body sample pressure at multiple locations (both ears plus tactile sensation), and the brain is very well adapted to parsing the content of the source (both time and frequency aspects!) from what could be a very messy sound-field with dramatic local variations in measured frequency response and group delay. So in general, electronic changes may be far more audible than FR and GD features of similar magnitude that appear in in-room measurements.
There is another potential caveat here. You indicate that the frequency response of your filters "is reasonably flat, considered below threshold for audibility". I can't comment with certainty on your specific case, but in general, I would not be surprised if the frequency response changes you show were well above the audibility threshold on a system with strong, accurate bass. This alone could have substantially affected the amount of perceived impact. Again, a change made by an electronic filter is a very different thing from a change of the same size that acoustics produce in a measurement, and the two are not perceived the same way. Depending on the circumstances, I believe the brain can pick out excruciatingly small changes, likely below 0.01 dB for bass. These can be perceived most readily on transients.
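If it helps, here is a rough sketch of how one could put a number on "reasonably flat" for any filter whose impulse response is available. It evaluates the magnitude response directly at log-spaced bass frequencies and reports the peak-to-peak variation in dB. The function names, the 20 to 200 Hz band, and the two example impulse responses are my own assumptions for illustration; they are not your filters.

```python
import cmath
import math

def magnitude_db(h, freq_hz, fs):
    """Magnitude in dB of impulse response h at one frequency (direct DFT sum)."""
    w = 2 * math.pi * freq_hz / fs
    H = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(h))
    return 20 * math.log10(abs(H))

def bass_ripple_db(h, fs, f_lo=20.0, f_hi=200.0, steps=50):
    """Peak-to-peak magnitude variation in dB over [f_lo, f_hi], log-spaced."""
    freqs = [f_lo * (f_hi / f_lo) ** (i / (steps - 1)) for i in range(steps)]
    mags = [magnitude_db(h, f, fs) for f in freqs]
    return max(mags) - min(mags)

fs = 48000
# Hypothetical example 1: a pure delay, which changes timing but not magnitude.
pure_delay = [0.0] * 100 + [1.0]
# Hypothetical example 2: direct sound plus a 5% echo 5 ms (240 samples) later.
with_echo = [1.0] + [0.0] * 239 + [0.05]

ripple_delay = bass_ripple_db(pure_delay, fs)
ripple_echo = bass_ripple_db(with_echo, fs)
print(f"pure delay ripple: {ripple_delay:.4f} dB")
print(f"with echo ripple:  {ripple_echo:.4f} dB")
```

Running the same function on the impulse responses of the actual test filters would show how far from flat they are in the bass band, which could then be compared against whatever audibility threshold one chooses to assume.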
Regardless of the audibility of your filters, this experiment says little about the audibility of characteristics arising from room acoustics or whether it's necessary to "correct" the group delay deviations seen in in-room measurements. I can't emphasize enough how important it is to keep this distinction clearly in mind when optimizing response.
Also, a comment about the sample material. The kick seemed a bit soft, diffuse, and fluttery to begin with, and it didn't really sound consistent between beats. Audacity's spectrum analysis suggests that the kick has some high-Q ringing at various frequencies, which also does not appear to be consistent between beats. Differences in group delay might be a lot more apparent on tighter transients that don't ring so much.
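As one possible way to get a tighter, perfectly repeatable transient, a kick could be synthesized instead of sampled. The sketch below is only an illustration: a sine burst with a fast downward pitch sweep and an exponential decay. All parameter values (start and end frequency, time constants, duration) are guesses on my part, not a recommendation for what a "correct" kick is.

```python
import math

def synth_kick(fs=48000, dur_s=0.25, f_start=120.0, f_end=45.0,
               sweep_tau_s=0.03, decay_tau_s=0.05):
    """Sine burst with a downward pitch sweep and an exponential amplitude decay."""
    samples = []
    phase = 0.0
    for n in range(int(fs * dur_s)):
        t = n / fs
        # Instantaneous frequency glides from f_start toward f_end.
        freq = f_end + (f_start - f_end) * math.exp(-t / sweep_tau_s)
        phase += 2 * math.pi * freq / fs
        # Short decay time constant keeps the tail from ringing on.
        samples.append(math.sin(phase) * math.exp(-t / decay_tau_s))
    return samples

kick = synth_kick()
print(len(kick), max(abs(x) for x in kick))
```

Because every beat is generated from the same parameters, any beat-to-beat difference heard after filtering comes from the filter and playback chain rather than from the source. The samples can be written to a file with Python's standard `wave` module and run through the same 20 ms and 100 ms filters.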