From standpoint of no-clue database admin, it's easier to give more resources to Postgres and google what process called "autovacuum" does than to learn why is it being slow on read.
It's also tricky that index only scans depend on working autovacuum, and autovacuum never comes to those tables. Since PG11 it's safe to call vacuum on table with indexes, since index is no longer being scanned in its entirety. I would also propose to include "tuples inserted" into formula for autovacuum threshold the same way it is done for autoanalyze threshold. This will fix the situation where you delete 50 rows in 100-gigabyte table and autovacuum suddenly goes to rewrite and reread hint bits on all of it, since it never touched it before.
I've had to do quite a bit of performance investigation work this year and it seems that I only too often discover that the same problem is repeating itself... A vacuum_cost_limit that is still set to the 200 default value along with all 3 auto-vacuum workers being flat chat trying and failing to keep up with the demand.
I understand we often keep the default config aimed at low-end servers, but I don't believe we should categorise this option the same way as we do with shared_buffers and work_mem. What's to say that having an auto-vacuum that runs too slowly is better than one that runs too quickly?
I have in mind that performance problems arising from having auto-vacuum run too quickly might be easier to diagnose and fix than the ones that arise from it running too slowly. Certainly, the aftermath cleanup involved with it running too slowly is quite a bit more tricky to solve.
Ideally, we'd have something smarter than the cost limits we have today, something that perhaps is adaptive and can make more use of an idle server than we do now, but that sounds like a pretty large project to consider having it working this late in the cycle.
In the meantime, should we consider not having vacuum_cost_limit set so low by default?
I have in mind something in the ballpark of a 5x to 10x increase. It seems the standard settings only allow for a maximum of ~3.9MB/s dirty rate and ~7.8MB/s shared buffer miss rate. That seems pretty slow even for the micro SD card that's in my 4-year-old phone. I think we should be aiming for setting this to something good for the slightly better than average case of modern hardware.
The current default vacuum_cost_limit of 200 seems to be 15 years old and was added in f425b605f4e.