On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
> While I'm no closer to a backpatchable fix than I was on Thursday, I
> do have some more ideas about what to do on HEAD. I now lean towards
> completely ripping analyze_only calls out, there -- the whole idea of
> calling amvacuumcleanup() routines during autoanalyze (but not plain
> ANALYZE) seems bolted on. It's not just the risk of similar problems
> cropping up in the future -- it's that the whole approach seems
> obsolete. We now generally expect autovacuum to run against
> insert-only tables.
Attached patch removes calls to each index's amvacuumcleanup() routine
that take place during ANALYZE. These days we can just rely on
autovacuum to run against insert-only tables (assuming the user didn't
go out of their way to disable that behavior).
Having thought about it some more, I have arrived at the conclusion
that we should backpatch this to Postgres 13, the first version that
had insert-driven autovacuums (following commit b07642db). This
approach is unorthodox, because it amounts to disabling a
theoretically-working feature in the backbranches. Also, I'd be
drawing the line at Postgres 13, due only to the quite accidental fact
that that's the first major release that clearly doesn't need this
mechanism. (As it happens Nikolay was on 12 anyway, so this won't work
for him, but he already has a workaround IIUC.)
I reached this conclusion because I can't think of a non-invasive fix,
and I really don't want to go there. At the same time, this behavior
is barely documented, and is potentially very harmful indeed. I'm sure
that we should get rid of it on HEAD, but getting rid of it a couple
of years earlier seems prudent.
Does anybody have any opinion on this, either in favor or against my
backpatch-to-13 proposal?
Although this is technically the first problem report about this since
the GIN fastupdate stuff was introduced over a decade ago, I highly
doubt that that tells us much, given the specifics. We only added
instrumentation to autovacuum that showed each VACUUM's OldestXmin in
Postgres 10 -- that's relatively recent. Nikolay is as sophisticated a
Postgres user as anybody, and it was only through sheer luck that we
managed to figure this out -- he had access to that OldestXmin
instrumentation, and also had access to my input on it. While the
issue itself was very hard to spot, the negative ramifications
certainly were not.
Many users bend over backwards to avoid long running transactions, and
the fact that there is this highly obscure path in which autoanalyze
creates very long running transactions carelessly is pretty
distressing to me. I remember hearing complaints about how slow GIN
pending list cleanup by VACUUM was years ago, back in my consulting
days. When the feature was relatively new. I just accepted the general
wisdom at the time, which is that the mechanism itself is slow. But I
now suspect that that issue has far more to do with holding back
VACUUM/other cleanup generally, and not with the efficiency of GIN
itself.
--
Peter Geoghegan