Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM |
Date | |
Msg-id | CAH2-WzmaPQ64qPeO2mUQiu0HEEgBELK=3H8+Qhiee69T4UBrzg@mail.gmail.com Whole thread Raw |
In response to | GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM
|
List | pgsql-hackers |
On Thu, Oct 7, 2021 at 10:34 PM Peter Geoghegan <pg@bowt.ie> wrote: > This issue was brought to my attention by Nikolay Samokhvalov. He > reached out privately about it. He mentioned one problematic case > involving an ANALYZE lasting 45 minutes, or longer (per > log_autovacuum_min_duration output for the autoanalyze). That was > correlated with VACUUMs on other tables whose OldestXmin values were > all held back to the same old XID. I think that this issue ought to be > treated as a bug. It's hard to think of a non-invasive bug fix. The obvious approach is to move the index_vacuum_cleanup()/ginvacuumcleanup() calls in analyze.c to some earlier point in ANALYZE, in order to avoid doing lots of VACUUM-ish work while we hold an MVCC snapshot that blocks cleanup in other tables. The code in question is more or less supposed to be run during VACUUM already, and so the idea of moving it back to when the autoanalyze worker backend state "still looks like the usual autovacuum case" makes a certain amount of sense. But that doesn't work, at least not without lots of invasive changes. While I'm no closer to a backpatchable fix than I was on Thursday, I do have some more ideas about what to do on HEAD. I now lean towards completely ripping analyze_only calls out, there -- the whole idea of calling amvacuumcleanup() routines during autoanalyze (but not plain ANALYZE) seems bolted on. It's not just the risk of similar problems cropping up in the future -- it's that the whole approach seems obsolete. We now generally expect autovacuum to run against insert-only tables. That might not be a perfect fit for this, but it still seems far better. Does anyone have any ideas for a targeted fix? Here's why the "obvious" fix is impractical, at least for backpatch: To recap, a backend running VACUUM is generally able to avoid the need to be considered inside GetOldestNonRemovableTransactionId(), which is practically essential for any busy database -- without that, long running VACUUM operations would behave like conventional long running transactions, causing all sorts of chaos. The problem here is that we can have ginvacuumcleanup() calls that take place without avoiding the same kind of chaos, just because they happen to take place during autoanalyze. It seems like the whole GIN autoanalyze mechanism design was based on the assumption that it didn't make much difference *when* we reach ginvacuumcleanup(), as long as it happened regularly. But that's just not true. We go to a lot of trouble to make VACUUM have this property. This cannot easily be extended or generalized to cover this special case during ANALYZE. For one thing, the high level vacuum_rel() entry point sets things up carefully, using session-level locks for relations. For another, it relies on special PROC_IN_VACUUM flag logic -- that status is stored in MyProc->statusFlags. -- Peter Geoghegan
pgsql-hackers by date: