Thread: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

From

Peter Geoghegan

Date:

08 October 2021, 05:34:31

We generally only expect amvacuumcleanup() routines to be called
during VACUUM. But some ginvacuumcleanup() calls are an exception to
that general rule -- these are calls made during autoanalyze, where
ginvacuumcleanup() does real pending list cleanup work (not just a
no-op return). That'll only happen within an autoanalyze, and only
when no VACUUM took place before the ANALYZE. The high level goal is
to make sure that the av worker process won't neglect to call
ginvacuumcleanup() for pending list cleanup, even when there was no
VACUUM. This behavior was added when the GIN fastupdate/pending list
stuff was first introduced, in commit ff301d6e69.

The design of ANALYZE differs from the design of VACUUM in that only
ANALYZE will allocate an XID (typically in a call to
update_attstats()). ANALYZE can also hold an MVCC snapshot. This is
why ANALYZE holds back cleanup by VACUUM in another process, which
sometimes causes problems (say during pgbench) -- this much is fairly
well known. But there is also a pretty nasty interaction between this
aspect of ANALYZE, and the special GIN pending list cleanup path I
mentioned. This interaction makes the VACUUM-OldestXmin-held-back
situation far worse. The special analyze_only ginvacuumcleanup() calls
happen fairly late during the ANALYZE, during the window that ANALYZE
holds back OldestXmin values in other VACUUMs. This greatly increases
the extent of the problem, in the obvious way. GIN index pending list
cleanup will often take a great deal longer than the typical ANALYZE
tasks take -- it's a pretty resource intensive maintenance operation.
Especially if there are a couple of GIN indexes on the table.

This issue was brought to my attention by Nikolay Samokhvalov. He
reached out privately about it. He mentioned one problematic case
involving an ANALYZE lasting 45 minutes, or longer (per
log_autovacuum_min_duration output for the autoanalyze). That was
correlated with VACUUMs on other tables whose OldestXmin values were
all held back to the same old XID. I think that this issue ought to be
treated as a bug.

Jaime Casanova wrote a patch that does pending list cleanup using the
AV worker item infrastructure [1]. It's in the CF queue. Sounds like a
good idea to me. The goal of that patch is to take work out of the
insert path, when our gin_pending_list_limit-based limit is hit, but
offhand I imagine that the same approach could be used as a fix for
this issue, at least on HEAD.

[1] https://postgr.es/m/20210405063117.GA2478@ahch-to
-- 
Peter Geoghegan

Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

From

Peter Geoghegan

Date:

10 October 2021, 00:51:32

On Thu, Oct 7, 2021 at 10:34 PM Peter Geoghegan <pg@bowt.ie> wrote:
> This issue was brought to my attention by Nikolay Samokhvalov. He
> reached out privately about it. He mentioned one problematic case
> involving an ANALYZE lasting 45 minutes, or longer (per
> log_autovacuum_min_duration output for the autoanalyze). That was
> correlated with VACUUMs on other tables whose OldestXmin values were
> all held back to the same old XID. I think that this issue ought to be
> treated as a bug.

It's hard to think of a non-invasive bug fix. The obvious approach is
to move the index_vacuum_cleanup()/ginvacuumcleanup() calls in
analyze.c to some earlier point in ANALYZE, in order to avoid doing
lots of VACUUM-ish work while we hold an MVCC snapshot that blocks
cleanup in other tables. The code in question is more or less supposed
to be run during VACUUM already, and so the idea of moving it back to
when the autoanalyze worker backend state "still looks like the usual
autovacuum case" makes a certain amount of sense. But that doesn't
work, at least not without lots of invasive changes.

While I'm no closer to a backpatchable fix than I was on Thursday, I
do have some more ideas about what to do on HEAD. I now lean towards
completely ripping analyze_only calls out, there -- the whole idea of
calling amvacuumcleanup() routines during autoanalyze (but not plain
ANALYZE) seems bolted on. It's not just the risk of similar problems
cropping up in the future -- it's that the whole approach seems
obsolete. We now generally expect autovacuum to run against
insert-only tables. That might not be a perfect fit for this, but it
still seems far better.

Does anyone have any ideas for a targeted fix?

Here's why the "obvious" fix is impractical, at least for backpatch:

To recap, a backend running VACUUM is generally able to avoid the need
to be considered inside GetOldestNonRemovableTransactionId(), which is
practically essential for any busy database -- without that, long
running VACUUM operations would behave like conventional long running
transactions, causing all sorts of chaos. The problem here is that we
can have ginvacuumcleanup() calls that take place without avoiding the
same kind of chaos, just because they happen to take place during
autoanalyze. It seems like the whole GIN autoanalyze mechanism design
was based on the assumption that it didn't make much difference *when*
we reach ginvacuumcleanup(), as long as it happened regularly. But
that's just not true.

We go to a lot of trouble to make VACUUM have this property. This
cannot easily be extended or generalized to cover this special case
during ANALYZE. For one thing, the high level vacuum_rel() entry point
sets things up carefully, using session-level locks for relations. For
another, it relies on special PROC_IN_VACUUM flag logic -- that status
is stored in MyProc->statusFlags.

--
Peter Geoghegan

Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

From

Peter Geoghegan

Date:

13 October 2021, 22:58:32

On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
> While I'm no closer to a backpatchable fix than I was on Thursday, I
> do have some more ideas about what to do on HEAD. I now lean towards
> completely ripping analyze_only calls out, there -- the whole idea of
> calling amvacuumcleanup() routines during autoanalyze (but not plain
> ANALYZE) seems bolted on. It's not just the risk of similar problems
> cropping up in the future -- it's that the whole approach seems
> obsolete. We now generally expect autovacuum to run against
> insert-only tables.

Attached patch removes calls to each index's amvacuumcleanup() routine
that take place during ANALYZE. These days we can just rely on
autovacuum to run against insert-only tables (assuming the user didn't
go out of their way to disable that behavior).

Having thought about it some more, I have arrived at the conclusion
that we should backpatch this to Postgres 13, the first version that
had insert-driven autovacuums (following commit b07642db). This
approach is unorthodox, because it amounts to disabling a
theoretically-working feature in the backbranches. Also, I'd be
drawing the line at Postgres 13, due only to the quite accidental fact
that that's the first major release that clearly doesn't need this
mechanism. (As it happens Nikolay was on 12 anyway, so this won't work
for him, but he already has a workaround IIUC.)

I reached this conclusion because I can't think of a non-invasive fix,
and I really don't want to go there. At the same time, this behavior
is barely documented, and is potentially very harmful indeed. I'm sure
that we should get rid of it on HEAD, but getting rid of it a couple
of years earlier seems prudent.

Does anybody have any opinion on this, either in favor or against my
backpatch-to-13 proposal?

Although this is technically the first problem report about this since
the GIN fastupdate stuff was introduced over a decade ago, I highly
doubt that that tells us much, given the specifics. We only added
instrumentation to autovacuum that showed each VACUUM's OldestXmin in
Postgres 10 -- that's relatively recent. Nikolay is as sophisticated a
Postgres user as anybody, and it was only through sheer luck that we
managed to figure this out -- he had access to that OldestXmin
instrumentation, and also had access to my input on it. While the
issue itself was very hard to spot, the negative ramifications
certainly were not.

Many users bend over backwards to avoid long running transactions, and
the fact that there is this highly obscure path in which autoanalyze
creates very long running transactions carelessly is pretty
distressing to me. I remember hearing complaints about how slow GIN
pending list cleanup by VACUUM was years ago, back in my consulting
days. When the feature was relatively new. I just accepted the general
wisdom at the time, which is that the mechanism itself is slow. But I
now suspect that that issue has far more to do with holding back
VACUUM/other cleanup generally, and not with the efficiency of GIN
itself.

-- 
Peter Geoghegan

Attachment

v1-0001-Desupport-analyze_only-calls-to-amvacuumcleanup.patch

Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

From

Masahiko Sawada

Date:

14 October 2021, 07:49:30

On Thu, Oct 14, 2021 at 7:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > While I'm no closer to a backpatchable fix than I was on Thursday, I
> > do have some more ideas about what to do on HEAD. I now lean towards
> > completely ripping analyze_only calls out, there -- the whole idea of
> > calling amvacuumcleanup() routines during autoanalyze (but not plain
> > ANALYZE) seems bolted on. It's not just the risk of similar problems
> > cropping up in the future -- it's that the whole approach seems
> > obsolete. We now generally expect autovacuum to run against
> > insert-only tables.
>
> Attached patch removes calls to each index's amvacuumcleanup() routine
> that take place during ANALYZE. These days we can just rely on
> autovacuum to run against insert-only tables (assuming the user didn't
> go out of their way to disable that behavior).

Looking at the original commit, as you mentioned, ISTM performing
pending list cleanup during (auto)analyze (and analyze_only) was
introduced to perform the pending list cleanup on insert-only tables.
Now that we have autovacuum_vacuum_insert_threshold, we don’t
necessarily need to rely on that.

On the other hand, I still see a little value in performing the
pending list cleanup during autoanalyze. For example, if the user
wants to clean up the pending list frequently in the background (so
that it's not triggered in the INSERT path), it might be better to do
that during autoanalyze than autovacuum. If the table has garbage,
autovacuum has to vacuum all indexes and the table, taking a very long
time. But autoanalyze can be done in a shorter time. If we trigger
autoanalyze frequently and perform pending list cleanup, the pending
list cleanup can also be done in a relatively short time, preventing
MVCC snapshots from being held for a long time.

Therefore, I personally think that it's better to eliminate
analyze_only code after introducing a way that allows us to perform
the pending list cleanup more frequently. I think that the idea of
Jaime Casanova's patch is a good solution.

>
> Having thought about it some more, I have arrived at the conclusion
> that we should backpatch this to Postgres 13, the first version that
> had insert-driven autovacuums (following commit b07642db). This
> approach is unorthodox, because it amounts to disabling a
> theoretically-working feature in the backbranches. Also, I'd be
> drawing the line at Postgres 13, due only to the quite accidental fact
> that that's the first major release that clearly doesn't need this
> mechanism. (As it happens Nikolay was on 12 anyway, so this won't work
> for him, but he already has a workaround IIUC.)
>
> I reached this conclusion because I can't think of a non-invasive fix,
> and I really don't want to go there. At the same time, this behavior
> is barely documented, and is potentially very harmful indeed. I'm sure
> that we should get rid of it on HEAD, but getting rid of it a couple
> of years earlier seems prudent.
>
> Does anybody have any opinion on this, either in favor or against my
> backpatch-to-13 proposal?

I'm not very positive about back-patching. The first reason is what I
described above; I still see little value in performing pending list
cleanup during autoanalyze. Another reason is that if the user relies
on autoanalyze to perform pending list cleanup, they have to enable
autovacuum_vacuum_insert_threshold instead during the minor upgrade.
Since it also means to trigger autovacuum in more cases I think it
will have a profound impact on the existing system.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM

From

Peter Geoghegan

Date:

17 October 2021, 02:33:57

On Thu, Oct 14, 2021 at 12:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Looking at the original commit, as you mentioned, ISTM performing
> pending list cleanup during (auto)analyze (and analyze_only) was
> introduced to perform the pending list cleanup on insert-only tables.
> Now that we have autovacuum_vacuum_insert_threshold, we don’t
> necessarily need to rely on that.

Right.

> On the other hand, I still see a little value in performing the
> pending list cleanup during autoanalyze. For example, if the user
> wants to clean up the pending list frequently in the background (so
> that it's not triggered in the INSERT path), it might be better to do
> that during autoanalyze than autovacuum. If the table has garbage,
> autovacuum has to vacuum all indexes and the table, taking a very long
> time. But autoanalyze can be done in a shorter time. If we trigger
> autoanalyze frequently and perform pending list cleanup, the pending
> list cleanup can also be done in a relatively short time, preventing
> MVCC snapshots from being held for a long time.

I agree that that's true -- there is at least a little value. But,
that's just an accident of history.

Today, ginvacuumcleanup() won't need to scan the whole index in the
autoanalyze path that I'm concerned about - it will just do pending
list insertion. This does mean that the autoanalyze path taken within
ginvacuumcleanup() should be a lot faster than a similar cleanup-only
call to ginvacuumcleanup(). But...is there actually a good reason for
that? Why should a cleanup-only VACUUM ANALYZE (i.e. a V-A where the
VACUUM does not see any heap-page LP_DEAD items) be so much slower
than a similar ANALYZE against the same table, under the same
conditions? I see no good reason.

Ideally, ginvacuumcleanup() would behave like btvacuumcleanup() and
hashvacuumcleanup(). That is, it should not unnecessarily scan the
index (even when used by VACUUM). In other words, it should have
something like the "Skip full index scan" mechanism that you added to
nbtree in commit 857f9c36. That way it just wouldn't make sense to
have this autoanalyze path anymore -- it would no longer have this
accidental advantage over a regular ginvacuumcleanup() call made from
VACUUM.

More generally, I think it's a big problem that ginvacuumcleanup() has
so many weird special cases. Why does the full_clean argument to
ginInsertCleanup() even exist? It makes the behavior inside
ginInsertCleanup() vary based on whether we're in autovacuum (or
autoanalyze) for no reason at all. I think that the true reason for
this is simple: the code in ginInsertCleanup() is *bad*. full_clean
was just forgotten about by one of its many bug fixes since the code
quality started to go down. Somebody (who shall remain nameless) was
just careless when maintaining that code.

VACUUM should be in charge of index AMs -- not the other way around.
It's good that the amvacuumcleanup() interface is so flexible, but I
think that GIN is over relying on this flexibility. Ideally, VACUUM
wouldn't have to think about the specific index AMs involved at all --
why should GIN be so different to GiST, nbtree, or hash? If GIN (or
any other index AM) behaves like a special little snowflake, with its
own unique performance characteristics, then it is harder to improve
certain important things inside VACUUM. For example, the conveyor belt
index vacuuming design from Robert Haas won't work as well as it
could.

> Therefore, I personally think that it's better to eliminate
> analyze_only code after introducing a way that allows us to perform
> the pending list cleanup more frequently. I think that the idea of
> Jaime Casanova's patch is a good solution.

I now think that it would be better to fix ginvacuumcleanup() in the
way that I described (I changed my mind). Jaime's patch does seem like
a good idea to me, but not for this. It makes sense to have that, for
the reasons that Jaime said himself.

> I'm not very positive about back-patching. The first reason is what I
> described above; I still see little value in performing pending list
> cleanup during autoanalyze. Another reason is that if the user relies
> on autoanalyze to perform pending list cleanup, they have to enable
> autovacuum_vacuum_insert_threshold instead during the minor upgrade.
> Since it also means to trigger autovacuum in more cases I think it
> will have a profound impact on the existing system.

I have to admit that when I wrote my earlier email, I was still a
little shocked by the problem -- which is not a good state of mind
when making a decision about backpatching. But, I now think that I
underappreciated the risk of making the problem worse in the
backbranches, rather than better. I won't be backpatching anything
here.

The problem report from Nikolay was probably an unusually bad case,
where the pending list cleanup/insertion was particularly expensive.
This *is* really awful behavior, but that alone isn't a good enough
reason to backpatch.

--
Peter Geoghegan