Thread: GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM
We generally only expect amvacuumcleanup() routines to be called during VACUUM. But some ginvacuumcleanup() calls are an exception to that general rule -- these are calls made during autoanalyze, where ginvacuumcleanup() does real pending list cleanup work (not just a no-op return). That'll only happen within an autoanalyze, and only when no VACUUM took place before the ANALYZE. The high level goal is to make sure that the av worker process won't neglect to call ginvacuumcleanup() for pending list cleanup, even when there was no VACUUM. This behavior was added when the GIN fastupdate/pending list stuff was first introduced, in commit ff301d6e69. The design of ANALYZE differs from the design of VACUUM in that only ANALYZE will allocate an XID (typically in a call to update_attstats()). ANALYZE can also hold an MVCC snapshot. This is why ANALYZE holds back cleanup by VACUUM in another process, which sometimes causes problems (say during pgbench) -- this much is fairly well known. But there is also a pretty nasty interaction between this aspect of ANALYZE, and the special GIN pending list cleanup path I mentioned. This interaction makes the VACUUM-OldestXmin-held-back situation far worse. The special analyze_only ginvacuumcleanup() calls happen fairly late during the ANALYZE, during the window that ANALYZE holds back OldestXmin values in other VACUUMs. This greatly increases the extent of the problem, in the obvious way. GIN index pending list cleanup will often take a great deal longer than the typical ANALYZE tasks take -- it's a pretty resource intensive maintenance operation. Especially if there are a couple of GIN indexes on the table. This issue was brought to my attention by Nikolay Samokhvalov. He reached out privately about it. He mentioned one problematic case involving an ANALYZE lasting 45 minutes, or longer (per log_autovacuum_min_duration output for the autoanalyze). That was correlated with VACUUMs on other tables whose OldestXmin values were all held back to the same old XID. I think that this issue ought to be treated as a bug. Jaime Casanova wrote a patch that does pending list cleanup using the AV worker item infrastructure [1]. It's in the CF queue. Sounds like a good idea to me. The goal of that patch is to take work out of the insert path, when our gin_pending_list_limit-based limit is hit, but offhand I imagine that the same approach could be used as a fix for this issue, at least on HEAD. [1] https://postgr.es/m/20210405063117.GA2478@ahch-to -- Peter Geoghegan
On Thu, Oct 7, 2021 at 10:34 PM Peter Geoghegan <pg@bowt.ie> wrote: > This issue was brought to my attention by Nikolay Samokhvalov. He > reached out privately about it. He mentioned one problematic case > involving an ANALYZE lasting 45 minutes, or longer (per > log_autovacuum_min_duration output for the autoanalyze). That was > correlated with VACUUMs on other tables whose OldestXmin values were > all held back to the same old XID. I think that this issue ought to be > treated as a bug. It's hard to think of a non-invasive bug fix. The obvious approach is to move the index_vacuum_cleanup()/ginvacuumcleanup() calls in analyze.c to some earlier point in ANALYZE, in order to avoid doing lots of VACUUM-ish work while we hold an MVCC snapshot that blocks cleanup in other tables. The code in question is more or less supposed to be run during VACUUM already, and so the idea of moving it back to when the autoanalyze worker backend state "still looks like the usual autovacuum case" makes a certain amount of sense. But that doesn't work, at least not without lots of invasive changes. While I'm no closer to a backpatchable fix than I was on Thursday, I do have some more ideas about what to do on HEAD. I now lean towards completely ripping analyze_only calls out, there -- the whole idea of calling amvacuumcleanup() routines during autoanalyze (but not plain ANALYZE) seems bolted on. It's not just the risk of similar problems cropping up in the future -- it's that the whole approach seems obsolete. We now generally expect autovacuum to run against insert-only tables. That might not be a perfect fit for this, but it still seems far better. Does anyone have any ideas for a targeted fix? Here's why the "obvious" fix is impractical, at least for backpatch: To recap, a backend running VACUUM is generally able to avoid the need to be considered inside GetOldestNonRemovableTransactionId(), which is practically essential for any busy database -- without that, long running VACUUM operations would behave like conventional long running transactions, causing all sorts of chaos. The problem here is that we can have ginvacuumcleanup() calls that take place without avoiding the same kind of chaos, just because they happen to take place during autoanalyze. It seems like the whole GIN autoanalyze mechanism design was based on the assumption that it didn't make much difference *when* we reach ginvacuumcleanup(), as long as it happened regularly. But that's just not true. We go to a lot of trouble to make VACUUM have this property. This cannot easily be extended or generalized to cover this special case during ANALYZE. For one thing, the high level vacuum_rel() entry point sets things up carefully, using session-level locks for relations. For another, it relies on special PROC_IN_VACUUM flag logic -- that status is stored in MyProc->statusFlags. -- Peter Geoghegan
On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote: > While I'm no closer to a backpatchable fix than I was on Thursday, I > do have some more ideas about what to do on HEAD. I now lean towards > completely ripping analyze_only calls out, there -- the whole idea of > calling amvacuumcleanup() routines during autoanalyze (but not plain > ANALYZE) seems bolted on. It's not just the risk of similar problems > cropping up in the future -- it's that the whole approach seems > obsolete. We now generally expect autovacuum to run against > insert-only tables. Attached patch removes calls to each index's amvacuumcleanup() routine that take place during ANALYZE. These days we can just rely on autovacuum to run against insert-only tables (assuming the user didn't go out of their way to disable that behavior). Having thought about it some more, I have arrived at the conclusion that we should backpatch this to Postgres 13, the first version that had insert-driven autovacuums (following commit b07642db). This approach is unorthodox, because it amounts to disabling a theoretically-working feature in the backbranches. Also, I'd be drawing the line at Postgres 13, due only to the quite accidental fact that that's the first major release that clearly doesn't need this mechanism. (As it happens Nikolay was on 12 anyway, so this won't work for him, but he already has a workaround IIUC.) I reached this conclusion because I can't think of a non-invasive fix, and I really don't want to go there. At the same time, this behavior is barely documented, and is potentially very harmful indeed. I'm sure that we should get rid of it on HEAD, but getting rid of it a couple of years earlier seems prudent. Does anybody have any opinion on this, either in favor or against my backpatch-to-13 proposal? Although this is technically the first problem report about this since the GIN fastupdate stuff was introduced over a decade ago, I highly doubt that that tells us much, given the specifics. We only added instrumentation to autovacuum that showed each VACUUM's OldestXmin in Postgres 10 -- that's relatively recent. Nikolay is as sophisticated a Postgres user as anybody, and it was only through sheer luck that we managed to figure this out -- he had access to that OldestXmin instrumentation, and also had access to my input on it. While the issue itself was very hard to spot, the negative ramifications certainly were not. Many users bend over backwards to avoid long running transactions, and the fact that there is this highly obscure path in which autoanalyze creates very long running transactions carelessly is pretty distressing to me. I remember hearing complaints about how slow GIN pending list cleanup by VACUUM was years ago, back in my consulting days. When the feature was relatively new. I just accepted the general wisdom at the time, which is that the mechanism itself is slow. But I now suspect that that issue has far more to do with holding back VACUUM/other cleanup generally, and not with the efficiency of GIN itself. -- Peter Geoghegan
Attachment
On Thu, Oct 14, 2021 at 7:58 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote: > > While I'm no closer to a backpatchable fix than I was on Thursday, I > > do have some more ideas about what to do on HEAD. I now lean towards > > completely ripping analyze_only calls out, there -- the whole idea of > > calling amvacuumcleanup() routines during autoanalyze (but not plain > > ANALYZE) seems bolted on. It's not just the risk of similar problems > > cropping up in the future -- it's that the whole approach seems > > obsolete. We now generally expect autovacuum to run against > > insert-only tables. > > Attached patch removes calls to each index's amvacuumcleanup() routine > that take place during ANALYZE. These days we can just rely on > autovacuum to run against insert-only tables (assuming the user didn't > go out of their way to disable that behavior). Looking at the original commit, as you mentioned, ISTM performing pending list cleanup during (auto)analyze (and analyze_only) was introduced to perform the pending list cleanup on insert-only tables. Now that we have autovacuum_vacuum_insert_threshold, we don’t necessarily need to rely on that. On the other hand, I still see a little value in performing the pending list cleanup during autoanalyze. For example, if the user wants to clean up the pending list frequently in the background (so that it's not triggered in the INSERT path), it might be better to do that during autoanalyze than autovacuum. If the table has garbage, autovacuum has to vacuum all indexes and the table, taking a very long time. But autoanalyze can be done in a shorter time. If we trigger autoanalyze frequently and perform pending list cleanup, the pending list cleanup can also be done in a relatively short time, preventing MVCC snapshots from being held for a long time. Therefore, I personally think that it's better to eliminate analyze_only code after introducing a way that allows us to perform the pending list cleanup more frequently. I think that the idea of Jaime Casanova's patch is a good solution. > > Having thought about it some more, I have arrived at the conclusion > that we should backpatch this to Postgres 13, the first version that > had insert-driven autovacuums (following commit b07642db). This > approach is unorthodox, because it amounts to disabling a > theoretically-working feature in the backbranches. Also, I'd be > drawing the line at Postgres 13, due only to the quite accidental fact > that that's the first major release that clearly doesn't need this > mechanism. (As it happens Nikolay was on 12 anyway, so this won't work > for him, but he already has a workaround IIUC.) > > I reached this conclusion because I can't think of a non-invasive fix, > and I really don't want to go there. At the same time, this behavior > is barely documented, and is potentially very harmful indeed. I'm sure > that we should get rid of it on HEAD, but getting rid of it a couple > of years earlier seems prudent. > > Does anybody have any opinion on this, either in favor or against my > backpatch-to-13 proposal? I'm not very positive about back-patching. The first reason is what I described above; I still see little value in performing pending list cleanup during autoanalyze. Another reason is that if the user relies on autoanalyze to perform pending list cleanup, they have to enable autovacuum_vacuum_insert_threshold instead during the minor upgrade. Since it also means to trigger autovacuum in more cases I think it will have a profound impact on the existing system. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Thu, Oct 14, 2021 at 12:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Looking at the original commit, as you mentioned, ISTM performing > pending list cleanup during (auto)analyze (and analyze_only) was > introduced to perform the pending list cleanup on insert-only tables. > Now that we have autovacuum_vacuum_insert_threshold, we don’t > necessarily need to rely on that. Right. > On the other hand, I still see a little value in performing the > pending list cleanup during autoanalyze. For example, if the user > wants to clean up the pending list frequently in the background (so > that it's not triggered in the INSERT path), it might be better to do > that during autoanalyze than autovacuum. If the table has garbage, > autovacuum has to vacuum all indexes and the table, taking a very long > time. But autoanalyze can be done in a shorter time. If we trigger > autoanalyze frequently and perform pending list cleanup, the pending > list cleanup can also be done in a relatively short time, preventing > MVCC snapshots from being held for a long time. I agree that that's true -- there is at least a little value. But, that's just an accident of history. Today, ginvacuumcleanup() won't need to scan the whole index in the autoanalyze path that I'm concerned about - it will just do pending list insertion. This does mean that the autoanalyze path taken within ginvacuumcleanup() should be a lot faster than a similar cleanup-only call to ginvacuumcleanup(). But...is there actually a good reason for that? Why should a cleanup-only VACUUM ANALYZE (i.e. a V-A where the VACUUM does not see any heap-page LP_DEAD items) be so much slower than a similar ANALYZE against the same table, under the same conditions? I see no good reason. Ideally, ginvacuumcleanup() would behave like btvacuumcleanup() and hashvacuumcleanup(). That is, it should not unnecessarily scan the index (even when used by VACUUM). In other words, it should have something like the "Skip full index scan" mechanism that you added to nbtree in commit 857f9c36. That way it just wouldn't make sense to have this autoanalyze path anymore -- it would no longer have this accidental advantage over a regular ginvacuumcleanup() call made from VACUUM. More generally, I think it's a big problem that ginvacuumcleanup() has so many weird special cases. Why does the full_clean argument to ginInsertCleanup() even exist? It makes the behavior inside ginInsertCleanup() vary based on whether we're in autovacuum (or autoanalyze) for no reason at all. I think that the true reason for this is simple: the code in ginInsertCleanup() is *bad*. full_clean was just forgotten about by one of its many bug fixes since the code quality started to go down. Somebody (who shall remain nameless) was just careless when maintaining that code. VACUUM should be in charge of index AMs -- not the other way around. It's good that the amvacuumcleanup() interface is so flexible, but I think that GIN is over relying on this flexibility. Ideally, VACUUM wouldn't have to think about the specific index AMs involved at all -- why should GIN be so different to GiST, nbtree, or hash? If GIN (or any other index AM) behaves like a special little snowflake, with its own unique performance characteristics, then it is harder to improve certain important things inside VACUUM. For example, the conveyor belt index vacuuming design from Robert Haas won't work as well as it could. > Therefore, I personally think that it's better to eliminate > analyze_only code after introducing a way that allows us to perform > the pending list cleanup more frequently. I think that the idea of > Jaime Casanova's patch is a good solution. I now think that it would be better to fix ginvacuumcleanup() in the way that I described (I changed my mind). Jaime's patch does seem like a good idea to me, but not for this. It makes sense to have that, for the reasons that Jaime said himself. > I'm not very positive about back-patching. The first reason is what I > described above; I still see little value in performing pending list > cleanup during autoanalyze. Another reason is that if the user relies > on autoanalyze to perform pending list cleanup, they have to enable > autovacuum_vacuum_insert_threshold instead during the minor upgrade. > Since it also means to trigger autovacuum in more cases I think it > will have a profound impact on the existing system. I have to admit that when I wrote my earlier email, I was still a little shocked by the problem -- which is not a good state of mind when making a decision about backpatching. But, I now think that I underappreciated the risk of making the problem worse in the backbranches, rather than better. I won't be backpatching anything here. The problem report from Nikolay was probably an unusually bad case, where the pending list cleanup/insertion was particularly expensive. This *is* really awful behavior, but that alone isn't a good enough reason to backpatch. -- Peter Geoghegan