Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Msg-id: CAH2-WzkD9gX4MAGhmEX=_MbTjDXGr=1zCDSnC=BUdRPywJy64A@mail.gmail.com
In response to: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Feb 20, 2022 at 7:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Right, so we at least need to add a similar comment to what I proposed
> for MXIDs, and maybe other changes are needed, too.

Agreed.

> > Maybe the way that FreezeLimit/cutoff_xid is overloaded can be fixed
> > here, to make all of this less confusing. I only now fully realized
> > how confusing all of this stuff is -- very.
>
> Right. I think I understand all of this, or at least most of it -- but
> not from the comment. The question is how the comment can be more
> clear. My general suggestion is that function header comments should
> have more to do with the behavior of the function than how it fits
> into the bigger picture. If it's clear to the reader what conditions
> must hold before calling the function and which must hold on return,
> it helps a lot. IMHO, it's the job of the comments in the calling
> function to clarify why we then choose to call that function at the
> place and in the way that we do.

You've given me a lot of high-quality feedback on all of this, which
I'll work through soon. It's hard to get the balance right here, but
it's made much easier by this kind of feedback.

> I think that the idea has potential, but I don't think that I
> understand yet what the *exact* algorithm is.

The algorithm seems to exploit a natural tendency that Andres once
described in a blog post about his snapshot scalability work [1]. To a
surprising extent, we can usefully bucket all tuples/pages into two
simple categories:

1. Very, very old ("infinitely old" for all practical purposes).

2. Very, very new.

There doesn't seem to be much need for a third "in-between" category
in practice. This seems to be at least approximately true all of the
time. Perhaps Andres wouldn't agree with this very general statement
-- he actually said something more specific. I for one believe that
the point he made generalizes surprisingly well, though. I have my own
theories about why this appears to be true.
(Executive summary: power laws are weird, and it seems as if the
sparsity-of-effects principle makes it easy to bucket things at the
highest level, in a way that generalizes well across disparate
workloads.)

> Maybe I need to read the
> code, when I have some time for that. I can't form an intelligent
> opinion at this stage about whether this is likely to be a net
> positive.

The code in the v8-0002 patch is a bit sloppy right now. I didn't
quite get around to cleaning it up -- I was focused on performance
validation of the algorithm itself. So bear that in mind if you do
look at v8-0002 (might want to wait for v9-0002 before looking).

I believe that the only essential thing about the algorithm itself is
that it freezes all the tuples on a page when it anticipates setting
the page all-visible, or (barring edge cases) freezes none at all.
(Note that setting the page all-visible/all-frozen may happen just
after lazy_scan_prune returns, or in the second pass over the heap,
after LP_DEAD items are set to LP_UNUSED -- lazy_scan_prune doesn't
care which way it will happen.)

There are one or two other design choices that we need to make, like
exactly which tuples we freeze in the edge case where FreezeLimit/XID
age forces us to freeze in lazy_scan_prune. These other design choices
don't seem relevant to the issue of central importance, which is
whether or not we come out ahead overall with this new algorithm.
FreezeLimit will seldom affect our choice to freeze or not freeze now,
and so AFAICT the exact way that FreezeLimit affects which precise
freezing-eligible tuples we freeze doesn't complicate performance
validation.

Remember when I got excited about how my big TPC-C benchmark run
showed a predictable, tick/tock style pattern across VACUUM operations
against the orders and order lines tables [2]? It seemed very
significant to me that the OldestXmin of VACUUM operation n
consistently went on to become the new relfrozenxid for the same table
in VACUUM operation n + 1.
It wasn't exactly the same XID, but very close to it (within the range
of noise). This pattern was clearly present, even though VACUUM
operation n + 1 might happen as long as 4 or 5 hours after VACUUM
operation n (this was a big table).

This pattern was encouraging to me because it showed (at least for the
workload and tables in question) that the amount of unnecessary extra
freezing can't have been too bad -- the fact that we can always
advance relfrozenxid in the same way is evidence of that. Note that
the vacuum_freeze_min_age setting can't have affected our choice of
what to freeze (given what we see in the logs), and yet there is a
clear pattern where the pages (it's really pages, not tuples) that the
new algorithm doesn't freeze in VACUUM operation n will reliably get
frozen in VACUUM operation n + 1 instead.

And so this pattern seems to lend support to the general idea of
letting the workload itself be the primary driver of what pages we
freeze (not FreezeLimit, and not anything based on XIDs). That's
really the underlying principle behind the new algorithm -- freezing
is driven by workload characteristics (or page/block characteristics,
if you prefer). ISTM that vacuum_freeze_min_age is almost impossible
to tune -- XID age is just too squishy a concept for that to ever
work.

[1] https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462#interlude-removing-the-need-for-recentglobalxminhorizon
[2] https://postgr.es/m/CAH2-Wz=iLnf+0CsaB37efXCGMRJO1DyJw5HMzm7tp1AxG1NR2g@mail.gmail.com
    -- scroll down to "TPC-C", which has the relevant autovacuum log
    output for the orders table, covering a 24 hour period

--
Peter Geoghegan