Thread: Avoiding unnecessary clog lookups while freezing
I took another look at the code coverage situation around freezing following pushing the page-level freezing patch earlier today. I spotted an issue that I'd missed up until now: certain sanity checks in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more often than really seems necessary.

Theoretically this is an old issue that dates back to commit 699bf7d05c, as opposed to an issue in the page-level freezing patch. But that isn't really true in any practical sense. In practice the calls to TransactionIdDidCommit() will happen far more frequently following today's commit 1de58df4fe, since we're now using OldestXmin as the cutoff that gates performing freeze_xmin/freeze_xmax processing -- not FreezeLimit.

No regressions related to clog lookups by VACUUM were apparent from my performance validation of the page-level freezing work, but I suspect that the increase in TransactionIdDidCommit() calls will cause noticeable regressions with the right (or wrong) workload and/or configuration. The page-level freezing work is expected to reduce clog lookups in VACUUM in general, which is bound to have been a confounding factor.

I see no reason why we can't just condition the XID sanity check calls to TransactionIdDidCommit() (for both the freeze_xmin and the freeze_xmax callsites) on a cheap tuple hint bit precheck not being enough. We only need an expensive call to TransactionIdDidCommit() when the precheck doesn't immediately show that the tuple's xmin looks committed (at the callsite that expects a committed xmin) or that the tuple's xmax looks aborted (at the callsite that expects an aborted xmax).

Attached patch shows what I mean. A quick run of the standard regression tests (with custom instrumentation) shows that this patch eliminates all of the relevant calls to TransactionIdDidCommit(), for both the freeze_xmin and the freeze_xmax callsites.

--
Peter Geoghegan
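A minimal standalone sketch of the precheck idea described above. The types, flag names, and stub functions are illustrative stand-ins, not the attached patch or the real heapam definitions; the point is only that the expensive clog lookup runs when a cheap hint bit test doesn't already confirm what the sanity check expects.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define HINT_XMIN_COMMITTED 0x0001  /* stand-in for HEAP_XMIN_COMMITTED */
#define HINT_XMAX_ABORTED   0x0002  /* stand-in for "xmax known aborted" */

typedef struct ModelTuple
{
    TransactionId xmin;
    TransactionId xmax;
    uint16_t      hints;
} ModelTuple;

/* Stub for the expensive clog lookup (TransactionIdDidCommit() in Postgres). */
static bool
clog_did_commit(TransactionId xid)
{
    printf("clog lookup for xid %u\n", (unsigned) xid);
    return true;                /* pretend the lookup says "committed" */
}

/* freeze_xmin sanity check: we expect a committed xmin. */
static bool
xmin_sanity_check(const ModelTuple *tup)
{
    if (tup->hints & HINT_XMIN_COMMITTED)
        return true;            /* cheap hint bit precheck is enough */
    return clog_did_commit(tup->xmin);
}

/* freeze_xmax sanity check: we expect an aborted xmax. */
static bool
xmax_sanity_check(const ModelTuple *tup)
{
    if (tup->hints & HINT_XMAX_ABORTED)
        return true;            /* cheap hint bit precheck is enough */
    return !clog_did_commit(tup->xmax);
}

int
main(void)
{
    ModelTuple hinted   = {100, 0, HINT_XMIN_COMMITTED};
    ModelTuple unhinted = {200, 300, 0};

    /* No clog lookup here: the hint bit already says "committed". */
    printf("hinted xmin ok:   %d\n", xmin_sanity_check(&hinted));
    /* Falls back to the (stubbed) expensive lookup. */
    printf("unhinted xmin ok: %d\n", xmin_sanity_check(&unhinted));
    printf("unhinted xmax ok: %d\n", xmax_sanity_check(&unhinted));
    return 0;
}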
Hi,

On 2022-12-28 15:24:28 -0800, Peter Geoghegan wrote:
> I took another look at the code coverage situation around freezing following pushing the page-level freezing patch earlier today. I spotted an issue that I'd missed up until now: certain sanity checks in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more often than really seems necessary.
>
> Theoretically this is an old issue that dates back to commit 699bf7d05c, as opposed to an issue in the page-level freezing patch. But that isn't really true in any practical sense. In practice the calls to TransactionIdDidCommit() will happen far more frequently following today's commit 1de58df4fe, since we're now using OldestXmin as the cutoff that gates performing freeze_xmin/freeze_xmax processing -- not FreezeLimit.

Hm. But we still only do the check when we actually freeze, rather than just during the pre-check in heap_tuple_should_freeze(). So we'll only incur the increased overhead when we also do more WAL logging etc. Correct?

> I see no reason why we can't just condition the XID sanity check calls to TransactionIdDidCommit() (for both the freeze_xmin and the freeze_xmax callsites) on a cheap tuple hint bit precheck not being enough. We only need an expensive call to TransactionIdDidCommit() when the precheck doesn't immediately show that the tuple's xmin looks committed (at the callsite that expects a committed xmin) or that the tuple's xmax looks aborted (at the callsite that expects an aborted xmax).

Hm. I dimly recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad.

> Attached patch shows what I mean. A quick run of the standard regression tests (with custom instrumentation) shows that this patch eliminates all of the relevant calls to TransactionIdDidCommit(), for both the freeze_xmin and the freeze_xmax callsites.

There's practically no tuple-level concurrent activity in a normal regression test, so I'm a bit doubtful that is an interesting metric. It'd be a tad more interesting to run tests, including isolationtester and pgbench, against a running cluster.

Greetings,

Andres Freund
On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:
> > Theoretically this is an old issue that dates back to commit 699bf7d05c, as opposed to an issue in the page-level freezing patch. But that isn't really true in any practical sense. In practice the calls to TransactionIdDidCommit() will happen far more frequently following today's commit 1de58df4fe, since we're now using OldestXmin as the cutoff that gates performing freeze_xmin/freeze_xmax processing -- not FreezeLimit.
>
> Hm. But we still only do the check when we actually freeze, rather than just during the pre-check in heap_tuple_should_freeze(). So we'll only incur the increased overhead when we also do more WAL logging etc. Correct?

Yes, that's how it worked up until today's commit 1de58df4fe.

I don't have strong feelings about back-patching a fix, but this does seem like something that I should fix now, on HEAD.

> Hm. I dimly recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of keeping it (or something very close to it, at least).

> There's practically no tuple-level concurrent activity in a normal regression test, so I'm a bit doubtful that is an interesting metric. It'd be a tad more interesting to run tests, including isolationtester and pgbench, against a running cluster.

Even on HEAD, with page-level freezing in place, we're only going to test XIDs that are < OldestXmin, and that appear on pages that VACUUM actually scans in the first place. Just checking tuple-level hint bits should be effective. But even if it isn't (for whatever reason), that's similar to cases where our second heap pass has to do clog lookups in heap_page_is_all_visible(), just because hint bits couldn't be set earlier on, back when lazy_scan_prune() processed the same page during VACUUM's initial heap pass.

--
Peter Geoghegan
On 2022-12-28 16:37:27 -0800, Peter Geoghegan wrote:
> On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:
> > > Theoretically this is an old issue that dates back to commit 699bf7d05c, as opposed to an issue in the page-level freezing patch. But that isn't really true in any practical sense. In practice the calls to TransactionIdDidCommit() will happen far more frequently following today's commit 1de58df4fe, since we're now using OldestXmin as the cutoff that gates performing freeze_xmin/freeze_xmax processing -- not FreezeLimit.
> >
> > Hm. But we still only do the check when we actually freeze, rather than just during the pre-check in heap_tuple_should_freeze(). So we'll only incur the increased overhead when we also do more WAL logging etc. Correct?
>
> Yes, that's how it worked up until today's commit 1de58df4fe.
>
> I don't have strong feelings about back-patching a fix, but this does seem like something that I should fix now, on HEAD.
>
> > Hm. I dimly recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad.
>
> I think that it's a reasonable check, and I'm totally in favor of keeping it (or something very close to it, at least).

I don't quite follow - one paragraph up you say we should fix something, and then here you seem to say we should continue not to rely on the hint bits?
On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:
> > > Hm. I dimly recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad.
> >
> > I think that it's a reasonable check, and I'm totally in favor of keeping it (or something very close to it, at least).
>
> I don't quite follow - one paragraph up you say we should fix something, and then here you seem to say we should continue not to rely on the hint bits?

I didn't mean that we should continue to not rely on the hint bits. Is that really all that the test is for? I think of it as a general sanity check.

The important call to avoid with page-level freezing is the xmin call to TransactionIdDidCommit(), not the xmax call. The xmax call only occurs when VACUUM prepares to freeze a tuple whose updater (as opposed to a mere locker) aborted, while the xmin calls will now take place for most unfrozen tuples.

--
Peter Geoghegan
Hi,

On 2022-12-28 22:36:53 -0800, Peter Geoghegan wrote:
> On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:
> > > > Hm. I dimly recall that we had repeated cases where the hint bits were set wrongly due to some of the multixact related bugs. I think I was trying to be paranoid about not freezing stuff in those situations, since it can lead to reviving dead tuples, which obviously is bad.
> > >
> > > I think that it's a reasonable check, and I'm totally in favor of keeping it (or something very close to it, at least).
> >
> > I don't quite follow - one paragraph up you say we should fix something, and then here you seem to say we should continue not to rely on the hint bits?
>
> I didn't mean that we should continue to not rely on the hint bits. Is that really all that the test is for? I think of it as a general sanity check.

I do think we wanted to avoid reviving actually-dead tuples (present due to the multixact and related bugs). And I'm worried about giving that checking up; I've seen it hit too many times, both in the real world and during development.

Somewhat of a tangent: I've previously wondered if we should have a small hash-table based clog cache. The current one-element cache doesn't suffice in a lot of scenarios, but it wouldn't take a huge cache to end up filtering most clog accesses.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Somewhat of a tangent: I've previously wondered if we should have a small hash-table based clog cache. The current one-element cache doesn't suffice in a lot of scenarios, but it wouldn't take a huge cache to end up filtering most clog accesses.

I've wondered about that too. The one-element cache was a good hack in its day, but it looks a bit under-engineered by our current standards. (Also, maybe it'd be plausible to have a one-element cache in front of a small hash?)

			regards, tom lane
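For illustration, a toy model of the cache shape being discussed here: a one-element cache in front of a small fixed-size hash of recent XID-to-status answers. All names and sizes below are hypothetical; this is not PostgreSQL's actual clog code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define CLOG_CACHE_SIZE 64      /* small power of two, purely illustrative */

typedef struct ClogCacheEntry
{
    TransactionId xid;          /* 0 means "empty slot" in this model */
    bool          committed;
} ClogCacheEntry;

static ClogCacheEntry front;                    /* one-element cache */
static ClogCacheEntry table[CLOG_CACHE_SIZE];   /* small hash behind it */

/* Stub for the real (expensive) clog read. */
static bool
clog_read_status(TransactionId xid)
{
    printf("clog read for xid %u\n", (unsigned) xid);
    return (xid % 2) == 0;      /* arbitrary stand-in answer */
}

static bool
cached_did_commit(TransactionId xid)
{
    ClogCacheEntry *slot;

    if (front.xid == xid)
        return front.committed;     /* hit the one-element cache */

    slot = &table[xid & (CLOG_CACHE_SIZE - 1)];
    if (slot->xid != xid)
    {
        /* Miss in the hash too: consult clog and remember the answer. */
        slot->xid = xid;
        slot->committed = clog_read_status(xid);
    }
    front = *slot;                  /* refresh the one-element cache */
    return slot->committed;
}

int
main(void)
{
    /* Alternating xids defeat a one-element cache, but not the small hash. */
    for (int i = 0; i < 3; i++)
    {
        cached_did_commit(100);
        cached_did_commit(101);
    }
    return 0;
}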
On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:
> I do think we wanted to avoid reviving actually-dead tuples (present due to the multixact and related bugs). And I'm worried about giving that checking up; I've seen it hit too many times, both in the real world and during development.

I could just move the same tests from heap_prepare_freeze_tuple() to heap_freeze_execute_prepared(), without changing any of the details. That would mean that the TransactionIdDidCommit() calls would only take place for tuples that actually get frozen, which is more or less how it worked before now.

heap_prepare_freeze_tuple() will now often prepare freeze plans that just get discarded by lazy_scan_prune(). My concern is the impact on tables/pages that almost always discard prepared freeze plans, and so require many TransactionIdDidCommit() calls that really aren't necessary.

> Somewhat of a tangent: I've previously wondered if we should have a small hash-table based clog cache. The current one-element cache doesn't suffice in a lot of scenarios, but it wouldn't take a huge cache to end up filtering most clog accesses.

I imagine that the one-element cache works alright in some scenarios, but then suddenly doesn't work so well, even though not very much has changed. Behavior like that makes problems difficult to analyze, and easy to miss. I'm suspicious of that.

--
Peter Geoghegan
Hi,

On 2022-12-29 09:43:39 -0800, Peter Geoghegan wrote:
> On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:
> > I do think we wanted to avoid reviving actually-dead tuples (present due to the multixact and related bugs). And I'm worried about giving that checking up; I've seen it hit too many times, both in the real world and during development.
>
> I could just move the same tests from heap_prepare_freeze_tuple() to heap_freeze_execute_prepared(), without changing any of the details.

That might work, yes.

> That would mean that the TransactionIdDidCommit() calls would only take place for tuples that actually get frozen, which is more or less how it worked before now.
>
> heap_prepare_freeze_tuple() will now often prepare freeze plans that just get discarded by lazy_scan_prune(). My concern is the impact on tables/pages that almost always discard prepared freeze plans, and so require many TransactionIdDidCommit() calls that really aren't necessary.

It seems somewhat wrong that we discard all the work that heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in a bunch of important cases (e.g. creating a new multixact), but even so, e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is just creating the freeze plan free.

I think the better approach might be to make heap_tuple_should_freeze() more powerful and to only create the freeze plan when actually freezing.

I wonder how often it'd be worthwhile to also do opportunistic freezing during lazy_vacuum_heap_page(), given that we already will WAL log (and often issue an FPI).

> > Somewhat of a tangent: I've previously wondered if we should have a small hash-table based clog cache. The current one-element cache doesn't suffice in a lot of scenarios, but it wouldn't take a huge cache to end up filtering most clog accesses.
>
> I imagine that the one-element cache works alright in some scenarios, but then suddenly doesn't work so well, even though not very much has changed. Behavior like that makes problems difficult to analyze, and easy to miss. I'm suspicious of that.

I think there are a lot of situations where it flat out doesn't work - even if you just have an inserting and a deleting transaction, we'll often end up not hitting the 1-element cache due to looking up two different xids in a roughly alternating pattern...

Greetings,

Andres Freund
On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:
> > I could just move the same tests from heap_prepare_freeze_tuple() to heap_freeze_execute_prepared(), without changing any of the details.
>
> That might work, yes.

Attached patch shows how that could work.

> It seems somewhat wrong that we discard all the work that heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in a bunch of important cases (e.g. creating a new multixact), but even so, e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is just creating the freeze plan free.

I'm not sure what you mean by that. I believe that the page-level freezing changes do not allow FreezeMultiXactId() to call GetMultiXactIdMembers() any more often than before. Are you concerned about a regression, or something more general than that?

The only case where we *don't* force xmax freezing in FreezeMultiXactId() is the FRM_NOOP case. Note in particular that we will reliably force freezing for any Multi < OldestMxact (not < MultiXactCutoff).

> I wonder how often it'd be worthwhile to also do opportunistic freezing during lazy_vacuum_heap_page(), given that we already will WAL log (and often issue an FPI).

Yeah, we don't actually need a cleanup lock for that. It might also make sense to teach lazy_scan_prune() to anticipate what will happen later on, in lazy_vacuum_heap_page(), so that it can freeze based on the same observation about the cost. (It already does a certain amount of this kind of thing, in fact.)

--
Peter Geoghegan
On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It seems somewhat wrong that we discard all the work that heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in a bunch of important cases (e.g. creating a new multixact), but even so, e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is just creating the freeze plan free.
>
> I'm not sure what you mean by that. I believe that the page-level freezing changes do not allow FreezeMultiXactId() to call GetMultiXactIdMembers() any more often than before. Are you concerned about a regression, or something more general than that?

Here's an idea that seems like it could ameliorate the issue:

When we're looping through the members returned by GetMultiXactIdMembers(), and seeing if we can get away with !need_replace/FRM_NOOP processing, why not also check if there are any XIDs >= OldestXmin among the members? If not (if they're all < OldestXmin), then we should prefer to go further with processing the Multi now -- FRM_NOOP processing isn't actually cheaper. We'll already know that a second pass over the multi really isn't expensive, and that it will actually result in FRM_INVALIDATE_XMAX processing, which is ideal. Avoiding a second pass is really just about avoiding FRM_RETURN_IS_MULTI (and, to a much lesser degree, avoiding FRM_RETURN_IS_XID).

--
Peter Geoghegan
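A minimal sketch of the member-scan heuristic described above, using stand-in types rather than FreezeMultiXactId() itself. The comparison helper ignores XID wraparound for brevity; real code would use TransactionIdPrecedes().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Simplified multixact "member": just the XID, ignoring lock/update status. */
typedef struct ModelMember
{
    TransactionId xid;
} ModelMember;

/* TransactionIdPrecedes() stand-in; real code must handle XID wraparound. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return a < b;
}

/*
 * Return true only if FRM_NOOP-style processing is still worth considering:
 * at least one member XID is >= OldestXmin, so the multi cannot simply be
 * resolved now.  If every member is < OldestXmin, a second pass would be
 * cheap anyway and would end in FRM_INVALIDATE_XMAX, so prefer to do the
 * work up front.
 */
static bool
noop_still_worthwhile(const ModelMember *members, int nmembers,
                      TransactionId OldestXmin)
{
    for (int i = 0; i < nmembers; i++)
    {
        if (!xid_precedes(members[i].xid, OldestXmin))
            return true;        /* a member XID that is still "interesting" */
    }
    return false;               /* all members < OldestXmin */
}

int
main(void)
{
    ModelMember all_old[] = {{100}, {105}, {110}};
    ModelMember mixed[]   = {{100}, {205}};
    TransactionId OldestXmin = 200;

    printf("all old: %d\n", noop_still_worthwhile(all_old, 3, OldestXmin)); /* 0 */
    printf("mixed:   %d\n", noop_still_worthwhile(mixed, 2, OldestXmin));   /* 1 */
    return 0;
}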
On Thu, Dec 29, 2022 at 12:50 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > It seems somewhat wrong that we discard all the work that heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in a bunch of important cases (e.g. creating a new multixact), but even so, e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is just creating the freeze plan free.
>
> Here's an idea that seems like it could ameliorate the issue:
>
> When we're looping through the members returned by GetMultiXactIdMembers(), and seeing if we can get away with !need_replace/FRM_NOOP processing, why not also check if there are any XIDs >= OldestXmin among the members? If not (if they're all < OldestXmin), then we should prefer to go further with processing the Multi now -- FRM_NOOP processing isn't actually cheaper.

Attached patch shows what I mean here.

I think that there is a tendency for OldestMxact to be held back by a disproportionate amount (relative to OldestXmin) in the presence of long-running transactions and concurrent updates from short-ish transactions. The way that we maintain state in shared memory to compute OldestMxact (OldestMemberMXactId/OldestVisibleMXactId) seems to be vulnerable to that kind of thing.

I'm thinking of scenarios where MultiXactIdSetOldestVisible() gets called from a long-running xact at the first point that it examines any multi, just to read something. That effectively makes the long-running xact hold back OldestMxact, even when it doesn't hold back OldestXmin. A read-only transaction that runs in READ COMMITTED could do this -- it'll call OldestVisibleMXactId() and "lock in" the oldest visible Multi that it needs to continue to see as running, without clearing that until much later (until AtEOXact_MultiXact() is called at xact commit/abort). And without doing anything to hold back OldestXmin by the same amount, or for the same duration.

That's the kind of scenario that the patch might make a difference in -- because it exploits the fact that OldestXmin can be a lot less vulnerable than OldestMxact to being held back by long-running xacts.

Does that seem plausible to you?

--
Peter Geoghegan
On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:
> > > I could just move the same tests from heap_prepare_freeze_tuple() to heap_freeze_execute_prepared(), without changing any of the details.
> >
> > That might work, yes.
>
> Attached patch shows how that could work.

My plan is to push something like this next week, barring objections.

Note that I've inverted the xmax "!TransactionIdDidCommit()" test -- it is now a "TransactionIdDidAbort()" test instead. I believe that this makes the test more likely to catch problems, since we really should expect relevant xmax XIDs to have aborted, specifically -- the xmax XIDs in question are always < OldestXmin. (There is a need to use a "!TransactionIdDidCommit()" test specifically in nearby code in FreezeMultiXactId(), because that code also has to deal with in-progress XIDs that are multi members, but that's not the case here.)

I'm also going to create a CF entry for the other patch posted to this thread -- the enhancement to FreezeMultiXactId() that aims to get the most out of any expensive calls to GetMultiXactIdMembers(). That approach seems quite promising, and relatively simple.

I'm not particularly worried about wasting a call to GetMultiXactIdMembers() during VACUUM, though. I'm more concerned about the actual impact of not doing our best to outright remove Multis during VACUUM. The application itself can experience big performance cliffs from SLRU buffer misses, which VACUUM should do its utmost to prevent. That is an occasional source of big problems [1].

I'm particularly concerned about the possibility that having an updater XID will totally change the characteristics of how multis are processed inside FreezeMultiXactId(). That seems like it might be a really sharp edge. I believe that the page-level freezing patch has already ameliorated the problem, since it made us much less reliant on the case where GetMultiXactIdMembers() returns "nmembers <= 0" for a Multi that happens to be HEAP_XMAX_IS_LOCKED_ONLY(). But we can and should go further than that.

[1] https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/

--
Peter Geoghegan
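A rough standalone model of the control flow after moving the checks: the sanity checks run only for tuples that are actually being frozen, and the check on a removed xmax expects an abort. The ModelFreezePlan type and the clog stubs are illustrative assumptions, not heap_freeze_execute_prepared() or the attached patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Simplified "freeze plan": which XIDs are affected, and how. */
typedef struct ModelFreezePlan
{
    TransactionId xmin;
    TransactionId xmax;
    bool          freeze_xmin;     /* xmin will be marked frozen */
    bool          clear_xmax;      /* xmax will be removed entirely */
} ModelFreezePlan;

/* Stubs for the clog lookups (TransactionIdDidCommit/DidAbort in Postgres). */
static bool clog_did_commit(TransactionId xid) { (void) xid; return true; }
static bool clog_did_abort(TransactionId xid)  { (void) xid; return true; }

/*
 * Sanity checks run at "execute" time, i.e. only for tuples that really get
 * frozen: a frozen xmin must have committed, and a removed xmax must have
 * aborted (it is already < OldestXmin, so it can no longer be in progress).
 */
static void
execute_prepared_freeze(const ModelFreezePlan *plans, int nplans)
{
    for (int i = 0; i < nplans; i++)
    {
        const ModelFreezePlan *plan = &plans[i];

        if (plan->freeze_xmin && !clog_did_commit(plan->xmin))
            fprintf(stderr, "cannot freeze uncommitted xmin %u\n",
                    (unsigned) plan->xmin);
        if (plan->clear_xmax && !clog_did_abort(plan->xmax))
            fprintf(stderr, "cannot remove non-aborted xmax %u\n",
                    (unsigned) plan->xmax);

        /* ... actually apply the freeze plan to the tuple here ... */
    }
}

int
main(void)
{
    ModelFreezePlan plans[] = {
        {.xmin = 100, .xmax = 0,   .freeze_xmin = true, .clear_xmax = false},
        {.xmin = 150, .xmax = 180, .freeze_xmin = true, .clear_xmax = true},
    };

    execute_prepared_freeze(plans, 2);
    return 0;
}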