Thread: Avoiding unnecessary clog lookups while freezing

Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
I took another look at the code coverage situation around freezing
after pushing the page-level freezing patch earlier today. I
spotted an issue that I'd missed up until now: certain sanity checks
in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more
often than really seems necessary.

Theoretically this is an old issue that dates back to commit
699bf7d05c, as opposed to an issue in the page-level freezing patch.
But that's not really true in a real practical sense. In practice the
calls to TransactionIdDidCommit() will happen far more frequently
following today's commit 1de58df4fe (since we're using OldestXmin as
the cutoff that gates performing freeze_xmin/freeze_xmax processing --
not FreezeLimit).

No regressions related to clog lookups by VACUUM were apparent from my
performance validation of the page-level freezing work, but I suspect
that the increase in TransactionIdDidCommit() calls will cause
noticeable regressions with the right/wrong workload and/or
configuration. The page-level freezing work is expected to reduce clog
lookups in VACUUM in general, which is bound to have been a
confounding factor.

I see no reason why we can't just condition the XID sanity check calls
to TransactionIdDidCommit() (for both the freeze_xmin and the
freeze_xmax callsites) on a cheap tuple hint bit precheck not being
enough. We only need an expensive call to TransactionIdDidCommit()
when the precheck doesn't already show what the sanity check expects
to see: that the tuple's xmin looks committed (or, at the freeze_xmax
callsite, that the tuple's xmax looks aborted).

Attached patch shows what I mean. A quick run of the standard
regression tests (with custom instrumentation) shows that this patch
eliminates 100% of all relevant calls to TransactionIdDidCommit(), for
both the freeze_xmin and the freeze_xmax callsites.
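
For the freeze_xmin callsite, the gist of the change is something like
this (just a sketch of the idea, with the error message paraphrased;
the patch has the real details):

    if (freeze_xmin && !HeapTupleHeaderXminCommitted(tuple))
    {
        /*
         * The hint bit doesn't already show xmin as committed, so fall
         * back on an authoritative clog lookup for the sanity check
         */
        if (!TransactionIdDidCommit(xid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("uncommitted xmin %u needs to be frozen",
                                     xid)));
    }

The freeze_xmax callsite gets the equivalent treatment, based on the
tuple's xmax status hint bits.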

-- 
Peter Geoghegan

Attachment

Re: Avoiding unnecessary clog lookups while freezing

From
Andres Freund
Date:
Hi,

On 2022-12-28 15:24:28 -0800, Peter Geoghegan wrote:
> I took another look at the code coverage situation around freezing
> after pushing the page-level freezing patch earlier today. I
> spotted an issue that I'd missed up until now: certain sanity checks
> in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more
> often than really seems necessary.
> 
> Theoretically this is an old issue that dates back to commit
> 699bf7d05c, as opposed to an issue in the page-level freezing patch.
> But that's not really true in a real practical sense. In practice the
> calls to TransactionIdDidCommit() will happen far more frequently
> following today's commit 1de58df4fe (since we're using OldestXmin as
> the cutoff that gates performing freeze_xmin/freeze_xmax processing --
> not FreezeLimit).

Hm. But we still only do the check when we actually freeze, rather than just
during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
increased overhead when we also do more WAL logging etc. Correct?


> I see no reason why we can't just condition the XID sanity check calls
> to TransactionIdDidCommit() (for both the freeze_xmin and the
> freeze_xmax callsites) on a cheap tuple hint bit precheck not being
> enough. We only need an expensive call to TransactionIdDidCommit()
> when the precheck doesn't already show what the sanity check expects
> to see: that the tuple's xmin looks committed (or, at the freeze_xmax
> callsite, that the tuple's xmax looks aborted).

Hm. I dimly recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.


> Attached patch shows what I mean. A quick run of the standard
> regression tests (with custom instrumentation) shows that this patch
> eliminates 100% of all relevant calls to TransactionIdDidCommit(), for
> both the freeze_xmin and the freeze_xmax callsites.

There's practically no tuple-level concurrent activity in a normal regression
test. So I'm a bit doubtful that that's an interesting metric. It'd be a tad more
interesting to run tests, including isolationtester and pgbench, against a
running cluster.

Greetings,

Andres Freund



Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:
> > Theoretically this is an old issue that dates back to commit
> > 699bf7d05c, as opposed to an issue in the page-level freezing patch.
> > But that's not really true in a real practical sense. In practice the
> > calls to TransactionIdDidCommit() will happen far more frequently
> > following today's commit 1de58df4fe (since we're using OldestXmin as
> > the cutoff that gates performing freeze_xmin/freeze_xmax processing --
> > not FreezeLimit).
>
> Hm. But we still only do the check when we actually freeze, rather than just
> during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
> increased overhead when we also do more WAL logging etc. Correct?

Yes, that's how it worked up until today's commit 1de58df4fe.

I don't have strong feelings about back patching a fix, but this does
seem like something that I should fix now, on HEAD.

> Hm. I dimly recall that we had repeated cases where the hint bits were set
> wrongly due to some of the multixact related bugs. I think I was trying to be
> paranoid about not freezing stuff in those situations, since it can lead to
> reviving dead tuples, which obviously is bad.

I think that it's a reasonable check, and I'm totally in favor of
keeping it (or something very close, at least).

> There's practically no tuple-level concurrent activity in a normal regression
> test. So I'm a bit doubtful that that's an interesting metric. It'd be a tad more
> interesting to run tests, including isolationtester and pgbench, against a
> running cluster.

Even on HEAD, with page-level freezing in place, we're only going to
test XIDs that are < OldestXmin and that appear on pages that VACUUM
actually scans in the first place. Just checking tuple-level hint bits
should be effective. But even if it isn't (for whatever reason) then
it's similar to cases where our second heap pass has to do clog
lookups in heap_page_is_all_visible() just because hint bits couldn't
be set earlier on, back when lazy_scan_prune() processed the same page
during VACUUM's initial heap pass.

-- 
Peter Geoghegan



Re: Avoiding unnecessary clog lookups while freezing

From
Andres Freund
Date:
On 2022-12-28 16:37:27 -0800, Peter Geoghegan wrote:
> On Wed, Dec 28, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote:
> > > Theoretically this is an old issue that dates back to commit
> > > 699bf7d05c, as opposed to an issue in the page-level freezing patch.
> > > But that's not really true in a real practical sense. In practice the
> > > calls to TransactionIdDidCommit() will happen far more frequently
> > > following today's commit 1de58df4fe (since we're using OldestXmin as
> > > the cutoff that gates performing freeze_xmin/freeze_xmax processing --
> > > not FreezeLimit).
> >
> > Hm. But we still only do the check when we actually freeze, rather than just
> > during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
> > increased overhead when we also do more WAL logging etc. Correct?
> 
> Yes, that's how it worked up until today's commit 1de58df4fe.
> 
> I don't have strong feelings about back patching a fix, but this does
> seem like something that I should fix now, on HEAD.
>
> > Hm. I dimly recall that we had repeated cases where the hint bits were set
> > wrongly due to some of the multixact related bugs. I think I was trying to be
> > paranoid about not freezing stuff in those situations, since it can lead to
> > reviving dead tuples, which obviously is bad.
> 
> I think that it's a reasonable check, and I'm totally in favor of
> keeping it (or something very close, at least).

I don't quite follow - one paragraph up you say we should fix something, and
then here you seem to say we should continue not to rely on the hint bits?



Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:
> > > Hm. I dimly recall that we had repeated cases where the hint bits were set
> > > wrongly due to some of the multixact related bugs. I think I was trying to be
> > > paranoid about not freezing stuff in those situations, since it can lead to
> > > reviving dead tuples, which obviously is bad.
> >
> > I think that it's a reasonable check, and I'm totally in favor of
> > keeping it (or something very close, at least).
>
> I don't quite follow - one paragraph up you say we should fix something, and
> then here you seem to say we should continue not to rely on the hint bits?

I didn't mean that we should continue to not rely on the hint bits. Is
that really all that the test is for? I think of it as a general sanity check.

The important call to avoid with page-level freezing is the xmin call to
TransactionIdDidCommit(), not the xmax call. The xmax call only occurs
when VACUUM prepares to freeze a tuple whose xmax was set by an updater
(not a locker) that aborted. The xmin calls, on the other hand, will now
take place for most unfrozen tuples.

--
Peter Geoghegan



Re: Avoiding unnecessary clog lookups while freezing

From
Andres Freund
Date:
Hi,

On 2022-12-28 22:36:53 -0800, Peter Geoghegan wrote:
> On Wed, Dec 28, 2022 at 4:43 PM Andres Freund <andres@anarazel.de> wrote:
> > > > Hm. I dimly recall that we had repeated cases where the hint bits were set
> > > > wrongly due to some of the multixact related bugs. I think I was trying to be
> > > > paranoid about not freezing stuff in those situations, since it can lead to
> > > > reviving dead tuples, which obviously is bad.
> > >
> > > I think that it's a reasonable check, and I'm totally in favor of
> > > keeping it (or something very close, at least).
> >
> > I don't quite follow - one paragraph up you say we should fix something, and
> > then here you seem to say we should continue not to rely on the hint bits?
> 
> I didn't mean that we should continue to not rely on the hint bits. Is
> that really all that the test is for? I think of it as a general sanity check.

I do think we wanted to avoid reviving actually-dead tuples (present due to
the multixact and related bugs). And I'm worried about giving that checking
up, I've seen it hit too many times. Both in the real world and during
development.

Somewhat of a tangent: I've previously wondered if we should have a small
hash-table based clog cache. The current one-element cache doesn't suffice in
a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
clog accesses.
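
To be concrete, I'm imagining something no more complicated than this
(a toy standalone sketch, not anything from transam.c; every name here
is invented):

    #include <stdint.h>

    #define XID_STATUS_CACHE_SIZE 64    /* power of two, a few cache lines */

    typedef struct XidStatusCacheEntry
    {
        uint32_t    xid;                /* 0 (InvalidTransactionId) when unused */
        int         status;
    } XidStatusCacheEntry;

    static XidStatusCacheEntry xid_status_cache[XID_STATUS_CACHE_SIZE];

    /* stands in for the real (expensive) clog lookup */
    extern int  clog_lookup_slow(uint32_t xid);

    static int
    clog_lookup_cached(uint32_t xid)
    {
        XidStatusCacheEntry *entry;

        entry = &xid_status_cache[xid & (XID_STATUS_CACHE_SIZE - 1)];
        if (entry->xid != xid)
        {
            /*
             * Cache miss: do the real lookup and remember the result.
             * Callers would only use this for XIDs whose status can no
             * longer change, same as with the existing one-element cache.
             */
            entry->status = clog_lookup_slow(xid);
            entry->xid = xid;
        }
        return entry->status;
    }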

Greetings,

Andres Freund



Re: Avoiding unnecessary clog lookups while freezing

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Somewhat of a tangent: I've previously wondered if we should have a small
> hash-table based clog cache. The current one-element cache doesn't suffice in
> a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
> clog accesses.

I've wondered about that too.  The one-element cache was a good hack
in its day, but it looks a bit under-engineered by our current
standards.  (Also, maybe it'd be plausible to have a one-element
cache in front of a small hash?)

            regards, tom lane



Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:
> I do think we wanted to avoid reviving actually-dead tuples (present due to
> the multixact and related bugs). And I'm worried about giving that checking
> up, I've seen it hit too many times. Both in the real world and during
> development.

I could just move the same tests from heap_prepare_freeze_tuple() to
heap_freeze_execute_prepared(), without changing any of the details.
That would mean that the TransactionIdDidCommit() calls would only
take place with tuples that actually get frozen, which is more or less
how it worked before now.

heap_prepare_freeze_tuple() will now often prepare freeze plans that
just get discarded by lazy_scan_prune(). My concern is the impact on
tables/pages that almost always discard prepared freeze plans, and so
require many TransactionIdDidCommit() calls that really aren't
necessary.

> Somewhat of a tangent: I've previously wondered if we should have a small
> hash-table based clog cache. The current one-element cache doesn't suffice in
> a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
> clog accesses.

I imagine that the one-element cache works alright in some scenarios,
but then suddenly doesn't work so well, even though not very much has
changed. Behavior like that makes the problems difficult to analyze,
and easy to miss. I'm suspicious of that.

-- 
Peter Geoghegan



Re: Avoiding unnecessary clog lookups while freezing

From
Andres Freund
Date:
Hi,

On 2022-12-29 09:43:39 -0800, Peter Geoghegan wrote:
> On Thu, Dec 29, 2022 at 9:21 AM Andres Freund <andres@anarazel.de> wrote:
> > I do think we wanted to avoid reviving actually-dead tuples (present due to
> > the multixact and related bugs). And I'm worried about giving that checking
> > up, I've seen it hit too many times. Both in the real world and during
> > development.
>
> I could just move the same tests from heap_prepare_freeze_tuple() to
> heap_freeze_execute_prepared(), without changing any of the details.

That might work, yes.


> That would mean that the TransactionIdDidCommit() calls would only
> take place with tuples that actually get frozen, which is more or less
> how it worked before now.
>
> heap_prepare_freeze_tuple() will now often prepare freeze plans that
> just get discarded by lazy_scan_prune(). My concern is the impact on
> tables/pages that almost always discard prepared freeze plans, and so
> require many TransactionIdDidCommit() calls that really aren't
> necessary.

It seems somewhat wrong that we discard all the work that
heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
a bunch of important cases (e.g. creating a new multixact), but even so,
e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
just creating the freeze plan free.

I think the better approach might be to make heap_tuple_should_freeze() more
powerful and to only create the freeze plan when actually freezing.


I wonder how often it'd be worthwhile to also do opportunistic freezing during
lazy_vacuum_heap_page(), given that we already will WAL log (and often issue
an FPI).


> > Somewhat of a tangent: I've previously wondered if we should have a small
> > hash-table based clog cache. The current one-element cache doesn't suffice in
> > a lot of scenarios, but it wouldn't take a huge cache to end up filtering most
> > clog accesses.
>
> I imagine that the one-element cache works alright in some scenarios,
> but then suddenly doesn't work so well, even though not very much has
> changed. Behavior like that makes the problems difficult to analyze,
> and easy to miss. I'm suspicious of that.

I think there's a lot of situations where it flat out doesn't work - even if
you just have an inserting and a deleting transaction, we'll often end up not
hitting the 1-element cache due to looking up two different xids in a roughly
alternating pattern...

Greetings,

Andres Freund



Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:
> > I could just move the same tests from heap_prepare_freeze_tuple() to
> > heap_freeze_execute_prepared(), without changing any of the details.
>
> That might work, yes.

Attached patch shows how that could work.

> It seems somewhat wrong that we discard all the work that
> heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
> a bunch of important cases (e.g. creating a new multixact), but even so,
> e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
> just creating the freeze plan free.

I'm not sure what you mean by that. I believe that the page-level
freezing changes do not allow FreezeMultiXactId() to call
GetMultiXactIdMembers() any more often than before. Are you concerned
about a regression, or something more general than that?

The only case that we *don't* force xmax freezing in
FreezeMultiXactId() is the FRM_NOOP case. Note in particular that we
will reliably force freezing for any Multi < OldestMxact (not <
MultiXactCutoff).

> I wonder how often it'd be worthwhile to also do opportunistic freezing during
> lazy_vacuum_heap_page(), given that we already will WAL log (and often issue
> an FPI).

Yeah, we don't actually need a cleanup lock for that. It might also
make sense to teach lazy_scan_prune() to anticipate what will happen
later on, in lazy_vacuum_heap_page(), so that it can freeze based on
the same observation about the cost. (It already does a certain amount
of this kind of thing, in fact.)

-- 
Peter Geoghegan

Attachment

Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > It seems somewhat wrong that we discard all the work that
> > heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
> > a bunch of important cases (e.g. creating a new multixact), but even so,
> > e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
> > just creating the freeze plan free.
>
> I'm not sure what you mean by that. I believe that the page-level
> freezing changes do not allow FreezeMultiXactId() to call
> GetMultiXactIdMembers() any more often than before. Are you concerned
> about a regression, or something more general than that?

Here's an idea that seems like it could ameliorate the issue:

When we're looping through members from GetMultiXactIdMembers(), and
seeing if we can get away with !need_replace/FRM_NOOP processing, why
not also check if there are any XIDs >= OldestXmin among the members?
If not (if they're all < OldestXmin), then we should prefer to go
further with processing the Multi now -- FRM_NOOP processing isn't
actually cheaper.

We'll already know that a second pass over the multi really isn't
expensive. And that it will actually result in FRM_INVALIDATE_XMAX
processing, which is ideal. Avoiding a second pass is really just
about avoiding FRM_RETURN_IS_MULTI (and avoiding FRM_RETURN_IS_XID,
perhaps, though to a much lesser degree).
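
Inside FreezeMultiXactId()'s existing loop over the members, the extra
check could be as simple as this (a sketch only; the spelling of the
OldestXmin cutoff and of the control flow at this callsite is
approximate):

    bool        all_xids_removable = true;  /* new */

    for (int i = 0; i < nmembers; i++)
    {
        /* existing need_replace/FRM_NOOP bookkeeping stays as it is ... */

        /* new: could any member still be running? */
        if (TransactionIdFollowsOrEquals(members[i].xid, OldestXmin))
            all_xids_removable = false;
    }

    /*
     * New: if every member XID is < OldestXmin, don't settle for FRM_NOOP.
     * A second pass over the members is cheap here, and is certain to end
     * in FRM_INVALIDATE_XMAX processing.
     */
    if (all_xids_removable)
        need_replace = true;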

-- 
Peter Geoghegan



Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Thu, Dec 29, 2022 at 12:50 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > It seems somewhat wrong that we discard all the work that
> > > heap_prepare_freeze_tuple() did. Yes, we force freezing to actually happen in
> > > a bunch of important cases (e.g. creating a new multixact), but even so,
> > > e.g. GetMultiXactIdMembers() is awfully expensive to do for nought. Nor is
> > > just creating the freeze plan free.

> Here's an idea that seems like it could ameliorate the issue:
>
> When we're looping through members from GetMultiXactIdMembers(), and
> seeing if we can get away with !need_replace/FRM_NOOP processing, why
> not also check if there are any XIDs >= OldestXmin among the members?
> If not (if they're all < OldestXmin), then we should prefer to go
> further with processing the Multi now -- FRM_NOOP processing isn't
> actually cheaper.

Attached patch shows what I mean here.

I think that there is a tendency for OldestMxact to be held back by a
disproportionate amount (relative to OldestXmin) in the presence of
long running transactions and concurrent updates from short-ish
transactions. The way that we maintain state in shared memory to
compute OldestMxact (OldestMemberMXactId/OldestVisibleMXactId) seems
to be vulnerable to that kind of thing. I'm thinking of scenarios
where MultiXactIdSetOldestVisible() gets called from a long-running
xact, at the first point that it examines any multi, just to read
something. That effectively makes the long-running xact hold back
OldestMxact, even when it doesn't hold back OldestXmin. A read-only
transaction that runs in READ COMMITTED could do this -- it'll call
MultiXactIdSetOldestVisible() and "lock in" the oldest visible Multi that it
needs to continue to see as running, without clearing that until much
later (until AtEOXact_MultiXact() is called at xact commit/abort). And
without doing anything to hold back OldestXmin by the same amount, or
for the same duration.

That's the kind of scenario that the patch might make a difference in
-- because it exploits the fact that OldestXmin can be a lot less
vulnerable than OldestMxact is to being held back by long running
xacts. Does that seem plausible to you?

-- 
Peter Geoghegan

Attachment

Re: Avoiding unnecessary clog lookups while freezing

From
Peter Geoghegan
Date:
On Thu, Dec 29, 2022 at 12:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Dec 29, 2022 at 12:00 PM Andres Freund <andres@anarazel.de> wrote:
> > > I could just move the same tests from heap_prepare_freeze_tuple() to
> > > heap_freeze_execute_prepared(), without changing any of the details.
> >
> > That might work, yes.
>
> Attached patch shows how that could work.

My plan is to push something like this next week, barring objections.
Note that I've inverted the xmax "!TransactionIdDidCommit()" test --
it is now "TransactionIdDidAbort()" instead. I believe that this makes
the test more likely to catch problems, since we really should expect
relevant xmax XIDs to have aborted, specifically, given that the xmax
XIDs in question are always < OldestXmin. (There is a need to use a
"!TransactionIdDidCommit()" test specifically in nearby code in
FreezeMultiXactId(), because that code has to also deal with
in-progress XIDs that are multi members, but that's not the case
here.)
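
In other words, the shape of the xmax check changes roughly like so
(sketch only, with the "after" error message paraphrased):

    /* Before: complain if the about-to-be-removed xmax looks committed */
    if (TransactionIdDidCommit(xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze committed xmax %u", xid)));

    /*
     * After: insist that the xmax positively aborted, which is the only
     * outcome we should ever see here, given that xid < OldestXmin
     */
    if (!TransactionIdDidAbort(xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze non-aborted xmax %u", xid)));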

I'm also going to create a CF entry for the other patch posted to this
thread -- the enhancement to FreezeMultiXactId() that aims to get the
most out of any expensive calls to GetMultiXactIdMembers(). That
approach seems quite promising, and relatively simple. I'm not
particularly worried about wasting a call to GetMultiXactIdMembers()
during VACUUM, though. I'm more concerned about the actual impact of
not doing our best to outright remove Multis during VACUUM. The
application itself can experience big performance cliffs from SLRU
buffer misses, which VACUUM should do its utmost to prevent. That is
an occasional source of big problems [1].

I'm particularly concerned about the possibility that having an
updater XID will totally change the characteristics of how multis are
processed inside FreezeMultiXactId(). That seems like it might be a
really sharp edge. I believe that the page-level freezing patch has
already ameliorated the problem, since it made us much less reliant on
the case where GetMultiXactIdMembers() returns "nmembers <= 0" for a
Multi that happens to be HEAP_XMAX_IS_LOCKED_ONLY(). But we can and
should go further than that.

[1] https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/
--
Peter Geoghegan