Re: ERROR: multixact X from before cutoff Y found to be still running - Mailing list pgsql-bugs

From Robert Haas
Subject Re: ERROR: multixact X from before cutoff Y found to be still running
Date
Msg-id CA+TgmoYdT9uxd1wF1Vvz1wHFfirXju9pC9GspzpRUa=1shqC7w@mail.gmail.com
In response to Re: ERROR: multixact X from before cutoff Y found to be still running  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-bugs
On Thu, Sep 5, 2019 at 4:08 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Right, the v2 patch will effectively ramp-down the freezemin as your
> freeze_max_age gets smaller, while the v1 patch will set the effective
> freezemin to zero as soon as your multixact age passes the threshold.
> I think what is unclear to me is whether this ramp-down behavior is
> the intended functionality or we should be doing something similar to
> what we do for regular transaction IDs (i.e. force freezemin to zero
> right after it hits the "oldest xmin is far in the past" threshold).
> The comment above MultiXactMemberFreezeThreshold() explains things
> pretty well, but AFAICT it is more geared towards influencing
> autovacuum scheduling.  I agree that v2 is safer from the standpoint
> that it changes as little as possible, though.

I don't presently have a view on fixing the actual bug here, but I can
certainly confirm that I intended MultiXactMemberFreezeThreshold() to
ratchet up the pressure gradually rather than all at once, and my
suspicion is that this behavior may be good to retain, but I'm not
sure.

One difference between regular XIDs and MultiXacts is that there's
only one reason why we can need to vacuum XIDs, but there are two
reasons why we can need to vacuum MultiXacts.  We can either be
running short of members space or we can be running short of offset
space, and running out of either one is bad. Regular XIDs have no
analogue of this problem: there's only one thing that you can exhaust.
At the time I wrote MultiXactMemberFreezeThreshold(), only the
'offsets' array had any sort of wraparound protection, and it was
space in 'offsets' that was measured by relminmxid, datminmxid, etc.
You could imagine having separate catalog state to track space in the
'members' SLRU, e.g. relminmxidmembers, datminmxidmembers, etc., but
that wasn't really an option for fixing the bug at hand, because it
wouldn't have been back-patchable.

So the challenge was to find some way of using the existing catalog
state to try to provide wraparound protection for a new kind of thing
for which wraparound protection had not been previously contemplated.
And so MultiXactMemberFreezeThreshold() was born.
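
To make the "ratchet up the pressure gradually" idea concrete, here is a
rough sketch of that kind of ramp-down. It is not the code from
multixact.c. The function name, both threshold constants, and the final
cap at freeze_max_age are assumptions made up for illustration. The idea
is only that the effective freeze age stays at the configured maximum
while 'members' usage is low, then shrinks linearly toward zero as usage
moves from a "safe" level to a "danger" level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thresholds for 'members' usage; not PostgreSQL's real values. */
#define SKETCH_MEMBER_SAFE_THRESHOLD   UINT32_C(2000000000)
#define SKETCH_MEMBER_DANGER_THRESHOLD UINT32_C(3000000000)

/*
 * Hypothetical helper: given the number of multixacts in existence, the
 * number of member entries in use, and the configured maximum freeze age,
 * return an effective freeze age that ramps down as member usage grows.
 */
static uint32_t
sketch_member_freeze_threshold(uint32_t multixacts,
                               uint32_t members,
                               uint32_t freeze_max_age)
{
    double   fraction;
    uint32_t victims;
    uint32_t result;

    assert(SKETCH_MEMBER_DANGER_THRESHOLD > SKETCH_MEMBER_SAFE_THRESHOLD);

    /* Low member usage: no extra pressure, use the configured maximum. */
    if (members <= SKETCH_MEMBER_SAFE_THRESHOLD)
        return freeze_max_age;

    /* How far past the safe level we are, as a fraction of the ramp. */
    fraction = (double) (members - SKETCH_MEMBER_SAFE_THRESHOLD) /
        (double) (SKETCH_MEMBER_DANGER_THRESHOLD - SKETCH_MEMBER_SAFE_THRESHOLD);

    /* At or past the danger level, the lowest possible freeze age is zero. */
    if (fraction >= 1.0)
        return 0;

    /* Try to eliminate this share of the existing multixacts. */
    victims = (uint32_t) ((double) multixacts * fraction);
    result = multixacts - victims;

    /* Assumption for this sketch: never exceed the configured maximum. */
    return result < freeze_max_age ? result : freeze_max_age;
}
```

With these made-up thresholds, 100 million multixacts and member usage
halfway between the two levels gives an effective age of 50 million, and
anything at or beyond the danger level gives zero. The v1 behavior
discussed upthread would instead be a step function that jumps straight
to zero once a threshold is crossed.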

(I apologize if any of the above sounds like I'm taking credit for
work actually done by Thomas, who I see is listed as the primary
author of the commit in question.  I feel like I invented
MultiXactMemberFreezeThreshold and the big comment at the top of it
looks to me like something I wrote, but this was a long time ago
and I don't really remember who did what. My intent here is to provide
some context that may be useful based on what I remember about that
patch, not to steal anybody's thunder.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


