Re: Rework the way multixact truncations work - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Rework the way multixact truncations work
Date
Msg-id 20151025020700.GA449185@tornado.leadboat.com
Whole thread Raw
In response to Re: Rework the way multixact truncations work  (Andres Freund <andres@anarazel.de>)
Responses Re: Rework the way multixact truncations work  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I'm several days into a review of this change (commits 4f627f8 and aa29c1c).
There's one part of the design I want to understand before commenting on
specific code.  What did you anticipate to be the consequences of failing to
remove SLRU segment files that MultiXactState->oldestMultiXactId implies we
should have removed?  I ask because, on the one hand, I see code making
substantial efforts to ensure that it truncates exactly as planned:
/* * Do truncation, and the WAL logging of the truncation, in a critical * section. That way offsets/members cannot get
outof sync anymore, i.e. * once consistent the newOldestMulti will always exist in members, even * if we crashed in the
wrongmoment. */START_CRIT_SECTION();
 
/* * Prevent checkpoints from being scheduled concurrently. This is critical * because otherwise a truncation record
mightnot be replayed after a * crash/basebackup, even though the state of the data directory would * require it.
*/Assert(!MyPgXact->delayChkpt);MyPgXact->delayChkpt= true;
 
.../* * Update in-memory limits before performing the truncation, while inside * the critical section: Have to do it
beforetruncation, to prevent * concurrent lookups of those values. Has to be inside the critical * section as otherwise
afuture call to this function would error out, * while looking up the oldest member in offsets, if our caller crashes *
beforeupdating the limits. */
 

On the other hand, TruncateMultiXact() callees ignore unlink() return values:

On Tue, Sep 22, 2015 at 07:57:27PM +0200, Andres Freund wrote:
> On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> > - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> > least log a message?  If that file is still there when we loop back
> > around, it's going to cause a failure, I think.
> 
> The existing unlink() call doesn't, that's the only reason I didn't add
> a message there. I'm fine with adding a (LOG or WARNING?) message.

Unlinking old pg_clog files is strictly an optimization.  If you were to
comment out every unlink() call in slru.c, the only ill effect on CLOG is the
waste of disk space.  Is the same true of MultiXact?

If there's anyplace where failure to unlink would cause a malfunction, I think
it would be around the use of SlruScanDirCbFindEarliest().  That function's
result becomes undefined if the range of pg_multixact/offsets segment files
present on disk spans more than about INT_MAX/2 MultiXactId.

Thanks,
nm



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: [patch] extensions_path GUC
Next
From: Emre Hasegeli
Date:
Subject: Re: Proposal: Trigonometric functions in degrees