Re: Rework the way multixact truncations work - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Rework the way multixact truncations work
Msg-id 20150922175727.GA1573@awork2.anarazel.de
In response to Re: Rework the way multixact truncations work  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Rework the way multixact truncations work  (Robert Haas <robertmhaas@gmail.com>)
Re: Rework the way multixact truncations work  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 2015-09-22 13:38:58 -0400, Robert Haas wrote:
> Regarding 0003, I'm still very much not convinced that it's a good
> idea to apply this to 9.3 and 9.4.  This patch changes the way we do
> truncation in those older releases; instead of happening at a
> restartpoint, it happens when oldestMultiXid advances.

The primary reason for doing that is that doing it at restartpoints is
simply *wrong*. Restartpoints aren't scheduled in sync with replay,
which means that a restartpoint can (and actually will) happen long
after the checkpoint record from the primary has been replayed. By
the time the restartpoint is performed it's not unlikely that
we've already filled all SLRU segments, which is bad if we then fail
over/start up.

Aside from the more fundamental issue that restartpoints have to be
"asynchronous" with respect to the checkpoint record for performance
reasons, there are a bunch of additional reasons making this even more
likely to occur: differing checkpoint segments on the standby and
pending actions (which we got rid of in 9.5+, but ...)

> I realize that you disagree and will probably commit this to those
> branches anyway. But I want it to be clear that I don't endorse that.

I don't plan to commit/backpatch this over your objection.

I do think it'd be the better approach, and I personally think that
we're much more likely to introduce bugs if we backpatch this in a
year, which I think we'll end up having to do. The longer people run on
these branches, the more issues we'll see.

> I wish more people were paying attention to these patches.

+many

> Other issues:
> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> least log a message?  If that file is still there when we loop back
> around, it's going to cause a failure, I think.

The existing unlink() call doesn't log one either; that's the only
reason I didn't add a message there. I'm fine with adding a (LOG or
WARNING?) message.
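
To make the suggestion concrete, here is a rough standalone sketch of
what "log when unlink() fails" could look like. It is not the patch and
not backend code: delete_segment_file() is a hypothetical stand-in for
the unlink step in SlruDeleteSegment, fprintf() stands in for an
ereport() at LOG or WARNING level, and treating ENOENT as success is my
assumption, not something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: remove one segment file and
 * report a failure instead of ignoring it silently.
 *
 * Returns true if the file is gone afterwards, false if it is still there.
 */
static bool
delete_segment_file(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return true;

    /* Assumption: an already-missing file is not worth a message. */
    if (errno == ENOENT)
        return true;

    /* In the backend this would be an ereport() at LOG or WARNING level. */
    fprintf(stderr, "WARNING: could not remove file \"%s\": %s\n",
            path, strerror(errno));
    return false;
}
```

A caller that loops over segments could use the return value to decide
whether to retry or stop, since a leftover file is what would cause
trouble on the next pass.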

Greetings,

Andres Freund
