Re: Rework the way multixact truncations work - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Rework the way multixact truncations work
Date
Msg-id 20150922232431.GC1573@awork2.anarazel.de
Whole thread Raw
In response to Re: Rework the way multixact truncations work  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Rework the way multixact truncations work  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2015-09-22 14:52:49 -0400, Robert Haas wrote:
> 1. It would be possible to write a patch that included ONLY the
> changes needed to make that happen, and did nothing else.  It would be
> largely a subset of this.  If we want to change 9.3 and 9.4, I
> recommend we do that first, and then come back to the rest of this.

I think that patch would be pretty much what I wrote.

To be correct you basically have to:
1) Never skip a truncation on the standby. Otherwise there might have  already have been wraparound and you read the
completelywrong  offset.
 
2) Always perform truncations on the standby exactly the same moment (in  the replay stream) as on the primary.
Otherwisethere also can be a  wraparound.
 
3) Never read anything from an SLRU from the data directory while  inconsistent. In an inconsistent state we can read
completelywrong  data. A standby can be inconsistent in many situations, including  crashes, restarts and fresh base
backups.

To me these three together leave only the option to never read an SLRUs
contents on a standby.  That only leaves minor changes in the patch that
could be removed afaics. I mean we could leave in
DetermineSafeOldestOffset() but it'd be doing pretty much the same as
SetOffsetVacuumLimit().


I think we put at least three layers on bandaid on this issue since
9.3.2, and each layer made things more complicated. We primarily did so
because of the compatibility and complexity concerns. I think that was a
bad mistake. We should have done it mostly right back then, and we'd be
better of now. If we continue with bandaids on the back branches while
having a fixed 9.5+ with significantly different behaviour we'll have a
hellish time fixing things in the back branches. And introduce more bugs
than this might introduce.


> 2. I agree that what we're doing right now is wrong.  And I agree that
> this fixes a real problem. But it seems to me to be quite possible,
> even likely, that it will create other problems.

Possible. But I think those bugs will be just bugs and not more
fundamental architectural problems.

To be very clear. I'm scared of the idea of backpatching this. I'm more
scared of doing that myself. But even more I am scared of the current
state.


> For example, suppose that there are files in the data directory that
> precede oldestMultiXact. In the current approach, we'll remove those
> because they're not in the range we expect to be used.

Hm. For offsets/ we continue to use SimpleLruTruncate() for truncation,
which scans the directory, so I don't see a problem. For members/ we
won't - but neither do we really today, see
SlruScanDirCbRemoveMembers(). So I don't think there'll be a significant
difference?


> a leftover old file that doesn't get removed the first time through -
> for whatever reason - becomes a time bomb that will explode on the
> next wraparound.  I don't know that that will happen.

We should be able to deal with that, otherwise recovery is pretty
borked. It can be a problem for the 'recovery from wrong oldest multi'
case, but that's the same today.


> I will bet you a beer that there are other possible hazards neither of
> us is foreseeing right now.

Right. I'm not dismissing that. I just think it's much more likely to be
handleable problems than the set we have today. It's incredibly hard to
get an accurate mental model of the combined behaviour & state of
primary and standby today. Even if we three have that today, I'm pretty
sure we won't in half a year. And sure as hell nearly nobody else will
have one.


> >> - If SlruDeleteSegment fails in unlink(), shouldn't we at the very
> >> least log a message?  If that file is still there when we loop back
> >> around, it's going to cause a failure, I think.
> >
> > The existing unlink() call doesn't, that's the only reason I didn't add
> > a message there. I'm fine with adding a (LOG or WARNING?) message.
> 
> Cool.

Hm. When redoing a truncation during [crash] recovery that can cause a
host of spurious warnings if already done before. DEBUG1 to avoid
scaring users?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Less than ideal error reporting in pg_stat_statements
Next
From: Jim Nasby
Date:
Subject: Re: Less than ideal error reporting in pg_stat_statements