Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date
Msg-id CA+TgmoaefuTNRFxncnKLuPaT1NSb64ZMymiW1Lhy7_3=UZ3WdA@mail.gmail.com
Whole thread Raw
In response to Re: Changing WAL Header to reduce contention duringReserveXLogInsertLocation()  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Changing WAL Header to reduce contention duringReserveXLogInsertLocation()
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
List pgsql-hackers
On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 03/29/2018 06:42 PM, Tom Lane wrote:
>> The long and the short of it is that this is a very dangerous-looking
>> proposal, we are at the tail end of a development cycle, and there are
>> ~100 other patches remaining in the commitfest that also have claims
>> on our attention in the short time that's left.  If you're expecting
>> people to spend more time thinking about this now, I feel you're being
>> unreasonable.
>
> I agree.

+1.

>> Also, I will say it once more: this change DOES decrease robustness.
>> It's like blockchain without the chain aspect, or git commits without
>> a parent pointer.  We are not only interested in whether individual
>> WAL records are valid, but whether they form a consistent series.
>> Cross-checking xl_prev provides some measure of confidence about that;
>> xl_curr offers none.
>
> Not sure.
>
> If each WAL record has xl_curr, then we know to which position the
> record belongs (after verifying the checksum). And we do know the size
> of each WAL record, so we should be able to deduce if two records are
> immediately after each other. Which I think is enough to rebuild the
> chain of WAL records.
>
> To defeat this, this would need to happen:
>
> a) the WAL record gets written to a different location
> b) the xl_curr gets corrupted in sync with (a)
> c) the WAL checksum gets corrupted in sync with (b)
> d) the record overwrites existing record (same size/boundaries)
>
> That seems very much like xl_prev.

I don't think so, because this ignores, for example, timeline
switches, or multiple clusters accidentally sharing an archive
directory.

TBH, I'm not entirely sure how likely corruption would be under this
proposal in practice, but I think Tom's got the right idea on a
theoretical level.   As he says, there's a reason why things like
block chains and git commits include something about the previous
record in what gets hashed to create the identifier for the new
record.  It ties those things together in a way that doesn't happen
otherwise.  If somebody proposed changing git so that the SHA
identifier for a commit didn't hash the commit ID for the previous
commit, that would break the security of the system in all kinds of
ways: for example, an adversary could maliciously edit an old commit
by just changing its SHA identifier and that of its immediate
descendent.  No other commit IDs would change and integrity checks
would not fail -- there would be no easy way to notice that something
bad had happened.

Now, in our case, the chances of problems are clearly a lot more
remote because of the way that WAL records are packed into files.
Every WAL record has a place where it's supposed to appear in the WAL
stream, and positions in the WAL stream are never intentionally
recycled (except in PITR scenarios, I guess).  Because of that, the
proposed xl_curr check provides a pretty strong cross-check on the
validity of a record even without xl_prev.  I think there is an
argument to be made - as Simon is doing - that this check is in fact
strong enough that it's good enough for government work.  He might be
right.  But he might also be wrong, because I think Tom is
unquestionably correct when he says that this is weakening the check.
What I'm not sure about is whether it's being weakened enough that
it's actually going to cause problems in practice.  Given where we are
in the release cycle, I'd prefer to defer that question to next cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: pgsql: Add documentation for the JIT feature.
Next
From: Andres Freund
Date:
Subject: Re: Proposal: http2 wire format