Re: storing an explicit nonce - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: storing an explicit nonce
Date
Msg-id CAOuzzgoLLmHhvXDS6n8q46Tyri_GZJJEaNDaB5ezrGyPfQqfNw@mail.gmail.com
In response to Re: storing an explicit nonce  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Greetings,

On Tue, May 25, 2021 at 22:11 Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, May 25, 2021 at 09:58:22PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Tue, May 25, 2021 at 09:42:48PM -0400, Stephen Frost wrote:
> > > > The nonce needs to be a new one, if we include the hint bits in the set
> > > > of data which is encrypted.
> > > >
> > > > However, what I believe folks are getting at here is that we could keep
> > > > the LSN the same, but increase the nonce when the hint bits change, but
> > > > *not* WAL log either the nonce change or the hint bit change (unless
> > > > it's being logged for some other reason, in which case log both), thus
> > > > reducing the amount of WAL being produced.  What would matter is that
> > > > both the hint bit change and the new nonce hit disk at the same time, or
> > > > neither do, or we replay back to some state where the nonce and the hint
> > > > bits 'match up' so that the page decrypts (and the integrity check
> > > > works).
> > >
> > > How do we prevent torn pages if we are writing the page with a new
> > > nonce, and no WAL-logged full page image?
> >
> > err, we'd still WAL the FPI, same as we do for checksums, that's what I
> > would expect and would think we'd need.  As long as the FPI is in the
> > WAL since the last checkpoint, later changes to hint bits or the nonce
> > wouldn't matter- we'll replay the FPI and that'll have the right nonce
> > for the hint bits that were part of the FPI.
> >
> > Any subsequent changes to the hint bits wouldn't be WAL'd though and
> > neither would the changes to the nonce and that all should be fine
> > because we'll blow away the entire page on crash recovery to push it
> > back to what it was when we first wrote the page after the last
> > checkpoint.  Naturally, other changes which have to be WAL'd would still
> > be done but those would be replayed in shared buffers on top of the
> > prior FPI and the nonce set to some $new value (one which we know
> > couldn't have been used prior, by incrementing by some value) when we go
> > to write out that new page.
>
> OK, I see what you are saying.  If we use a nonce that is not the full
> page write LSN then we can use it for hint bit changes _after_ the first
> full page write during the checkpoint, and we don't need to WAL log that
> since it isn't a real LSN and we can throw it away on crash recovery.
> This is not possible if we are using the LSN for the full page write LSN
> for the hint bit nonce, though we could use a dummy WAL record to
> generate an LSN for this, right?

Yes, I think you've got it.  To do it using LSNs and still guarantee a unique nonce on every write, we'd have to generate dummy WAL records just to obtain new LSNs, and that wouldn't be great.
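To make the trade-off concrete, here is a rough sketch in Python (not PostgreSQL code).  ToyWal, Page, DUMMY_RECORD_SIZE and both write functions are names made up for illustration; the only thing it tries to show is that an LSN-derived nonce costs one dummy WAL record per hint-bit-only write, while an explicit per-page nonce costs none.

```python
from dataclasses import dataclass, field

# Assumed size, in bytes, of a hypothetical dummy WAL record.
DUMMY_RECORD_SIZE = 32


@dataclass
class ToyWal:
    """Toy WAL: the LSN is just the byte position after the last record."""
    insert_lsn: int = 0
    records: list = field(default_factory=list)

    def append(self, kind: str, size: int) -> int:
        self.insert_lsn += size
        self.records.append((self.insert_lsn, kind))
        return self.insert_lsn


@dataclass
class Page:
    lsn: int = 0
    nonce: int = 0


def write_hint_bits_lsn_nonce(page: Page, wal: ToyWal) -> None:
    # The nonce *is* the LSN, so a fresh nonce requires a fresh LSN,
    # which in turn requires emitting a WAL record.
    page.lsn = wal.append("dummy", DUMMY_RECORD_SIZE)
    page.nonce = page.lsn


def write_hint_bits_explicit_nonce(page: Page, wal: ToyWal) -> None:
    # The nonce is stored separately; bump it and leave the LSN alone.
    page.nonce += 1


wal_a, page_a = ToyWal(), Page()
wal_b, page_b = ToyWal(), Page()
for _ in range(3):
    write_hint_bits_lsn_nonce(page_a, wal_a)
    write_hint_bits_explicit_nonce(page_b, wal_b)

print(len(wal_a.records), wal_a.insert_lsn)  # dummy records and WAL bytes used
print(len(wal_b.records), page_b.lsn, page_b.nonce)
```

Three hint-bit-only writes produce three dummy records in the first scheme and no WAL at all in the second.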

Andres mentioned other possible cases where the LSN doesn't change even though we change the page.  He's probably right, so we would have to figure out a solution for those cases too, potentially including crash recovery or replay on a replica, where we can't really just go around creating dummy WAL records to get new LSNs.  If the nonce isn't the LSN, then those cases are suddenly fine: the LSN can stay the same, and it doesn't matter that the nonce changes when we write out the page during crash recovery, because the nonce isn't tied to the WAL/LSN stream.
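A toy illustration of that decoupling, again in Python and purely as a sketch: the SHA-256 based keystream and the HMAC tag below stand in for whatever cipher and integrity check would really be used, and the same key is reused for both only to keep the example short.  None of these helper names come from the patch.  The page keeps LSN 100 across two writes, each write gets a new nonce, and the page only decrypts when the nonce matches the contents it was written with.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce and a block counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def make_tag(key: bytes, nonce: int, ciphertext: bytes) -> bytes:
    return hmac.new(key, nonce.to_bytes(8, "big") + ciphertext,
                    hashlib.sha256).digest()


def encrypt_page(key: bytes, nonce: int, plaintext: bytes):
    ks = keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    return ciphertext, make_tag(key, nonce, ciphertext)


def decrypt_page(key: bytes, nonce: int, ciphertext: bytes, tag: bytes) -> bytes:
    if not hmac.compare_digest(make_tag(key, nonce, ciphertext), tag):
        raise ValueError("integrity check failed")
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


key = b"toy-cluster-key"
page_lsn = 100  # unchanged by hint-bit-only writes

# First write after the checkpoint, then a hint-bit-only rewrite.
ct1, tag1 = encrypt_page(key, 1, b"tuple data, hints=00")
ct2, tag2 = encrypt_page(key, 2, b"tuple data, hints=01")

# Matching nonce and contents decrypt fine, with the LSN still at 100.
roundtrip = decrypt_page(key, 2, ct2, tag2)

# A mismatched nonce (say, a torn write) fails the integrity check.
try:
    decrypt_page(key, 1, ct2, tag2)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The mismatch case is the situation the FPI replay is meant to repair: recovery puts the page back to a state where nonce and contents agree.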

If I've got it right, that does mean that the nonces on a replica might differ from those on the primary, though, and I'm not completely sure how I feel about that.  We might wish to explicitly document that, because of this, users should give each replica its own key, distinct from the primary's and from every other replica's (not a bad idea in general anyway, but it would be quite important with this strategy).
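One way to read that risk, sketched in Python with a toy XOR stream cipher (an assumption for illustration, not the cipher under discussion): if the primary and a replica share a key and happen to write different page contents under the same nonce, XORing the two ciphertexts yields the XOR of the two plaintexts.  With distinct keys per node that relationship disappears.  The key strings and page contents are made up.

```python
import hashlib


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce and a block counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


# The same page diverged between the nodes (different hint bits),
# but both nodes ended up using the same nonce value for it.
primary_page = b"hint bits: 0101"
replica_page = b"hint bits: 0111"
nonce = 7

# Shared key: the keystreams cancel out and plaintext structure leaks.
shared_key = b"shared-cluster-key"
c_primary = encrypt(shared_key, nonce, primary_page)
c_replica = encrypt(shared_key, nonce, replica_page)
leaks_with_shared_key = xor(c_primary, c_replica) == xor(primary_page, replica_page)

# Distinct keys per node: the keystreams differ, so nothing cancels.
c_primary = encrypt(b"primary-only-key", nonce, primary_page)
c_replica = encrypt(b"replica-only-key", nonce, replica_page)
leaks_with_distinct_keys = xor(c_primary, c_replica) == xor(primary_page, replica_page)

print(leaks_with_shared_key, leaks_with_distinct_keys)
```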

Thanks,

Stephen
