Re: better page-level checksums - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: better page-level checksums
Msg-id CAH2-WzkHDD33osG3CT0wrX9s_7MvRPSQOgkm7Hdj1F9prDUudA@mail.gmail.com
In response to Re: better page-level checksums  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jun 14, 2022 at 10:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm, but on the other hand, if you imagine a scenario in which the
> "storage system extra blob" is actually a nonce for TDE, you need to
> be able to find it before you've decrypted the rest of the page. If
> pd_checksum gives you the offset of that data, you need to exclude it
> from what gets encrypted, which means that you need to encrypt three
> separate non-contiguous areas of the page whose combined size is
> unlikely to be a multiple of the encryption algorithm's block size.
> That kind of sucks (and putting it at the end of the page makes it way
> better).

I don't have a great understanding of how that cost will be felt in
detail right now, because I don't know enough about the project and
the requirements for TDE in general.
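
To make the quoted layout concern concrete, here is a minimal sketch in C of one reading of it. It assumes that pd_checksum (at byte offset 8, right after the 8-byte pd_lsn) and the blob itself must both stay unencrypted when pd_checksum is used to locate the blob, and that with the blob at the end of the page everything before it can be encrypted as one run. The names Region, demo_regions_blob_in_middle, demo_regions_blob_at_end, and demo_total_len are hypothetical, and the 16-byte cipher block is an assumption (the AES block size); none of this comes from a posted patch.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ 8192            /* PostgreSQL's default block size */
#define PD_CHECKSUM_OFFSET 8   /* pd_checksum follows the 8-byte pd_lsn */
#define PD_CHECKSUM_SIZE 2     /* pd_checksum is a uint16 */
#define CIPHER_BLOCK 16        /* assumption: AES block size in bytes */

/* Hypothetical helper type: one contiguous run of bytes to encrypt. */
typedef struct Region
{
    size_t start;
    size_t len;
} Region;

/*
 * Blob somewhere inside the page, located through pd_checksum.  Both
 * pd_checksum and the blob are left in the clear, so three separate
 * runs remain to be encrypted.  Returns the number of regions written.
 */
static int
demo_regions_blob_in_middle(size_t blob_off, size_t blob_len, Region out[3])
{
    size_t after_checksum = PD_CHECKSUM_OFFSET + PD_CHECKSUM_SIZE;

    assert(blob_off >= after_checksum);
    assert(blob_off + blob_len <= BLCKSZ);

    out[0] = (Region) {0, PD_CHECKSUM_OFFSET};
    out[1] = (Region) {after_checksum, blob_off - after_checksum};
    out[2] = (Region) {blob_off + blob_len, BLCKSZ - (blob_off + blob_len)};
    return 3;
}

/*
 * Blob at the very end of the page: it can be found without reading
 * anything else, so (in this simplified model) one contiguous run
 * covers the rest of the page.
 */
static int
demo_regions_blob_at_end(size_t blob_len, Region out[1])
{
    assert(blob_len <= BLCKSZ);

    out[0] = (Region) {0, BLCKSZ - blob_len};
    return 1;
}

/* Sum of the region lengths, to compare against the cipher block size. */
static size_t
demo_total_len(const Region *regions, int n)
{
    size_t total = 0;

    for (int i = 0; i < n; i++)
        total += regions[i].len;
    return total;
}
```

Under these assumptions, a 16-byte blob placed at offset 8160 leaves 8174 bytes to encrypt across three runs, which is not a multiple of 16, while the same blob at the end of the page leaves a single run of 8176 bytes, which is.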

> That said, I certainly agree that finding the special space needs to
> be fast. The question in my mind is HOW fast it needs to be, and what
> techniques we might be able to use to dodge the problem. For instance,
> suppose that, during the startup sequence, we look at the control
> file, figure out the size of the 'storage system extra blob', and
> based on that each AM figures out the byte-offset of its special space
> and caches that in a global variable. Then, instead of
> PageGetSpecialSpace(page) it does PageGetBtreeSpecialSpace(page) or
> whatever, where the implementation is ((char*) page) +
> the_afformentioned_global_variable. Is that going to be too slow?

Who knows? For now the important point is that there is a tension
between the requirements of TDE, and the requirements of access
methods (especially index access methods). It's possible that this
will turn out not to be much of a problem. But the burden of proof is
yours. Making a big change to the on-disk format like this (a change
that affects every access method) should be held to an exceptionally
high standard.
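
For reference, the cached-offset idea from the quoted text could look roughly like the sketch below. This is only an illustration under stated assumptions: the blob is taken to sit at the very end of the page with the special space immediately before it, the blob size is taken to be a multiple of the platform's maximum alignment, and DemoBtreeOpaque is a stand-in for the real B-tree opaque struct. The names demo_init_special_offsets and demo_page_get_btree_special are hypothetical and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192     /* PostgreSQL's default block size */

/* Stand-in for the B-tree special-space struct (hypothetical layout). */
typedef struct DemoBtreeOpaque
{
    uint32_t prev;
    uint32_t next;
    uint32_t level;
    uint16_t flags;
    uint16_t cycleid;
} DemoBtreeOpaque;

/* Hypothetical global: byte offset of the B-tree special space. */
static size_t btree_special_offset = 0;

/*
 * Would run once during startup, after the blob size has been read from
 * the control file.  Assumes the blob occupies the last blob_size bytes
 * of every page and the special space sits directly in front of it.
 */
static void
demo_init_special_offsets(size_t blob_size)
{
    assert(blob_size + sizeof(DemoBtreeOpaque) <= BLCKSZ);
    btree_special_offset = BLCKSZ - blob_size - sizeof(DemoBtreeOpaque);
}

/*
 * AM-specific accessor: a pointer add against a cached global, with no
 * read of the page header.
 */
static inline DemoBtreeOpaque *
demo_page_get_btree_special(void *page)
{
    return (DemoBtreeOpaque *) ((char *) page + btree_special_offset);
}
```

The accessor costs one load of a global plus an addition, compared with today's constant offset for a given BLCKSZ. Whether that difference matters in hot index code paths is exactly the open question here.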

There are bound to be tacit or even explicit assumptions made by
access methods that you risk breaking here. The reality is that all of
the access method code evolved in an environment where the special
space size was constant and generic for a given BLCKSZ. I don't have
much sympathy for any suggestion that code written 20 years ago should
have known not to make these assumptions. I have a lot more sympathy
for the idea that it's a general problem with our infrastructure
(particularly code in bufpage.c and the delicate assumptions made by
its callers) -- a problem that is worth addressing with a broad
solution that enables lots of different work.

We don't necessarily get another shot at this if we get it wrong now.

> There's going to have to be some compromise here. On the one hand
> you're going to have people who want to be able to do run-time
> conversions between page formats even at the cost of extra runtime
> overhead on top of what the basic feature necessarily implies. On the
> other hand you're going to have people who don't think any overhead at
> all is acceptable, even if it's purely nominal and only visible on a
> microbenchmark. Such arguments can easily become holy wars.

How many times has a change to the on-disk format of this magnitude
taken place, post-pg_upgrade? I would argue that this would be the
first, since it is the moral equivalent of extending the size of the
generic page header.

For all I know the overhead will be perfectly fine, and everybody
wins. I just want to be adamant that we're making the right
trade-offs, and maximizing the benefit from any new cost imposed on
access method code.

-- 
Peter Geoghegan
