Re: [PATCHES] Post-special page storage TDE support - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: [PATCHES] Post-special page storage TDE support
Date
Msg-id CAEze2WiwVws0PcdZhwm-mYb85kF2MZ-APT+bRekpy_D6wVCnvw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCHES] Post-special page storage TDE support  (David Christensen <david.christensen@crunchydata.com>)
Responses Re: [PATCHES] Post-special page storage TDE support  (David Christensen <david.christensen@crunchydata.com>)
List pgsql-hackers
On Sat, 29 Oct 2022 at 00:25, David Christensen
<david.christensen@crunchydata.com> wrote:
>
> Hi Matthias,
>
> > Did you read the related thread with related discussion from last June, "Re: better page-level checksums" [0]? In
thatI argued that space at the end of a page is already allocated for the AM, and that reserving variable space at the
endof the page for non-AM usage is wasting the AM's performance potential. 
>
> Yes, I had read parts of that thread among others, but have given it a
> re-read.  I can see the point you're making here, and agree that if we
> can allocate between pd_special and pd_upper that could make sense.  I
> am a little unclear as to what performance impacts for the AM there
> would be if this additional space were ahead or behind the page
> special area; it seems like if this is something that needs to live on
> the page *somewhere* just being aligned correctly would be sufficient
> from the AM's standpoint.

It would be sufficient, but it is definitely suboptimal. See [0] as a
patch that is being held back by putting stuff behind the special
area.

I don't really care much about the storage layout on-disk, but I do
care that AMs have efficient access to their data. For the page
header, line pointers, and special area, that is currently guaranteed
by  the current page layout. However, for the special area, that
currently guaranteed offset of (BLCKSZ -
MAXALIGN(sizeof(IndexOpaque))) will get broken as there would be more
space in the special area than the AM would be expecting. Right now,
our index AMs are doing pointer chasing during special area lookups
for no good reason, but with the patch it would be required. I don't
like that at all.

Because I understand that it is a requirement to store this reserved
space in a fixed place on the on-disk page (you must know where the
checksum is at static places on the page, otherwise you'd potentially
mis-validate a page) but that requirement is not there for in-memory
storage. I think it's a small price to pay to swap the fields around
during R/W operations - the largest size of special area is currently
24 bytes, and the proposals I've seen for this extra storage area
would not need it to be actually filled with data whilst the page is
being used by the AM (checksum could be zeroed in in-memory
operations, and it'd get set during writeback; same with all other
fields I can imagine the storage system using).

> Considering that I am trying to make this
> have zero storage impact if these features are not active, the impact
> on a cluster with no additional features would be moot from a storage
> perspective, no?

The issue is that I'd like to eliminate the redirection from the page
header in the hot path. Currently, we can do that, and pd_special
would be little more than a hint to the smgr and pd_linp code that
that area is special and reserved for this access method's private
data, so that it is not freed. If you stick something extra in there,
it's not special for the AM's private data, and the AM won't be able
to use pd_special for similar uses as pd_linp+pd_lower. I'd rather
have the storage system use it's own not-special area; choreographed
by e.g. a reuse of pd_checksum for one more page offset. Swapping the
fields around between on-disk and in-memory doesn't need to be an
issue, as special areas are rarely very large.

Every index type we support utilizes the special area. Wouldn't those
in-memory operations have priority on this useful space, as opposed to
a storage system that maybe will be used in new clusters, and even
then only during R/W operations to disk (each at most once for N
memory operations)?

> > Apart from that: Is this variable-sized 'metadata' associated with smgr infrastructure only, or is it also
availablefor AM features? If not; then this is a strong -1. The amount of tasks smgr needs to do on a page is generally
muchless than the amount of tasks an AM needs to do; so in my view the AM has priority in prime page real estate, not
smgror related infrastructure. 
>
> I will confess to a slightly wobbly understanding of the delineation
> of responsibility here. I was under the impression that by modifying
> any consumer of PageHeaderData this would be sufficient to cover all
> AMs for the types of cluster-wide options we'd be concerned about (say
> extended checksums, multiple page encryption schemes, or other
> per-page information we haven't yet anticipated).  Reading smgr/README
> and the various access/*/README has not made the distinction clear to
> me yet.

pd_special has (historically) been reserved for access methods'
page-level private data. If you add to this area, shouldn't that be
space that the AM should be able to hook into as well? Or are all
those features limited to the storage system only; i.e. the storage
system decides what's best for the AM's page handling w.r.t. physical
storage?

> > re: PageFeatures
> > I'm not sure I understand the goal, nor the reasoning. Shouldn't this be part of the storage manager (smgr)
implementation/ can't this be part of the smgr of the relation? 
>
> For at least the feature cases I'm anticipating, this would apply to
> any disk page that may have user data, set (at least initially) at
> initdb time, so should apply to any pages in the cluster, regardless
> of AM.

OK, so having a storage manager for each supported set of features is
not planned for this. Understood.

> > re: use of pd_checksum
> > I mentioned this in the above-mentioned thread too, in [1], that we could use pd_checksum as an extra area marker
forthis storage-specific data, which would be located between pd_upper and pd_special. 
>
> I do think that we could indeed use this as an additional in-page
> pointer, but at least for this version was keeping things
> backwards-compatible.  Peter G (I think) also made some good points
> about how to include the various status bits on the page somehow in
> terms of making a page completely self-contained.

I think that adding page header bits would suffice for backwards
compatibility if we'd want to reuse pd_checksum. A new
PD_CHECKSUM_REUSED_FOR_STORAGE would suffice here; it would be unset
in normal (pre-patch, or without these fancy new features) clusters.

Kind regards,

Matthias van de Meent

PS. sorry for the rant. I hope my arguments are clear why I dislike
the storage area being placed behind the special area in memory.

[0] https://commitfest.postgresql.org/40/3543/



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: WIP: Aggregation push-down - take2
Next
From: "Jonathan S. Katz"
Date:
Subject: User functions for building SCRAM secrets