Re: [PATCHES] Post-special page storage TDE support - Mailing list pgsql-hackers

From David Christensen
Subject Re: [PATCHES] Post-special page storage TDE support
Date
Msg-id CAOxo6XKgNkMPfWZPJYeFgd-qTFxBBAv1QbvySXDLjY7Sr-6KLw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCHES] Post-special page storage TDE support  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: [PATCHES] Post-special page storage TDE support
List pgsql-hackers
Hi Matthias,

> Did you read the related thread with related discussion from last June, "Re: better page-level checksums" [0]? In
thatI argued that space at the end of a page is already allocated for the AM, and that reserving variable space at the
endof the page for non-AM usage is wasting the AM's performance potential. 

Yes, I had read parts of that thread among others, but have given it a
re-read.  I can see the point you're making here, and agree that if we
can allocate between pd_special and pd_upper that could make sense.  I
am a little unclear as to what performance impacts for the AM there
would be if this additional space were ahead or behind the page
special area; it seems like if this is something that needs to live on
the page *somewhere* just being aligned correctly would be sufficient
from the AM's standpoint.  Considering that I am trying to make this
have zero storage impact if these features are not active, the impact
on a cluster with no additional features would be moot from a storage
perspective, no?

> Apart from that: Is this variable-sized 'metadata' associated with smgr infrastructure only, or is it also available
forAM features? If not; then this is a strong -1. The amount of tasks smgr needs to do on a page is generally much less
thanthe amount of tasks an AM needs to do; so in my view the AM has priority in prime page real estate, not smgr or
relatedinfrastructure. 

I will confess to a slightly wobbly understanding of the delineation
of responsibility here. I was under the impression that by modifying
any consumer of PageHeaderData this would be sufficient to cover all
AMs for the types of cluster-wide options we'd be concerned about (say
extended checksums, multiple page encryption schemes, or other
per-page information we haven't yet anticipated).  Reading smgr/README
and the various access/*/README has not made the distinction clear to
me yet.

> re: PageFeatures
> I'm not sure I understand the goal, nor the reasoning. Shouldn't this be part of the storage manager (smgr)
implementation/ can't this be part of the smgr of the relation? 

For at least the feature cases I'm anticipating, this would apply to
any disk page that may have user data, set (at least initially) at
initdb time, so should apply to any pages in the cluster, regardless
of AM.

> re: use of pd_checksum
> I mentioned this in the above-mentioned thread too, in [1], that we could use pd_checksum as an extra area marker for
thisstorage-specific data, which would be located between pd_upper and pd_special. 

I do think that we could indeed use this as an additional in-page
pointer, but at least for this version was keeping things
backwards-compatible.  Peter G (I think) also made some good points
about how to include the various status bits on the page somehow in
terms of making a page completely self-contained.

> Re: patch contents
>
> 0001:
> >+ specialSize = MAXALIGN(specialSize) + reserved_page_size;
>
> This needs to be aligned, so MAXALIGN(specialSize + reserved_page_size), or an assertion that reserved_page_size is
MAXALIGNED,would be better. 

It is currently aligned via the space calculation return value but
agree that folding it into an assert or reworking it explicitly is
clearer.

> >  PageValidateSpecialPointer(Page page)
> >  {
> >      Assert(page);
> > -    Assert(((PageHeader) page)->pd_special <= BLCKSZ);
> > +    Assert((((PageHeader) page)->pd_special - reserved_page_size) <= BLCKSZ);
>
> This check is incorrect. With your code it would allow pd_special past the end of the block. If you want to put the
reserved_space_sizeeffectively inside the special area, this check should instead be: 
>
> +    Assert(((PageHeader) page)->pd_special <= (BLCKSZ - reserved_page_size));
>
> Or, equally valid
>
> +    Assert((((PageHeader) page)->pd_special + reserved_page_size) <= BLCKSZ);

Yup, I think I inverted my logic there; thanks.

> > + * +-------------+-----+------------+-----------------+
> > + * | ... tuple2 tuple1 | "special space" | "reserved" |
> > + * +-------------------+------------+-----------------+
>
> Could you fix the table display if / when you revise the patchset? It seems to me that the corners don't line up with
thecolumn borders. 

Sure thing.

> 0002:
> > Make the output of "select_views" test stable
> > Changing the reserved_page_size has resulted in non-stable results for this test.
>
> This makes sense, what kind of instability are we talking about? Are there different results for runs with the same
binary,or is this across compilations? 

When running with the same compilation/initdb settings, the test
results are stable, but differ depending what options you chose, so
`make installcheck` output will fail when testing a cluster with
different options vs upstream HEAD without these patches, etc.

> 0003 and up were not yet reviewed in depth.

Thanks, I appreciate the feedback so far.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Next
From: Justin Pryzby
Date:
Subject: Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))