Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Lowering the ever-growing heap->pd_lower
Msg-id CAEze2Wghu_TCB5FxWCsKD+T9y44ZHOYqo_aeoWpk4ZRNFciVKQ@mail.gmail.com
In response to Re: Lowering the ever-growing heap->pd_lower  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, 4 Aug 2021 at 02:43, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Aug 2, 2021 at 11:57 PM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > 2. Reduce number of line pointers to 0 in some cases.
> > Matthias - I don't think you've made a full case for doing this, nor
> > looked at the implications.
> > The comment clearly says "it seems like a good idea to avoid leaving a
> > PageIsEmpty()" page behind.
> > So I would be inclined to remove that from the patch and consider that
> > as a separate issue, or close this.
>
> This was part of that earlier commit because of sheer paranoia;
> nothing more. I actually think that it's highly unlikely to protect us
> from bugs in practice. Though I am, in a certain sense, likely to be
> wrong about "PageIsEmpty() defensiveness", it does not bother me in
> the slightest. It seems like the right approach in the absence of new
> information about a significant downside. If my paranoia really did
> turn out to be justified, then I would expect that there'd be a
> subtle, nasty bug. That possibility is what I was really thinking of.
> And so it almost doesn't matter to me how unlikely we might think such
> a bug is now, unless and until somebody can demonstrate a real
> practical downside to my defensive approach.

As I believe I have mentioned before, there is one significant
downside: 32-bit systems cannot reuse pages that contain only a
single unused line pointer for max-sized FSM requests. A fresh page
has 8168 bytes free (8kB - 24B page header), which becomes 8164
when returned from PageGetFreeSpace (it accounts for the 4 bytes that
the new line pointer takes up when an item is inserted onto the page).
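
For illustration, the arithmetic above can be sketched as follows. This
is a rough Python model, not PostgreSQL code: the function name is made
up, and it assumes the default 8kB block size, a 24-byte page header,
4-byte line pointers and no special space (as on heap pages).

```python
BLCKSZ = 8192            # default block size (assumption)
PAGE_HEADER_SIZE = 24    # page header, as stated above
LINE_POINTER_SIZE = 4    # one line pointer (ItemIdData)


def page_get_free_space(n_line_pointers: int, tuple_bytes: int = 0) -> int:
    """Simplified model of PageGetFreeSpace for a heap page.

    Free space is the gap between the end of the line pointer array
    (pd_lower) and the start of tuple data (pd_upper), minus room for
    the one new line pointer an insert would need.
    """
    pd_lower = PAGE_HEADER_SIZE + n_line_pointers * LINE_POINTER_SIZE
    pd_upper = BLCKSZ - tuple_bytes
    space = pd_upper - pd_lower
    if space < LINE_POINTER_SIZE:
        return 0
    return space - LINE_POINTER_SIZE


fresh_page = page_get_free_space(0)      # never-used page
once_used_page = page_get_free_space(1)  # one unused line pointer left behind
print(fresh_page, once_used_page)
```

The difference between the two results is exactly the one line pointer
that is left behind on a page that has held data.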

On 64-bit systems, MaxHeapTupleSize is 8160, and on 32-bit
systems it is 8164. When we leave one more unused
line pointer on the page, the page will have a
PageGetFreeSpace of 8160, 4 bytes less than the MaxHeapTupleSize on
32-bit systems. As a result, those systems will never have FSM entries
of the largest category for pages that have once held data, so every
request of the largest category has to add a new page. This means that
all tuples larger than 8128 bytes (the largest request that still maps
to the 254-category) will always be
put on a new page, regardless of the actual availability of free space
in the table.
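
To show where these numbers come from, here is a small Python sketch of
the category mapping as I understand it from freespace.c. The helper
names are my own, the error check for oversized requests is left out,
and it assumes the default 8kB block size with 256 FSM categories
(32 bytes per category) and MAXALIGN of 8 on 64-bit and 4 on 32-bit.

```python
BLCKSZ = 8192
FSM_CATEGORIES = 256
FSM_CAT_STEP = BLCKSZ // FSM_CATEGORIES  # 32 bytes per category


def max_heap_tuple_size(maxalign: int) -> int:
    """Block size minus the aligned size of page header + one line pointer."""
    header_plus_lp = 24 + 4
    aligned = (header_plus_lp + maxalign - 1) // maxalign * maxalign
    return BLCKSZ - aligned


def avail_to_cat(avail: int, max_request: int) -> int:
    """Category recorded for a page with `avail` bytes free.

    255 is reserved for pages that can hold a max-sized request.
    """
    if avail >= max_request:
        return 255
    return min(avail // FSM_CAT_STEP, 254)


def needed_to_cat(needed: int) -> int:
    """Smallest category a page must have to satisfy a request."""
    if needed == 0:
        return 1
    return min((needed + FSM_CAT_STEP - 1) // FSM_CAT_STEP, 255)


for name, maxalign in (("64-bit", 8), ("32-bit", 4)):
    limit = max_heap_tuple_size(maxalign)
    # 8160 = free space of a page with one unused line pointer left behind
    print(name, limit, avail_to_cat(8160, limit), needed_to_cat(8129))
```

Under these assumptions, a once-used page lands in category 255 on
64-bit but only in category 254 on 32-bit, while any request above
8128 bytes asks for category 255.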

You might argue that this is a problem in the FSM subsystem, but in
this case it actively hinders us from reusing pages in the largest
category of FSM-requests. If you would argue 'PageGetHeapFreeSpace
should keep free line pointers in mind when calculating free space',
then I would argue 'yes, but isn't it better, then, to also actually
mark that space as unused?'.

All in all, I'd just rather remove the distinction between once-used
pages and fresh pages completely by truncating the LP-array to 0 than
to leave this bloating behaviour in the system.

Kind regards,

Matthias van de Meent.


