Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Lowering the ever-growing heap->pd_lower
Msg-id CAEze2WigqsCY=q0df_8jdNxeZrpdfpkUutw7dtx8SSHXHaiduw@mail.gmail.com
In response to Re: Lowering the ever-growing heap->pd_lower  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: Lowering the ever-growing heap->pd_lower
List pgsql-hackers
On Tue, 3 Aug 2021 at 08:57, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Tue, 18 May 2021 at 20:33, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Tue, May 18, 2021 at 12:29 PM Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > > PFA the updated version of this patch. Apart from adding line pointer
> > > truncation in PageRepairFragmentation (as in the earlier patches), I
> > > also altered PageTruncateLinePointerArray to clean up all trailing
> > > line pointers, even if it was the last item on the page.
> >
> > Can you show a practical benefit to this patch, such as an improvement
> > in throughput or in efficiency for a given workload?
> >
> > It was easy to see that having something was better than having
> > nothing at all. But things are of course different now that we have
> > PageTruncateLinePointerArray().
>
> There does seem to be utility in Matthias' patch, which currently does
> two things:
> 1. Allow same thing as PageTruncateLinePointerArray() during HOT cleanup
> That is going to have a clear benefit for HOT workloads, which by
> their nature will use a lot of line pointers.
> Many applications are updated much more frequently than they are vacuumed.
> Peter - what is your concern about doing this more frequently? Why
> would we *not* do this?

One clear reason why we _do_ want this is that the current shrinking
only happens in the second phase of vacuum. Shrinking the LP-array in
heap_page_prune decreases the chance that a tuple which would fit on
the page after the removal of HOT chain items still doesn't fit,
merely because the page hasn't been vacuumed yet, while adding only a
little overhead. Additionally, heap_page_prune is executed when a page
needs more free space for a new tuple that currently doesn't fit, and
in such cases I think clearing as much space as possible is useful.
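For reference, here is a rough sketch of when that opportunistic
pruning kicks in. It is a simplified model of the free-space test in
heap_page_prune_opt() as I understand it, not the actual code. The
names should_prune_page and target_page_free_space are hypothetical,
and the pd_prune_xid / visibility-horizon check that the real function
performs first is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ                  8192
#define HEAP_DEFAULT_FILLFACTOR  100

/* Space a heap page should keep free for the given fillfactor,
 * modeled on RelationGetTargetPageFreeSpace(). */
static int
target_page_free_space(int fillfactor)
{
    assert(fillfactor >= 10 && fillfactor <= 100);
    return BLCKSZ * (100 - fillfactor) / 100;
}

/* Hypothetical, simplified version of the trigger in heap_page_prune_opt().
 * page_is_full models the PD_PAGE_FULL hint an UPDATE sets when it found
 * no room on the page; heap_free_space models PageGetHeapFreeSpace().
 * The real code also checks pd_prune_xid against the visibility horizon. */
static bool
should_prune_page(bool page_is_full, int heap_free_space, int fillfactor)
{
    int minfree = target_page_free_space(fillfactor);

    /* never aim for less than 10% of the block */
    if (minfree < BLCKSZ / 10)
        minfree = BLCKSZ / 10;

    return page_is_full || heap_free_space < minfree;
}
```

With the default fillfactor the threshold is BLCKSZ / 10 (819 bytes),
so a page with 500 bytes free, or one flagged as full by a failed
UPDATE, is a pruning candidate. That is exactly the situation where
also truncating the LP-array would return the most space.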

> 2. Reduce number of line pointers to 0 in some cases.
> Matthias - I don't think you've made a full case for doing this, nor
> looked at the implications.

I have looked at the implications (see upthread), and I haven't found
any implications other than those mentioned below.

> The comment clearly says "it seems like a good idea to avoid leaving a
> PageIsEmpty()" page behind.

Do note that that comment is based on (to the best of my knowledge)
unmeasured, but somewhat informed, guesswork ('it seems like a good
idea'), which I also commented on in the thread discussing the patch
that resulted in that commit [0].

If I recall correctly, the decision to keep at least 1 line pointer on
the page was made because this feature was committed late in the
development cycle of pg14, so there would be little time to check the
impact of fully clearing pages. To go forward with the feature in pg14
at that point, it was safer not to completely empty pages, so that we
would not significantly change the code paths hit during e.g. vacuum,
reducing the chance of significant bugs that would require the patch
to be reverted [1].

I agreed at that point that that was a safer bet, but right now it's
early in the pg15 development cycle, and I've had the time to gain more
experience with the vacuum and line pointer machinery. That being the
case, I consider this a revisit of the question 'is it OK to truncate
the LP-array to 0', where previously the answer was 'we don't know,
and it's late in the release cycle'. Having now looked through the
code base, I argue that the answer is yes.
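To make the difference concrete, here is a toy model of back-to-front
line pointer truncation. It is not PageTruncateLinePointerArray()
itself: toy_page, toy_truncate_lp_array and the other helpers are
hypothetical names, the page holds nothing but line pointer states,
and the PD_HAS_FREE_LINES hint handling is omitted. keep_one = true
models the pg14 behaviour of always leaving one line pointer behind;
keep_one = false models truncating to 0, after which a
PageIsEmpty()-style check (pd_lower <= SizeOfPageHeaderData) succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZE_OF_PAGE_HEADER 24  /* SizeOfPageHeaderData */
#define SIZE_OF_ITEM_ID      4  /* sizeof(ItemIdData) */
#define MAX_ITEMS          291  /* MaxHeapTuplesPerPage for 8kB pages */

typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } lp_state;

/* Hypothetical, heavily simplified page: only the line pointer states. */
typedef struct
{
    uint16_t pd_lower;          /* end of the line pointer array */
    lp_state items[MAX_ITEMS];  /* items[0] is offset number 1 */
} toy_page;

static int
toy_nitems(const toy_page *p)
{
    return (p->pd_lower - SIZE_OF_PAGE_HEADER) / SIZE_OF_ITEM_ID;
}

/* Drop trailing LP_UNUSED line pointers, scanning back to front.
 * keep_one = true never goes below one item (pg14 behaviour);
 * keep_one = false may truncate the array to zero items. */
static void
toy_truncate_lp_array(toy_page *p, bool keep_one)
{
    int nitems = toy_nitems(p);
    int min_items = keep_one ? 1 : 0;

    while (nitems > min_items && p->items[nitems - 1] == LP_UNUSED)
        nitems--;

    p->pd_lower = (uint16_t) (SIZE_OF_PAGE_HEADER + nitems * SIZE_OF_ITEM_ID);
    assert(toy_nitems(p) == nitems);
}

/* Same test as the PageIsEmpty() macro in bufpage.h. */
static bool
toy_page_is_empty(const toy_page *p)
{
    return p->pd_lower <= SIZE_OF_PAGE_HEADER;
}
```

For a page whose three line pointers are all LP_UNUSED, the first mode
leaves pd_lower at 28 (not empty), while the second brings it down to
24 (empty). Pages that still hold a used item behave the same in both
modes.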

One more point for going to 0 is that on 32-bit systems, a single
line pointer is enough to keep a page from being 'empty' enough to
fit a MaxHeapTupleSize-sized tuple (when requesting pages through the
FSM).
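The arithmetic behind that, as a sketch assuming 8kB pages, a 24-byte
page header, 4-byte line pointers, and a MAXIMUM_ALIGNOF of 4 on the
32-bit platform versus 8 on 64-bit. It follows the formulas of
MaxHeapTupleSize and PageGetFreeSpace() (which always reserves room for
one more line pointer); the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define BLCKSZ               8192
#define SIZE_OF_PAGE_HEADER    24   /* SizeOfPageHeaderData */
#define SIZE_OF_ITEM_ID         4   /* sizeof(ItemIdData) */

/* Round len up to a multiple of align (a power of two), like MAXALIGN(). */
static int
align_up(int len, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* MaxHeapTupleSize = BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)) */
static int
max_heap_tuple_size(int maxalign)
{
    return BLCKSZ - align_up(SIZE_OF_PAGE_HEADER + SIZE_OF_ITEM_ID, maxalign);
}

/* Free space on a heap page with no tuple data (pd_upper == BLCKSZ) but
 * nline line pointers, following PageGetFreeSpace(). */
static int
empty_page_free_space(int nline)
{
    int pd_lower = SIZE_OF_PAGE_HEADER + nline * SIZE_OF_ITEM_ID;

    return BLCKSZ - pd_lower - SIZE_OF_ITEM_ID;
}

/* Does a maximum-size heap tuple fit on such a page? */
static bool
max_tuple_fits(int maxalign, int nline)
{
    return align_up(max_heap_tuple_size(maxalign), maxalign)
        <= empty_page_free_space(nline);
}
```

With MAXALIGN 8, MaxHeapTupleSize is 8160 and a page with one leftover
line pointer still has 8160 bytes free, so the tuple fits. With
MAXALIGN 4, MaxHeapTupleSize is 8164, but the same page only has 8160
bytes free; only a page truncated to zero line pointers (8164 bytes
free) can take it.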

Additionally, there are some other optimizations we can only apply to
empty pages:

- vacuum (with disable_page_skipping = on) will process these empty
pages faster, as it won't need to do any pruning on them. With page
skipping enabled this doesn't matter, because empty pages are
all_visible and vacuum therefore won't access them.
- the pgstattuple contrib extension processes empty pages (slightly)
faster in pgstattuple_approx.
- various loops won't need to check whether the single remaining item
is unused, saving some cycles whenever the page is accessed.

and possible future optimizations include:

- Full-page WAL logging of empty pages produced in the checkpointer
could be optimized to log only 'this is an empty page' instead of
writing out the full 8kB page, which would help reduce WAL volume.
Until now this optimization could never apply to heapam pages, because
they could not become empty again; with this patch it can.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAEze2Wh-nXjkp0bLN_vQwgHttC8CRH%3D1ewcrWk%2B7RX5B93YQPQ%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CAH2-WznCxtWL4B995y2KJWj-%2BjrjahH4n6gD2R74SyQJo6Y63w%40mail.gmail.com