Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Lowering the ever-growing heap->pd_lower |
Date | |
Msg-id | CAEze2WigqsCY=q0df_8jdNxeZrpdfpkUutw7dtx8SSHXHaiduw@mail.gmail.com |
In response to | Re: Lowering the ever-growing heap->pd_lower (Simon Riggs <simon.riggs@enterprisedb.com>) |
Responses | Re: Lowering the ever-growing heap->pd_lower |
List | pgsql-hackers |
On Tue, 3 Aug 2021 at 08:57, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Tue, 18 May 2021 at 20:33, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Tue, May 18, 2021 at 12:29 PM Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > > PFA the updated version of this patch. Apart from adding line pointer
> > > truncation in PageRepairFragmentation (as in the earlier patches), I
> > > also altered PageTruncateLinePointerArray to clean up all trailing
> > > line pointers, even if it was the last item on the page.
> >
> > Can you show a practical benefit to this patch, such as an improvement
> > in throughput or in efficiency for a given workload?
> >
> > It was easy to see that having something was better than having
> > nothing at all. But things are of course different now that we have
> > PageTruncateLinePointerArray().
>
> There does seem to be utility in Matthias' patch, which currently does
> two things:
> 1. Allow same thing as PageTruncateLinePointerArray() during HOT cleanup
> That is going to have a clear benefit for HOT workloads, which by
> their nature will use a lot of line pointers.
> Many applications are updated much more frequently than they are vacuumed.
> Peter - what is your concern about doing this more frequently? Why
> would we *not* do this?

One clear reason why we _do_ want this is that the current shrinking only happens in the second phase of vacuum. Shrinking the LP array in heap_page_prune, at little extra cost, reduces the chance that a new tuple which would fit in the space freed by removed HOT chain items still doesn't fit, only because the page hasn't been vacuumed yet. Additionally, heap_page_prune is also executed when a page needs more free space for a new tuple that currently doesn't fit, and in such cases I think clearing as much space as possible is useful.

> 2. Reduce number of line pointers to 0 in some cases.
> Matthias - I don't think you've made a full case for doing this, nor
> looked at the implications.

I have looked at the implications (see upthread), and I haven't found any implications other than those mentioned below.

> The comment clearly says "it seems like a good idea to avoid leaving a
> PageIsEmpty() page behind".

Note that, to the best of my knowledge, that comment is based on unmeasured (though somewhat informed) guesswork ('it seems like a good idea'), which I also commented on in the thread discussing the patch that resulted in that commit [0]. If I recall correctly, the decision to keep at least 1 line pointer on the page was made because this feature was committed late in the pg14 development cycle, which left little time to check the impact of fully clearing pages. To go forward with the feature in pg14 at that point, it was safer not to empty pages completely, so that the code paths hit during e.g. vacuum would not change too significantly, reducing the chance of significant bugs that would require the patch to be reverted [1].

I agreed at that point that this was the safer bet, but it's now early in the pg15 development cycle, and I've had time to gain more experience with the vacuum and line pointer machinery. I therefore consider this a revisit of the question 'is it OK to truncate the LP array to 0?'. Previously the answer was 'we don't know, and it's late in the release cycle'; having now looked through the code base, I argue that the answer is yes.

One more point in favour of going to 0 is that on 32-bit systems, a single remaining line pointer is enough to keep a page from being 'empty' enough to fit a MaxHeapTupleSize-sized tuple (when requesting pages through the FSM). MaxHeapTupleSize is defined as BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)), so it reserves room for exactly one line pointer; with the 4-byte MAXALIGN typical of 32-bit builds there is no alignment padding to absorb a second one, and the leftover line pointer leaves the page 4 bytes short.

Additionally, there are some other optimizations we can only apply to empty pages:

- vacuum (with disable_page_skipping = on) will process these empty pages faster, as it won't need to do any pruning on them. With page skipping enabled this doesn't matter, because empty pages are all_visible and therefore vacuum won't access them.
- the pgstattuple contrib extension processes empty pages (slightly) faster in pgstattuple_approx.
- various loops won't need to check that the one remaining line pointer is unused, saving some cycles whenever the page is accessed.

Further optimizations might be possible in the future:

- Full-page WAL logging of empty pages produced in the checkpointer could potentially be optimized to log only 'it's an empty page' instead of writing out the full 8 kB page, which would help reduce WAL volume. Previously this optimization could never apply to heapam pages, because they could not become empty again, but now it has real potential.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAEze2Wh-nXjkp0bLN_vQwgHttC8CRH%3D1ewcrWk%2B7RX5B93YQPQ%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CAH2-WznCxtWL4B995y2KJWj-%2BjrjahH4n6gD2R74SyQJo6Y63w%40mail.gmail.com