Re: Hardening heap pruning code (was: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum) - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Hardening heap pruning code (was: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum)
Date
Msg-id CAEze2WgRvbr8c1RMfqjM1wyXn7K4_+WOvG8PHs6B95CdoOzqCg@mail.gmail.com
Whole thread Raw
In response to Hardening heap pruning code (was: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sun, 20 Mar 2022 at 04:48, Peter Geoghegan <pg@bowt.ie> wrote:
>
> Attached is v6, which goes a bit further than v5 in using local state
> that we build up-front to describe the state of the page being pruned
> (rather than rereading the page itself).

I didn't test the code; so these comments are my general feel of this
patch based on visual analysis only.

> > > We now do "3 passes" over the page. The first is the aforementioned
> > > "precalculate HTSV" pass, the second is for determining the extent of
> > > HOT chains, and the third is for any remaining disconnected/orphaned
> > > HOT chains. I suppose that that's okay, since the amount of work has
> > > hardly increased in proportion to this "extra pass over the page". Not
> > > 100% sure about everything myself right now, though. I guess that "3
> > > passes" might be considered excessive.

The passes don't all have a very clear explanation what they do; or
what they leave for the next one to process. It can be distilled from
the code comments at the respective phases and various functions, but
only after careful review of all code comments; there doesn't seem to
be a clear overview.


> @@ -295,30 +305,25 @@ heap_page_prune(Relation relation, Buffer buffer,
> [...]
> +     * prstate for later passes.  Scan the page backwards (in reverse item
> +     * offset number order).
> - [...]
> +     * This approach is good for performance.  Most commonly tuples within a
> +     * page are stored at decreasing offsets (while the items are stored at
> +     * increasing offsets).

A reference to why this is commonly the case (i.e.
PageRepairFragmentation, compactify_tuples, natural insertion order)
would probably help make the case; as this order being common is not
specifically obvious at first sight.

> @@ -350,28 +370,41 @@ heap_page_prune(Relation relation, Buffer buffer,
> +    /* Now scan the page a second time to process each root item */

This second pass also processes the HOT chain of each root item; but
that is not clear from the comment on the loop. I'd suggest a comment
more along these lines:

/*
 * Now scan the page a second time to process each valid HOT chain;
 * i.e. each non-HOT tuple or redirect line pointer and the HOT tuples in
 * the trailing chain, if any.
 * heap_prune_from_root marks all items in the HOT chain as visited;
 * so that phase 3 knows to skip those items.
 */

Apart from changes in comments for extra clarity; I think this
materially improves pruning, so thanks for working on this.

-Matthias



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Tab completion for SET TimeZone
Next
From: Peter Geoghegan
Date:
Subject: Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second