From Andres Freund
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date 2022-03-10 00:46:37
Msg-id 20220310004637.ksmawx5rbekdqav3@alap3.anarazel.de
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
Hi,

On 2022-03-03 19:31:32 -0800, Peter Geoghegan wrote:
> Attached is a new revision of my fix. This is more or less a
> combination of my v4 fix from November 12 [1] and Andres'
> already-committed fix (commit 18b87b20), rebased on top of HEAD. This
> patch was originally a bugfix, but now it's technically just
> refactoring/hardening of the logic in pruneheap.c. It hasn't changed
> all that much, though.

Perhaps worth discussing outside of -bugs?


> We're still doing an up-front scan of the heap page to precalculate
> HTSV for each tuple (no change from commit 18b87b20), but we no longer
> do that on correctness grounds. It is purely a performance thing now.
> Note that I am making a working assumption that that still makes sense
> for now (I think that it probably does, but I haven't yet verified it
> for myself).

I'd be surprised if not.
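
For reference, that up-front pass is roughly shaped like this (a
from-memory sketch of the committed logic, not a verbatim excerpt, and
field names may be off):

    BlockNumber blockno = BufferGetBlockNumber(buffer);
    OffsetNumber offnum,
                maxoff = PageGetMaxOffsetNumber(page);
    HeapTupleData tup;

    tup.t_tableOid = RelationGetRelid(relation);
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        /* cache "not a normal tuple" for unused/dead/redirect slots */
        if (!ItemIdIsNormal(itemid))
        {
            prstate.htsv[offnum] = -1;
            continue;
        }

        tup.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
        tup.t_len = ItemIdGetLength(itemid);
        ItemPointerSet(&(tup.t_self), blockno, offnum);

        /* one visibility check per tuple, reused by all later passes */
        prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
                                                           buffer);
    }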


> We now do "3 passes" over the page. The first is the aforementioned
> "precalculate HTSV" pass, the second is for determining the extent of
> HOT chains, and the third is for any remaining disconnected/orphaned
> HOT chains. I suppose that that's okay, since the amount of work has
> hardly increased in proportion to this "extra pass over the page". Not
> 100% sure about everything myself right now, though. I guess that "3
> passes" might be considered excessive.

We should be able to optimize away the third pass in the majority of cases, by
keeping track of the number of tuples visited or such. That seems like it
might be worth doing?
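
E.g. something like this (hypothetical nused/nvisited names, untested):

    /*
     * Sketch: count used line pointers up front, have heap_prune_chain()
     * bump prstate.nvisited for every item it marks visited, and skip the
     * disconnected-tuple pass when every used item was reached.
     */
    int         nused = 0;

    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        if (ItemIdIsUsed(PageGetItemId(page, offnum)))
            nused++;
    }

    /* second pass over root items runs here */

    if (prstate.nvisited < nused)
    {
        /* third pass: handle disconnected/orphaned heap-only tuples */
    }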


> +    /*
> +     * Start from the root item.  Mark it as valid up front, since root items
> +     * are always processed here (not as disconnected tuples in third pass
> +     * over page).
> +     */
> +    prstate->visited[rootoffnum] = true;
>      offnum = rootoffnum;
> +    nchain = 0;

I wonder if it'd be clearer if we dealt with redirects outside of the
loop. Would make it easier to assert that the target of a redirect may not be
unused / !heap-only?
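
Roughly like this, as a sketch rather than a concrete proposal:

    /*
     * Sketch only: resolve a redirect before entering the chain loop, so
     * the loop body only ever sees normal items.
     */
    offnum = rootoffnum;
    if (ItemIdIsRedirected(rootlp))
    {
        ItemId      lp;

        offnum = ItemIdGetRedirect(rootlp);
        lp = PageGetItemId(page, offnum);

        /* a redirect's target must be a used, heap-only tuple */
        Assert(ItemIdIsNormal(lp));
        Assert(HeapTupleHeaderIsHeapOnly((HeapTupleHeader) PageGetItem(page, lp)));

        prstate->visited[offnum] = true;
    }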


> +
> +                /*
> +                 * Remember the offnum of the last DEAD tuple in this HOT
> +                 * chain.  To keep things simple, don't treat heap-only tuples
> +                 * from a HOT chain as DEAD unless they're only preceded by
> +                 * other DEAD tuples (in addition to actually being DEAD).

s/actually/themselves/?


> +                 * Remaining tuples that appear DEAD (but don't get treated as
> +                 * such by us) are from concurrently aborting updaters.

I don't understand this bit. A DEAD tuple following a RECENTLY_DEAD one won't
be removed now, and doesn't need to involve a concurrent abort? Are you
thinking of "remaining" as the tuples not referenced in the previous sentences?
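
For clarity, the rule as I'm reading it (sketch, hypothetical variable
names):

    switch ((HTSV_Result) prstate->htsv[offnum])
    {
        case HEAPTUPLE_DEAD:
            /* advances only while the whole chain prefix is DEAD */
            if (!pastlatestdead)
                latestdead = offnum;
            break;

        case HEAPTUPLE_RECENTLY_DEAD:
        case HEAPTUPLE_LIVE:
        case HEAPTUPLE_INSERT_IN_PROGRESS:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* any non-DEAD tuple freezes latestdead for the rest */
            pastlatestdead = true;
            break;
    }

If that's the intent, a DEAD tuple after a RECENTLY_DEAD one is left
alone regardless of how it came to be DEAD, which is why the
"concurrently aborting updaters" explanation reads oddly to me.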


> +                 * VACUUM will ask us to prune the heap page again when it
> +                 * sees that there is a DEAD tuple left behind, but that would
> +                 * be necessary regardless of our approach here.
> +                 */

Only as long as we do another set of HTSV calls...


>              case HEAPTUPLE_LIVE:
>              case HEAPTUPLE_INSERT_IN_PROGRESS:
> +                pastlatestdead = true;    /* no further DEAD tuples in CHAIN */

If we don't do anything to the following tuples, why don't we just abort here?
I assume it is because we'd then treat them as disconnected? That should
probably be called out...
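
E.g. something along these lines (suggested comment wording only,
assuming my reading is right):

    case HEAPTUPLE_LIVE:
    case HEAPTUPLE_INSERT_IN_PROGRESS:

        /*
         * No more tuples from this chain can be pruned, but keep
         * following it anyway: the remaining members still need
         * visited[] set, or the third pass would treat them as
         * disconnected.
         */
        pastlatestdead = true;
        break;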



> -    }
> -    else if (nchain < 2 && ItemIdIsRedirected(rootlp))
> -    {
> -        /*
> -         * We found a redirect item that doesn't point to a valid follow-on
> -         * item.  This can happen if the loop in heap_page_prune caused us to
> -         * visit the dead successor of a redirect item before visiting the
> -         * redirect item.  We can clean up by setting the redirect item to
> -         * DEAD state.
> -         */
> -        heap_prune_record_dead(prstate, rootoffnum);
> +
> +        return ndeleted;
>      }

Could there be such tuples from before a pg_upgrade? Do we need to deal with
them somehow?


Greetings,

Andres Freund


