Home > mailing lists

Re: BUG #18559: Crash after detaching a partition concurrently from another session - Mailing list pgsql-bugs

From	Junwang Zhao
Subject	Re: BUG #18559: Crash after detaching a partition concurrently from another session
Date	August 10 16:54:58
Msg-id	CAEG8a3+qEsMW_R502GB9szReSGnhUvoNuvLci1wvw07+Lbegiw@mail.gmail.com Whole thread Raw
In response to	Re: BUG #18559: Crash after detaching a partition concurrently from another session (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses	Re: BUG #18559: Crash after detaching a partition concurrently from another session
List	pgsql-bugs

Tree view

On Tue, Jul 30, 2024 at 9:52 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 7:18 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> Adding Alvaro.
> >
> > The following code assumes that an pg_class entry for the detached partition
> > will always be available which is wrong.
> >
> Referring to the following code.
>         /*
>          * Two problems are possible here.  First, a concurrent ATTACH
>          * PARTITION might be in the process of adding a new partition, but
>          * the syscache doesn't have it, or its copy of it does not yet have
>          * its relpartbound set.  We cannot just AcceptInvalidationMessages(),
>          * because the other process might have already removed itself from
>          * the ProcArray but not yet added its invalidation messages to the
>          * shared queue.  We solve this problem by reading pg_class directly
>          * for the desired tuple.
>          *
>          * The other problem is that DETACH CONCURRENTLY is in the process of
>          * removing a partition, which happens in two steps: first it marks it
>          * as "detach pending", commits, then unsets relpartbound.  If
>          * find_inheritance_children_extended included that partition but we
>          * below we see that DETACH CONCURRENTLY has reset relpartbound for
>          * it, we'd see an inconsistent view.  (The inconsistency is seen
>          * because table_open below reads invalidation messages.)  We protect
>          * against this by retrying find_inheritance_children_extended().
>          */
>         if (boundspec == NULL)
>         {
>             Relation    pg_class;
>             SysScanDesc scan;
>             ScanKeyData key[1];
>             Datum       datum;
>             bool        isnull;
>
>             pg_class = table_open(RelationRelationId, AccessShareLock);
>             ScanKeyInit(&key[0],
>                         Anum_pg_class_oid,
>                         BTEqualStrategyNumber, F_OIDEQ,
>                         ObjectIdGetDatum(inhrelid));
>             scan = systable_beginscan(pg_class, ClassOidIndexId, true,
>                                       NULL, 1, key);
>             tuple = systable_getnext(scan);
>             datum = heap_getattr(tuple, Anum_pg_class_relpartbound,
>                                  RelationGetDescr(pg_class), &isnull);
>             if (!isnull)
>                 boundspec = stringToNode(TextDatumGetCString(datum));
>             systable_endscan(scan);
>             table_close(pg_class, AccessShareLock);
>
> IIUC, a simple fix would be to retry if an entry is not found. Attached a patch.

I can reproduce the issue, and the patch LGTM.

>
> --
> Thanks & Regards,
> Kuntal Ghosh



--
Regards
Junwang Zhao

pgsql-bugs by date:

From: Tomas Vondra
Date: 09 August, 22:55:00
Subject: Re: FDW INSERT batching can change behavior

From: "Sahu, Abhisek Kumar"
Date: 12 August, 08:37:50
Subject: RE: BUG #18569: Memory leak in Postgres Enterprise server

Re: BUG #18559: Crash after detaching a partition concurrently from another session - Mailing list pgsql-bugs

Previous

Next