Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition - Mailing list pgsql-bugs

From Amit Langote
Subject Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition
Date
Msg-id CA+HiwqFt2oAc9b12ba32H1_TULpShmHz26V1W3TiAFzpeBf1jA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: BUG #17355: Server crashes on ExecReScanForeignScan in postgres_fdw when accessing foreign partition  (Etsuro Fujita <etsuro.fujita@gmail.com>)
List pgsql-bugs
On Tue, Feb 1, 2022 at 5:58 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi Alexander,
>
> On Tue, Feb 1, 2022 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> > 30.01.2022 13:59, Etsuro Fujita wrote:
> > >> (Besides that I've observed an infinite waiting for ShareLock with
> > >> step "s1i" { INSERT INTO pt VALUES (2000); }
> > >> This doesn't happen with a regular (not foreign) table.)
> > > You mean the lock wait occurs on the remote side, not on the local
> > > side?  If so, I think that that is expected behavior because a write
> > > conflict occurs on the remote side in that case.  Maybe I don’t fully
> > > understand your words, so could you elaborate a bit more on your
> > > observation?
> > Yes, you are right, that was expected behavior. I didn't realize that
> > the isolationtester itself resolves blocking when the target table is local.
> > The isolationtester controls the step execution using
> > pg_isolation_test_session_is_blocked(), but when the target table is
> > foreign, it can not determine correctly whether one step blocking other
> > (cause it checks local, not remote session pids) and just hangs.
>
> Ok, thanks for the explanation!
>
> Attached is an updated patch.  I tweaked a comment a little bit and
> added the commit message.  I didn’t include your test because in my
> understanding we don’t add such a test into the postgres_fdw
> regression test.  I’ll commit the patch if there are no objections
> from you (or anyone else).

Thank you for working on this and sorry about failing to reply to this sooner.

Your proposed fix makes sense.

I was trying to understand in what way commit 86dc90056's is related
to this, because you mention it here in the thread and also in the
commit message.  I see that c3928b467, also mentioned in your commit
message, fixed a related bug [1], which in turn refers to 1375422c as
the cause [2].  c3928b467 fixed things so that BeginDirectModify()
won't get called if EPQ is active, so fdw_state won't get set.  Given
that, it also fixed ForeignNext() to not enter the FDW to perform a
non-SELECT operation in the EPQ-active case, which made sure its
caller Exec[Foreign]Scan() won't enter the FDW in that case.  Though
it didn't also fix ExecReScan[ForeignScan](), so this bug report.
Your patch fixes both ExecForeignScan() and ExecReScanForeignScan() to
not enter the FDW during EvalPlanQual() in non-SELECT cases.

Before commit 86dc90056, EPQState.plan would be set to the 1st child
subplan, which if it happened to be a direct-modify ForeignScan node,
this bug would manifest.  After that commit though, a direct-modify
child subplan would *always* be present in EPQState.plan by way of its
parent Append node or some parent node thereof being set as
EPQState.plan, so the bug would now always manifest.  So my
understanding is that the commit 86dc90056 only changed *when* the bug
manifests, which is now "always" if direct-modify children are
present.  Is that understanding correct?

BTW, isn't the following code in ForeignNext() added by c3928b467
non-reachable after your patch:

        /*
         * direct modifications cannot be re-evaluated, so shouldn't get here
         * during EvalPlanQual processing
         */
        if (estate->es_epq_active != NULL)
            elog(ERROR, "cannot re-evaluate a Foreign Update or Delete
during EvalPlanQual");

Should that be converted to an Assert(estate->es_epq_active == NULL)?

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] EvalPlanQual() attempting to initialize a direct-modify
ForeignScan node without the necessary ResultRelInfo having been set
up.

[2] As of 1375422c, ResultRelInfos are initialized in
ExecInitModifyTable() which won't be invoked during
EvalPlanQualStart(), so EvalPlanQual() can no longer rely on accessing
a direct-modify ForeignScan node's ResultRelInfo.



pgsql-bugs by date:

Previous
From: Ben Chobot
Date:
Subject: Re: BUG #17389: pg_repack creates race conditions on streaming replicas
Next
From: Maxim Boguk
Date:
Subject: Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table