Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti
Date
Msg-id 2138662.1623460441@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Jun 11, 2021 at 08:31:30PM -0400, Tom Lane wrote:
>> AFAICS, none of the other callers of sub reload are doing anything
>> special to wait for it to take effect.  Not sure why this one
>> would need more protection.  If we do need a fix for that, likely
>> sub reload itself had better do it.

> Perhaps.

I looked around a bit, and I think that we don't need to sweat
about this.  Certainly we've not identified any buildfarm failures
related to reload not happening fast enough.  I think the sequence
of events is:

* pg_ctl will deliver the SIGHUP signal to the postmaster before
returning.

* Although surely the postmaster might not process that signal
instantly, we can expect that it will do so before it accepts
another connection request.  Since the tests cannot detect
whether the reload has happened without issuing a new command
(which, in all these tests, involves launching psql or pgbench
or pg_basebackup or what-have-you, which needs a new connection),
the reload will appear to have happened so far as the postmaster
itself is concerned, and therefore also in any new backend it
spawns.

* This does seem to leave in question whether a pre-existing
background process such as a logrep worker will get the word fast
enough.  However, I think the same principle applies there.  The
postmaster will have turned around and signaled the worker process
while it processes the SIGHUP for itself.  While that again doesn't
ensure instantaneous update of the worker's state, we can expect
that it will notice the SIGHUP before noticing any newly-arrived
replication data, which is Good Enough.  This argument does assume
that we know the worker is idle when we issue the reload ... but
we already waited for it to be idle.

Perhaps the buildfarm will prove me wrong, but I think we don't
have to do anything.  And if we did want to do something, what
would it be?  There is no feedback mechanism to let us observe
when the worker processed the signal.

>> This coding is also stolen verbatim from other test scripts.
>> What is your concern about it exactly?

> My main concern are future changes in this test suite that could cause
> more queries to generate logs that overlap with this sequence of
> tests, falsifying what this test is checking.

Oh, I see, you wonder if there could already be entries like these
in the file.  But since we only turned the log level up for the
duration of the specific test, it doesn't seem like a problem.
(I did think of this to the extent of making sure the second test
group in 013_partition.pl looks for different messages than the
first group does.)

            regards, tom lane



pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti
Next
From: Michael Paquier
Date:
Subject: Re: BUG #17055: Logical replication worker crashes when applying update of row that dose not exist in target partiti