Mark Dilger <hornschnorter@gmail.com> writes:
> On 12/2/19 11:42 AM, Andrew Dunstan wrote:
>> On 12/2/19 11:23 AM, Tom Lane wrote:
>>> I'm a little baffled as to what this might be --- some sort of
>>> timing problem in our Windows signal emulation, perhaps? But
>>> if so, why haven't we found it years ago?
> I would be curious to see if there is a race condition in
> src/test/isolation/isolationtester.c between the loop starting
> on line 820:
>     while ((res = PQgetResult(conn)))
>     {
>         ...
>     }
> and the attempt to consume input that might include NOTIFY
> messages on line 861:
>     PQconsumeInput(conn);
In principle, the issue should not be there, because commits
790026972 et al should have ensured that the NOTIFY protocol
message comes out before ReadyForQuery (and thus, libpq will
absorb it before PQgetResult will return NULL). I think the
timing problem --- if that's what it is --- must be on the
backend side; somehow the backend is not processing the
inbound notify queue before it goes idle.
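
To make the ordering argument concrete, here is a toy model of it. It is only a sketch: the MsgKind values, ClientState, and drain_until_ready() are hypothetical names, not libpq symbols, and the model assumes nothing more than "the client absorbs every message that arrives before ReadyForQuery while its result loop is running".

```c
#include <stddef.h>

/* Hypothetical stand-ins for protocol messages; not libpq definitions. */
typedef enum
{
    MSG_COMMAND_COMPLETE,
    MSG_NOTIFY,
    MSG_READY_FOR_QUERY
} MsgKind;

typedef struct
{
    int results_returned;   /* results handed out by the result loop */
    int notifies_queued;    /* notifies absorbed before the loop ended */
} ClientState;

/*
 * Model of the client's result loop: read messages in arrival order and
 * stop at ReadyForQuery, which is the point where a PQgetResult-style
 * call would return NULL.  Anything after that point is not seen yet.
 */
static ClientState
drain_until_ready(const MsgKind *stream, size_t n)
{
    ClientState st = {0, 0};

    for (size_t i = 0; i < n; i++)
    {
        if (stream[i] == MSG_READY_FOR_QUERY)
            break;
        if (stream[i] == MSG_COMMAND_COMPLETE)
            st.results_returned++;
        else if (stream[i] == MSG_NOTIFY)
            st.notifies_queued++;
    }
    return st;
}
```

With the stream {COMMAND_COMPLETE, NOTIFY, READY_FOR_QUERY} the model has one notify queued when the loop ends; with the NOTIFY placed after READY_FOR_QUERY it has none. The second ordering is the one that should be impossible on the client side, which is why the backend looks like the suspect.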
Hmm ... just looking at the code again, could it be that there's
no well-placed CHECK_FOR_INTERRUPTS? Andrew, could you see if
injecting one in what 790026972 added to postgres.c helps?
That is,
        /*
         * Also process incoming notifies, if any.  This is mostly to
         * ensure stable behavior in tests: if any notifies were
         * received during the just-finished transaction, they'll be
         * seen by the client before ReadyForQuery is.
         */
+       CHECK_FOR_INTERRUPTS();
        if (notifyInterruptPending)
            ProcessNotifyInterrupt();
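
A toy model of why the extra call might matter. The assumption here, which is a guess and not something established in this thread, is that under signal emulation an incoming signal is only queued, and the handler that sets the pending flag does not run until the process reaches an explicit check point. Every name below is a hypothetical stand-in, not a PostgreSQL symbol.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are PostgreSQL symbols. */
static int  queued_signals = 0;     /* delivered but not yet dispatched */
static bool notify_pending = false; /* plays the role of the pending flag */
static int  notifies_processed = 0;

/* A signal arrives: under emulation it is only queued. */
static void
deliver_signal(void)
{
    queued_signals++;
}

/* Check point: dispatch queued signals; the "handler" just sets the flag. */
static void
check_for_interrupts(void)
{
    while (queued_signals > 0)
    {
        queued_signals--;
        notify_pending = true;
    }
}

/* Mirrors the shape of the flag test in the snippet above. */
static void
process_notifies_if_pending(void)
{
    if (notify_pending)
    {
        notify_pending = false;
        notifies_processed++;
    }
}

/* One end-of-transaction pass, with or without the extra check point. */
static int
end_of_transaction(bool with_check)
{
    queued_signals = 0;
    notify_pending = false;
    notifies_processed = 0;

    deliver_signal();           /* notify signal arrived during the xact */
    if (with_check)
        check_for_interrupts();
    process_notifies_if_pending();
    return notifies_processed;
}
```

In this model end_of_transaction(false) processes nothing, because the flag is still unset when it is tested, while end_of_transaction(true) processes the notify before going idle. If the real behavior is anything like this, it would also explain why the problem shows up only where signals are emulated.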
regards, tom lane