Re: Why is src/test/modules/committs/t/002_standby.pl flaky? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date
Msg-id 2282783.1644684440@sss.pgh.pa.us
In response to Re: Why is src/test/modules/committs/t/002_standby.pl flaky?  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: Why is src/test/modules/committs/t/002_standby.pl flaky?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Alexander Lakhin <exclusion@gmail.com> writes:
> 11.02.2022 05:22, Andres Freund wrote:
>> Over in another thread I made some wild unsubstantiated guesses that the
>> windows issues could have been made much more likely by a somewhat odd bit of
>> code in PQisBusy():
>> https://postgr.es/m/1959196.1644544971%40sss.pgh.pa.us
>> Alexander, any chance you'd try if that changes the likelihood of the problem
>> occurring, without any other fixes / reverts applied?

> Unfortunately I haven't seen an improvement for the test in question.

Yeah, that's what I expected, sadly.  While I think this PQisBusy behavior
is definitely a bug, it will not lead to an infinite loop, just to write
failures being reported in a less convenient fashion than intended.

I wonder whether it would help to put a PQconsumeInput call *before*
the PQisBusy loop, so that any pre-existing EOF condition will be
detected.  If you don't like duplicating code, we could restructure
the loop as

    for (;;)
    {
        int         rc;

        /* Consume whatever data is available from the socket */
        if (PQconsumeInput(streamConn) == 0)
        {
            /* trouble; return NULL */
            return NULL;
        }

        /* Done? */
        if (!PQisBusy(streamConn))
            break;

        /* Wait for more data */
        rc = WaitLatchOrSocket(MyLatch,
                               WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
                               WL_LATCH_SET,
                               PQsocket(streamConn),
                               0,
                               WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);

        /* Interrupted? */
        if (rc & WL_LATCH_SET)
        {
            ResetLatch(MyLatch);
            ProcessWalRcvInterrupts();
        }
    }

    /* Now we can collect and return the next PGresult */
    return PQgetResult(streamConn);


In combination with the PQisBusy fix, this might actually help ...
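
To make the ordering argument concrete, here is a minimal, self-contained sketch that does not use libpq at all. Everything in it (MockConn, mock_consume_input, mock_is_busy, mock_wait_readable, the RECV_* codes) is hypothetical and exists only for illustration. It also bakes in one assumption: an EOF that was already present before the wait started does not wake the wait. Under that assumption, a loop that waits before consuming gets stuck, while the consume-first loop reports the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch only. */
enum { RECV_STUCK = -1, RECV_ERROR = 0, RECV_DONE = 1 };

/* Hypothetical stand-in for a connection; not a libpq type. */
typedef struct MockConn
{
    bool eof_pending;    /* peer closed the socket before we looked */
    int  bytes_needed;   /* bytes still missing from the current result */
    int  bytes_pending;  /* unread bytes already sitting on the socket */
} MockConn;

/* Loosely mimics PQconsumeInput(): 0 means trouble, 1 means OK. */
static int
mock_consume_input(MockConn *c)
{
    assert(c != NULL);
    c->bytes_needed -= c->bytes_pending;
    if (c->bytes_needed < 0)
        c->bytes_needed = 0;
    c->bytes_pending = 0;
    /* EOF with an incomplete result is reported as an error. */
    return (c->eof_pending && c->bytes_needed > 0) ? 0 : 1;
}

/* Loosely mimics PQisBusy(): true while the result is incomplete. */
static bool
mock_is_busy(const MockConn *c)
{
    return c->bytes_needed > 0;
}

/*
 * Models the socket wait.  ASSUMPTION: only unread data wakes us; a
 * pre-existing EOF does not.  Returning false means "would block forever".
 */
static bool
mock_wait_readable(const MockConn *c)
{
    return c->bytes_pending > 0;
}

/* Wait-then-consume ordering. */
static int
recv_wait_first(MockConn *c)
{
    while (mock_is_busy(c))
    {
        if (!mock_wait_readable(c))
            return RECV_STUCK;
        if (mock_consume_input(c) == 0)
            return RECV_ERROR;
    }
    return RECV_DONE;
}

/* Consume-then-check-then-wait ordering, as in the loop proposed above. */
static int
recv_consume_first(MockConn *c)
{
    for (;;)
    {
        if (mock_consume_input(c) == 0)
            return RECV_ERROR;
        if (!mock_is_busy(c))
            break;
        if (!mock_wait_readable(c))
            return RECV_STUCK;
    }
    return RECV_DONE;
}
```

With eof_pending set and nothing left to read, recv_wait_first() returns RECV_STUCK and recv_consume_first() returns RECV_ERROR; when all the needed data is already pending, both return RECV_DONE. This is only a toy model of the control flow, not a claim about what the real socket wait does on any particular platform.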

            regards, tom lane
