On 10/15/21 10:46 AM, Andrew Dunstan wrote:
> On 10/14/21 5:52 PM, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Yes, that's been puzzling me too. I've just been staring at it again and
>>> nothing jumps out. But maybe we can investigate that offline if this
>>> test is deemed not worth keeping.
>> As Mark says, it'd be interesting to know whether the use of
>> background_psql is related, because if it is, we'd want to debug that.
>> (I don't really see how it could be related, but maybe I just lack
>> sufficient imagination today.)
>
>
> Yeah. I'm working on getting a cut-down reproducible failure case.
>
I spent a good deal of time poking at this on Friday and Saturday.
It's quite clear that the use of
    my $h = $node->background_psql(...);
    $h->pump_nb;
is the root of the problem.
If that code is commented out, or even just moved to right after the
standby is started and before we check that replication has caught up
(which should meet the needs of the case where we found this), then the
problem goes away.
IPC::Run deals with this setup in a different way on Windows, mainly
because select() on Windows only works on sockets and not on other types
of file handles.
It does appear that TestLib::get_free_port() is not sufficiently robust,
as it should guarantee that the port/address can be bound.
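To illustrate the kind of guarantee meant here, the following is a minimal sketch of a bind check. port_is_bindable is a hypothetical helper, not an existing TestLib function; it uses only Perl's core Socket module and assumes an IPv4 address.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper (not part of TestLib): return 1 only if a TCP
# listener can be bound to $addr:$port right now, 0 otherwise.
sub port_is_bindable
{
	my ($addr, $port) = @_;

	my $iaddr = inet_aton($addr) or return 0;
	my $paddr = sockaddr_in($port, $iaddr);
	my $proto = getprotobyname('tcp');

	socket(my $sock, PF_INET, SOCK_STREAM, $proto) or return 0;

	# SO_REUSEADDR is deliberately not set, so an address that is
	# already in use makes bind() fail instead of being shared.
	my $ok = bind($sock, $paddr) && listen($sock, SOMAXCONN);
	close($sock);

	return $ok ? 1 : 0;
}
```

Even with a check like this, the port is only known to be free at the moment of the check; anything that grabs a port between the check and the server start can still collide, which is why the ordering of setup steps matters.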
I haven't got further than that, and I have other things I need to be
doing, but for now I think we just need to be careful wherever possible
to set up servers before calling start/pump.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com