Thread: [HACKERS] Unportable use of select for timeouts in PostgresNode.pm

[HACKERS] Unportable use of select for timeouts in PostgresNode.pm

From
Andrew Dunstan
Date:
I've been trying to get to the bottom of a nasty hang in buildfarm
member jacana when running the pg_ctl TAP test. This test used to work,
and was last known to work on June 22nd.

My attention has become focussed on this change in commit de3de0afd:
    -       # Wait a second before retrying.
    -       sleep 1;
    +       # Wait 0.1 second before retrying.
    +       select undef, undef, undef, 0.1;

This is a usage that is known not to work in Windows - IIRC we
eliminated such calls from our C programs at the time of the Windows
port - and it seems to me very likely to be the cause of the hang.
Instead I think we should use the usleep() function from the standard
(from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
the sleep() function for situations where you need finer grained
timeouts. I have verified that this works on jacana and friends.

Unless I hear objections I'll prepare a patch along those lines.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
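
For illustration, the replacement proposed above might look roughly like the sketch below. It is not the patch that was eventually committed: poll_until() and its arguments are hypothetical names invented for this example, and only usleep() from Time::HiRes, which takes a delay in microseconds, is taken from the message itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Hypothetical helper (not from PostgresNode.pm): call $check until it
# returns true or $max_attempts is exhausted. Returns the number of the
# attempt that succeeded, or 0 if none did.
sub poll_until
{
    my ($check, $max_attempts) = @_;

    foreach my $attempt (1 .. $max_attempts)
    {
        return $attempt if $check->();

        # Wait 0.1 second before retrying. usleep() takes microseconds
        # and stands in for "select undef, undef, undef, 0.1".
        usleep(100_000);
    }
    return 0;
}

# Example: a check that only succeeds on its third call.
my $calls = 0;
my $attempts = poll_until(sub { return ++$calls >= 3 }, 10);
print "succeeded after $attempts attempts\n";
```

The retry interval stays at 0.1 second, as in commit de3de0afd; only the mechanism used to sleep changes.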




Re: [HACKERS] Unportable use of select for timeouts in PostgresNode.pm

From
Michael Paquier
Date:
On Mon, Jul 17, 2017 at 4:48 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> This is a usage that is known not to work in Windows - IIRC we
> eliminated such calls from our C programs at the time of the Windows
> port - and it seems to me very likely to be the cause of the hang.
> Instead I think we should use the usleep() function from the standard
> (from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
> the sleep() function for situations where you need finer grained
> timeouts. I have verified that this works on jacana and friends.

Looking at my boxes (Arch, Mac, Windows), Time::HiRes looks to be part
of the core set of packages, so there is visibly no real need to
incorporate a check in configure.in. So +1 for doing as you suggest.
-- 
Michael




Re: [HACKERS] Unportable use of select for timeouts in PostgresNode.pm

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> I've been trying to get to the bottom of a nasty hang in buildfarm
> member jacana when running the pg_ctl TAP test. This test used to work,
> and was last known to work on June 22nd.

> My attention has become focussed on this change in commit de3de0afd:

>     -       # Wait a second before retrying.
>     -       sleep 1;
>     +       # Wait 0.1 second before retrying.
>     +       select undef, undef, undef, 0.1;

> This is a usage that is known not to work in Windows - IIRC we
> eliminated such calls from our C programs at the time of the Windows
> port - and it seems to me very likely to be the cause of the hang.

Ugh.

> Instead I think we should use the usleep() function from the standard
> (from 5.8) Perl module Time::HiRes, as recommended in the Perl docs for
> the sleep() function for situations where you need finer grained
> timeouts. I have verified that this works on jacana and friends.

> Unless I hear objections I'll prepare a patch along those lines.

WFM.  Thanks for taking care of it.
        regards, tom lane