ci/cfbot: run windows tests under a timeout - Mailing list pgsql-hackers

From Andres Freund
Subject ci/cfbot: run windows tests under a timeout
Date
Msg-id 20220202183107.pb3jl5qg33ik6iii@alap3.anarazel.de
Responses Re: ci/cfbot: run windows tests under a timeout  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On Windows, cfbot currently hangs / times out regularly. Presumably this is due
to the issues discussed in
https://postgr.es/m/CA%2BhUKG%2BG5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs%3Dq1jNjkKKA%40mail.gmail.com
which led to reverting [1] some networking-related changes everywhere but
master.

But it's hard to tell - because the entire test task times out, we don't get
to see debugging information.

In earlier versions of the CI script I ran the tests under a timeout command
that killed the entire test run. I found that to be helpful when working on
AIO. But I removed it before submitting, in an attempt to simplify things.
Turns out it was needed complexity.

The attached patch adds a timeout (using git's timeout binary) to all vcregress
invocations. I've not re-added it to the other OSes, but I'm on the fence about
doing so.
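
For illustration, the idea of running a test command under a timeout can be
sketched as below. This is a minimal Python sketch, not what the attached
patch does (that uses git's timeout binary). The name run_with_timeout and
the exit code 124 (borrowed from GNU timeout's convention) are assumptions
made for the example.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and kill it if it runs longer than timeout_s seconds.

    Returns the command's exit code, or 124 if it had to be killed
    (124 is an assumption here, mirroring GNU timeout's convention).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Note: this kills only the direct child. Grandchildren (e.g.
        # servers started by a test driver) would need extra handling.
        proc.kill()
        proc.wait()
        return 124


if __name__ == "__main__":
    # Usage: python run_with_timeout.py SECONDS COMMAND [ARGS...]
    sys.exit(run_with_timeout(sys.argv[2:], float(sys.argv[1])))
```

Because the wrapper returns instead of hanging, whatever runs after it (for
example a step that collects logs) still gets a chance to execute.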

The diff is a bit larger than one might think necessary: YAML doesn't like % -
from the Windows command variable syntax - at the start of an unquoted
string...


Separately, we should probably make Cluster.pm::psql() etc always use a
"fallback" timeout (rather than just when the test writer thought it
necessary). Or perhaps Utils.pm's INIT should set up a timer after which an
individual test is terminated?
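
The second idea, a timer armed at startup that terminates an individual test,
could look roughly like the following. This is a language-neutral sketch
written in Python, not Perl and not existing Utils.pm code. install_watchdog
and its on_expire parameter are hypothetical names used only for illustration.

```python
import os
import sys
import threading


def install_watchdog(timeout_s, on_expire=None):
    """Arm a background timer that ends the process after timeout_s seconds.

    on_expire is a hypothetical hook for this sketch. By default the
    watchdog prints a message and exits with a nonzero status.
    """
    def expire():
        sys.stderr.write(f"test exceeded {timeout_s}s, terminating\n")
        sys.stderr.flush()
        # os._exit skips cleanup handlers, so a stuck test cannot block exit.
        os._exit(1)

    timer = threading.Timer(timeout_s, on_expire or expire)
    timer.daemon = True  # never keep a finished test alive
    timer.start()
    return timer
```

A test that finishes in time can simply exit, or call cancel() on the
returned timer. The point of the design is that the timeout is armed for
every test, not only where the test writer remembered to add one.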


Greetings,

Andres Freund

[1]
commit 75674c7ec1b1607e7013b5cebcb22d9c8b4b2cb6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2022-01-25 12:17:40 -0500

    Revert "graceful shutdown" changes for Windows, in back branches only.

    This reverts commits 6051857fc and ed52c3707, but only in the back
    branches.  Further testing has shown that while those changes do fix
    some things, they also break others; in particular, it looks like
    walreceivers fail to detect walsender-initiated connection close
    reliably if the walsender shuts down this way.  We'll keep trying to
    improve matters in HEAD, but it now seems unwise to push these changes
    into stable releases.

    Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com
