Re: Race condition in crash-recovery tests - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Race condition in crash-recovery tests
Msg-id 20190127022937.nvocrvsok7nlp4vt@alap3.anarazel.de
In response to Race condition in crash-recovery tests  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race condition in crash-recovery tests  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-01-26 20:53:48 -0500, Tom Lane wrote:
> Recently, buildfarm member curculio has started to show a semi-repeatable
> failure in src/test/recovery/t/013_crash_restart.pl:
> 
> # aborting wait: program died
> # stream contents: >>psql:<stdin>:8: no connection to the server
> # psql:<stdin>:8: connection to server was lost
> # <<
> # pattern searched for: (?^m:server closed the connection unexpectedly)
> 
> #   Failed test 'psql query died successfully after SIGKILL'
> #   at t/013_crash_restart.pl line 198.
> 
> The message this test is looking for is what libpq reports upon getting
> EOF or ECONNRESET from a socket read attempt.  The message it's actually
> seeing is what libpq reports if it notices that the PQconn is *already*
> in CONNECTION_BAD state when it's trying to send a new query.
> 
> I have no idea why we're seeing this in only one buildfarm member
> and only for the past week or so, as it doesn't appear that any
> related code has changed for months.  (Perhaps something changed
> about curculio's host?)

I have no idea why it's just curculio, but I think I know why it only
started recently: curculio doesn't appear to have had TAP tests enabled
before
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2019-01-17%2021%3A30%3A02


> just change the test script to accept either message as a successful
> result.  I think that 4247db625 made such races more likely, but I
> don't believe it was impossible before.

Sounds right to me - do you want to do the honors or shall I?
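
To make the "accept either message" idea concrete, here is a minimal
sketch of such a check. It is written in Python purely for illustration;
the real test is the Perl script 013_crash_restart.pl, and this is not
the change that was committed. The helper name query_died and the choice
of "no connection to the server" as the second accepted spelling are
assumptions, taken from the stream contents quoted above.

```python
import re

# Accept either spelling as evidence that the query died:
#  - the message the test originally searched for, and
#  - the message seen in the failing run's stream contents.
# Which alternative strings to accept is an assumption of this sketch.
EITHER_MESSAGE = re.compile(
    r"server closed the connection unexpectedly"
    r"|no connection to the server",
    re.MULTILINE,
)


def query_died(stream_contents: str) -> bool:
    """Return True if the psql output shows either disconnect message."""
    return EITHER_MESSAGE.search(stream_contents) is not None
```

With a check of this shape, the outcome no longer depends on whether
libpq notices the dead connection while reading or while sending.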


> Another idea is to change libpq so that both these cases emit identical
> messages, but I don't really feel that that'd be an improvement.  Also,
> since 4247db625 was back-patched, we'd have to back-patch the message
> change as well, which I like even less.  People might be relying on
> seeing either message spelling in some situations.

Yea, I don't think that's the way to go.

Greetings,

Andres Freund

