Hi,

On 2019-01-26 20:53:48 -0500, Tom Lane wrote:
> Recently, buildfarm member curculio has started to show a semi-repeatable
> failure in src/test/recovery/t/013_crash_restart.pl:
>
> # aborting wait: program died
> # stream contents: >>psql:<stdin>:8: no connection to the server
> # psql:<stdin>:8: connection to server was lost
> # <<
> # pattern searched for: (?^m:server closed the connection unexpectedly)
>
> # Failed test 'psql query died successfully after SIGKILL'
> # at t/013_crash_restart.pl line 198.
>
> The message this test is looking for is what libpq reports upon getting
> EOF or ECONNRESET from a socket read attempt. The message it's actually
> seeing is what libpq reports if it notices that the PQconn is *already*
> in CONNECTION_BAD state when it's trying to send a new query.
>
> I have no idea why we're seeing this in only one buildfarm member
> and only for the past week or so, as it doesn't appear that any
> related code has changed for months. (Perhaps something changed
> about curculio's host?)

I have no idea why it's just curculio, but I think I know why it only
started recently: curculio doesn't appear to have had TAP tests enabled
before
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2019-01-17%2021%3A30%3A02

> just change the test script to accept either message as a successful
> result. I think that 4247db625 made such races more likely, but I
> don't believe it was impossible before.

Sounds right to me - do you want to do the honors or shall I?

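To illustrate what "accept either message" could look like, here is a rough sketch of the combined pattern. It is in Python rather than the test's Perl, the helper name is made up, and the choice of "no connection to the server" as the second alternative is an assumption based on the stream contents quoted above, not the actual patch:

```python
import re

# Hypothetical sketch: accept either libpq message as evidence that the
# psql session died. The two alternatives are taken from the failure
# report quoted above; the real test is written in Perl.
EITHER_MESSAGE = re.compile(
    r"server closed the connection unexpectedly"
    r"|no connection to the server",
    re.MULTILINE,
)


def psql_died_as_expected(stream: str) -> bool:
    """Return True if psql's output contains either disconnect message."""
    return EITHER_MESSAGE.search(stream) is not None
```

In the Perl test this would presumably just mean adding an alternation to the existing regex.
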
> Another idea is to change libpq so that both these cases emit identical
> messages, but I don't really feel that that'd be an improvement. Also,
> since 4247db625 was back-patched, we'd have to back-patch the message
> change as well, which I like even less. People might be relying on
> seeing either message spelling in some situations.

Yea, I don't think that's the way to go.

Greetings,
Andres Freund