Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust. - Mailing list pgsql-committers

From Tom Lane
Subject Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.
Date
Msg-id 1136.1505854018@sss.pgh.pa.us
In response to Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.  (Andres Freund <andres@anarazel.de>)
Responses Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.  (Andres Freund <andres@anarazel.de>)
Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.  (Andres Freund <andres@anarazel.de>)
List pgsql-committers
Andres Freund <andres@anarazel.de> writes:
> So this is genuinely interesting. When the machine is really loaded (as
> in 6 animals running on a vm at the same time, including valgrind), psql
> sometimes doesn't get the WARNING message from a shutdown. Instead it
> gets
> # psql:<stdin>:3: server closed the connection unexpectedly
> #       This probably means the server terminated abnormally
> #       before or while processing the request.
> # psql:<stdin>:3: connection to server was lost

That seems pretty weird.  Maybe it's not the same case, but in

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2017-09-19%2020%3A10%3A02

you can see from the postmaster log that the backend *is* issuing
the message, or at least it's getting to the server log:

2017-09-19 20:20:34.476 UTC [6363] [unknown] LOG:  connection received: host=[local]
2017-09-19 20:20:34.477 UTC [6363] [unknown] LOG:  connection authorized: user=andres database=postgres
2017-09-19 20:20:34.478 UTC [6363] t/013_crash_restart.pl LOG:  statement: SELECT $$psql-connected$$;
...
2017-09-19 20:20:34.485 UTC [6363] t/013_crash_restart.pl WARNING:  terminating connection because of crash of another server process
2017-09-19 20:20:34.485 UTC [6363] t/013_crash_restart.pl DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-09-19 20:20:34.485 UTC [6363] t/013_crash_restart.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.

Have we forgotten an fflush() or something?

Also, maybe the problem is on the client side.  I vaguely recall a libpq bug
wherein it would complain about socket EOF even though data remained
to be processed.  Maybe we reintroduced something like that?
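
To illustrate the class of client-side problem meant here, the following is
a minimal sketch in Python, not libpq's actual code.  It assumes a
hypothetical peer that sends one last message and then closes the socket;
the names read_until_eof and demo are made up for the example.  The point
is that a reader should treat EOF as final only after it has consumed
everything that arrived before the close.

```python
import socket


def read_until_eof(sock):
    """Collect everything the peer sent before it closed the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            # recv() returns b"" only once all pending data has been read
            # and the peer has closed, so nothing is dropped on the floor.
            break
        chunks.append(data)
    return b"".join(chunks)


def demo():
    # Hypothetical stand-in for backend and client: a connected socket pair.
    server, client = socket.socketpair()
    # The "backend" sends its final message and goes away immediately.
    server.sendall(
        b"WARNING:  terminating connection because of crash of "
        b"another server process\n"
    )
    server.close()
    # The "client" still sees the message, even though the peer is gone.
    received = read_until_eof(client)
    client.close()
    return received


if __name__ == "__main__":
    print(demo().decode(), end="")
```

A client that reported "server closed the connection unexpectedly" as soon
as it noticed the close, without first draining what was already buffered,
would lose the WARNING in the way described above.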

> We can obviously easily make the test accept both - but are we ok with
> the client sometimes not getting the message?

I'm not ...
        regards, tom lane


