[BUGS] Backend crash on non-exclusive backup cancel - Mailing list pgsql-bugs

From David Steele
Subject [BUGS] Backend crash on non-exclusive backup cancel
Date
Msg-id c86627c3-fcdd-9b67-5a03-e2f1113d1b14@pgmasters.net
Responses Re: [BUGS] Backend crash on non-exclusive backup cancel  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-bugs
I found this issue while working on a pg_stop_backup() patch.  If a
non-exclusive pg_stop_backup() is cancelled and then attempted again, the
backend will crash on an assertion:

$ test/pg/bin/psql
psql (10devel)
Type "help" for help.

postgres=# select * from pg_start_backup('label', true, false);
 pg_start_backup
-----------------
 0/2000028
(1 row)

postgres=# select * from pg_stop_backup(false);
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# select * from pg_stop_backup(false);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

From the server log:

2017-02-28 01:21:34.755 UTC STATEMENT:  select * from pg_stop_backup(false);
TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)",
File: "/postgres/src/backend/access/transam/xlog.c", Line: 10723)

This error was produced in master at 30df93f.  Configure settings are
--enable-cassert --enable-tap-tests --with-openssl.

Disabling assertions "works", but there is still a problem.  A backend
that keeps cancelling pg_stop_backup() without ever resetting the
exclusive flag in xlogfuncs.c can decrement the shared variable
XLogCtl->Insert.nonExclusiveBackups as many times as it wants.  As far
as I can see the worst that will happen is that
XLogCtl->Insert.forcePageWrites won't get set back to false, but that's
still a bug.
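
To illustrate the counter problem, here is a minimal standalone model in
C.  It is a sketch, not the code in xlog.c: the struct and the model_*
functions are hypothetical names, only the two fields mirror the
variables named above, and it assumes a build without assertions where
the decrement happens before the cancel takes effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shared state; this is not PostgreSQL source. */
struct insert_state
{
    int  nonExclusiveBackups;   /* models XLogCtl->Insert.nonExclusiveBackups */
    bool forcePageWrites;       /* models XLogCtl->Insert.forcePageWrites */
};

/* Models the shared-memory effect of a non-exclusive pg_start_backup(). */
void
model_start_backup(struct insert_state *s)
{
    s->nonExclusiveBackups++;
    s->forcePageWrites = true;
}

/*
 * Models the shared-memory effect of a non-exclusive pg_stop_backup() when
 * assertions are disabled: the counter is decremented unconditionally.
 */
void
model_stop_backup(struct insert_state *s)
{
    s->nonExclusiveBackups--;
    if (s->nonExclusiveBackups == 0)
        s->forcePageWrites = false;
}

/* Replays the scenario and returns the final forcePageWrites value. */
bool
model_replay(struct insert_state *s)
{
    s->nonExclusiveBackups = 0;
    s->forcePageWrites = false;

    model_start_backup(s);      /* backend A starts: counter is 1 */
    model_stop_backup(s);       /* A's stop is cancelled after the decrement: 0 */
    model_stop_backup(s);       /* A retries and decrements again: -1 */

    model_start_backup(s);      /* backend B starts: counter is 0, flag is true */
    model_stop_backup(s);       /* B stops cleanly: counter is -1, never 0 */

    /* The counter is now negative, so the flag can no longer be cleared. */
    assert(s->nonExclusiveBackups == -1);
    return s->forcePageWrites;
}
```

Calling model_replay() on a zeroed struct leaves forcePageWrites stuck at
true even though no backup is running in the model.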

This condition should throw "backup is not in progress" just as an
exclusive backup would, whether assertions are enabled or not.

I believe the solution is to move the exclusive flag to xlog.c and only
decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true,
otherwise return an error.  Even then, it wouldn't be clear if the
backup had completed or not.  I suppose any cancelled non-exclusive
pg_stop_backup() should be considered aborted whether a stop backup
record was written or not?
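
One possible reading of that proposal, as a rough standalone sketch in C
rather than a patch: the per-session flag is checked and cleared in the
same place as the decrement, so a retry after a cancel gets an error
instead of decrementing again.  All names here (shared_state,
session_state, sketch_*, stop_result) are hypothetical and do not exist
in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes; the real code would raise an ERROR instead. */
enum stop_result
{
    STOP_OK,
    STOP_CANCELLED,
    STOP_NOT_IN_PROGRESS
};

/* Hypothetical stand-in for the shared counters in XLogCtl->Insert. */
struct shared_state
{
    int  nonExclusiveBackups;
    bool forcePageWrites;
};

/* Hypothetical per-session flag, kept next to the counter logic. */
struct session_state
{
    bool backup_running;
};

void
sketch_start_backup(struct shared_state *sh, struct session_state *se)
{
    sh->nonExclusiveBackups++;
    sh->forcePageWrites = true;
    se->backup_running = true;
}

/* 'cancelled' simulates a cancel request arriving after the decrement. */
enum stop_result
sketch_stop_backup(struct shared_state *sh, struct session_state *se,
                   bool cancelled)
{
    /* Stands in for: ERROR "backup is not in progress". */
    if (!se->backup_running)
        return STOP_NOT_IN_PROGRESS;

    /* Clear the flag together with the decrement so a retry cannot repeat it. */
    se->backup_running = false;
    assert(sh->nonExclusiveBackups > 0);
    sh->nonExclusiveBackups--;
    if (sh->nonExclusiveBackups == 0)
        sh->forcePageWrites = false;

    /* A cancelled stop is treated as an aborted backup in this sketch. */
    return cancelled ? STOP_CANCELLED : STOP_OK;
}
```

In this model a cancelled stop followed by a second stop returns
STOP_NOT_IN_PROGRESS and leaves the shared counter at zero.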

If that makes sense I'm happy to work up a patch.  This is definitely an
edge case and I seriously doubt it is causing any issues in the field.

-- 
-David
david@pgmasters.net


