Re: [BUGS] Backend crash on non-exclusive backup cancel - Mailing list pgsql-bugs
From | David Steele |
---|---|
Subject | Re: [BUGS] Backend crash on non-exclusive backup cancel |
Date | |
Msg-id | 6a8f9d7e-ce4e-d850-12d3-7787815447ea@pgmasters.net Whole thread Raw |
In response to | Re: [BUGS] Backend crash on non-exclusive backup cancel (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-bugs |
On 2/28/17 9:08 PM, Michael Paquier wrote: > On Tue, Feb 28, 2017 at 10:21 PM, David Steele <david@pgmasters.net> wrote: >> On 2/27/17 10:05 PM, Michael Paquier wrote: >>> That's not necessarily true, I can see a stop backup able to finish as >>> well by issuing a cancel request. It seems to me that we just need to >>> have the shmem information updated at the same time as the >>> session-level switches for consistency and we're good. The >>> inconsistency in places when updating the session-level flags and the >>> shmem-level flags is what is causing harm. >> >> I'm sure this could be done but it will require quite a bit of >> refactoring and I'm not sure that it's worth it. In my mind it would be >> enough to document that cancelled backups should be considered invalid >> whether the stop backup record was written or not. However, I'm willing >> to go with the majority opinion. > > I would think that addressing the problem is the way to go, hitting an > assertion in this code path means that we are doing something wrong so > simply removing it does not sound like a correct answer to me. > >>>> If that makes sense I'm happy to work up a patch. This is definitely an >>>> edge case and I seriously doubt it is causing any issues in the field. >>> >>> <...> >>> >>> David, are you willing to write a patch or should I? Changing >>> nonexclusive_backup_running/exclusive_backup_running to be an enum >>> would make the code more readable as well IMO. >> >> I would like to see if anyone else weighs in on this first, but yes I am >> planning to write a patch. Agreed on the enum. > > This was itching me yesterday so I wrote a patch able to address this > problem... I know that feeling. > At least it can be registered to the CF on time and give it > more visibility. Indeed. > By the way, I have noticed a second bug in the > current logic while looking at the problem you have reported: > 1) Start backup in session 1: > =# select pg_start_backup('popo'); > pg_start_backup > ----------------- > 0/2000028 > (1 row) > 2) pg_stop_backup() in session 2 > 3) And now when trying to work on backups with session 1: > =# select pg_start_backup('popo'); > ERROR: 55000: a backup is already in progress in this session > LOCATION: pg_start_backup, xlogfuncs.c:87 > =# select pg_stop_backup(); > ERROR: 55000: exclusive backup not in progress > LOCATION: do_pg_stop_backup, xlog.c:10642 Yeah, that doesn't look good. > So the handling around exclusive_backup_running is broken now as it is > possible to lock a session easily when handling backups. And the root > of the problem is that checks on exclusive_backup_running are not > necessary. I have fixed as well this problem as per the attached. Note > however that SESSION_BACKUP_EXCLUSIVE still makes sense in the patch > attached when doing checks with pg_stop_backup_v2() for a > non-exclusive backup run. It is also useful for debugging. Excellent! I'll have a look, and thanks for working up a patch. -- -David david@pgmasters.net -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: