Thread: Re: [HACKERS] Backend crash on non-exclusive backup cancel

Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Anastasia Lubennikova
Date:
As far as I understand, in this thread were discussed two bugs of pg_stop_backup().
Thanks to the clear descriptions above, I easily reproduced both of them.

BUG#1:
Server crashes on assertion on call of pg_stop_backup(false) after interrupted call of pg_stop_backup(false).
TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)", File: "xlog.c", Line: 10747)

BUG#2:
Failure to start an exclusive backup with the same name, if previous exclusive backup was stopped in another session.

Both problems seem to be fixed with patch "backup-session-locks-fixes.patch".
Speaking of the patch itself, I have a question: shouldn't we also update sessionBackupState
in pg_stop_backup_callback() along with XLogCtl->Insert.exclusiveBackupState?

And couple of minor notes:
1) + * Routines to starting stop, and get status of a base backup
Probably should be: + * Routines to start, stop and get status of a base backup
And also this comment should be moved below the enum.

2) This is used in parallel of the shared memory status 
s/ in parallel of/ in parallel with


The new status of this patch is: Waiting on Author

Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Michael Paquier
Date:
On Wed, Mar 15, 2017 at 12:27 AM, Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
> As far as I understand, in this thread were discussed two bugs of pg_stop_backup().
> Thanks to the clear descriptions above, I easily reproduced both of them.
>
> BUG#1:
> Server crashes on assertion on call of pg_stop_backup(false) after interrupted call of pg_stop_backup(false).
> TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)", File: "xlog.c", Line: 10747)
>
> BUG#2:
> Failure to start an exclusive backup with the same name, if previous exclusive backup was stopped in another
session.
>
> Both problems seem to be fixed with patch "backup-session-locks-fixes.patch".
> Speaking of the patch itself, I have a question: shouldn't we also update sessionBackupState
> in pg_stop_backup_callback() along with XLogCtl->Insert.exclusiveBackupState?

No, that's not necessary. sessionBackupState is not changed until the
code path where pg_stop_backup_callback() is active, and remains
unchanged until it gets deactivated.

> And couple of minor notes:
> 1) + * Routines to starting stop, and get status of a base backup
> Probably should be: + * Routines to start, stop and get status of a base backup
> And also this comment should be moved below the enum.
>
> 2) This is used in parallel of the shared memory status
> s/ in parallel of/ in parallel with

Agreed on both points.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Anastasia Lubennikova
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

As I see, this bugfix works as expected and the patch is small and clear,
so I marked it "Ready for committer".
Anyway, it would be great if David could also have a look at the patch.

And again, thank you for fixing this issue!

The new status of this patch is: Ready for Committer

Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
David Steele
Date:
On 3/15/17 2:28 AM, Michael Paquier wrote:
> On Wed, Mar 15, 2017 at 12:27 AM, Anastasia Lubennikova
> <lubennikovaav@gmail.com> wrote:
>> As far as I understand, in this thread were discussed two bugs of pg_stop_backup().
>> Thanks to the clear descriptions above, I easily reproduced both of them.
>>
>> BUG#1:
>> Server crashes on assertion on call of pg_stop_backup(false) after interrupted call of pg_stop_backup(false).
>> TRAP: FailedAssertion("!(XLogCtl->Insert.nonExclusiveBackups > 0)", File: "xlog.c", Line: 10747)
>>
>> BUG#2:
>> Failure to start an exclusive backup with the same name, if previous exclusive backup was stopped in another
session.
>>
>> Both problems seem to be fixed with patch "backup-session-locks-fixes.patch".
>> Speaking of the patch itself, I have a question: shouldn't we also update sessionBackupState
>> in pg_stop_backup_callback() along with XLogCtl->Insert.exclusiveBackupState?
> 
> No, that's not necessary. sessionBackupState is not changed until the
> code path where pg_stop_backup_callback() is active, and remains
> unchanged until it gets deactivated.
> 
>> And couple of minor notes:
>> 1) + * Routines to starting stop, and get status of a base backup
>> Probably should be: + * Routines to start, stop and get status of a base backup
>> And also this comment should be moved below the enum.
>>
>> 2) This is used in parallel of the shared memory status
>> s/ in parallel of/ in parallel with
> 
> Agreed on both points.

I have tested this patch and it behaves as expected and fixes the
original issue I reported.  One nit, I think:

+     * SESSION_BACKUP_EXCLUSIVE in this one. Actual verification that an

Would be better phrased as:

+     * SESSION_BACKUP_EXCLUSIVE in this function. Actual verification that an

Thanks,
-- 
-David
david@pgmasters.net



Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Andres Freund
Date:
Hi,

On 2017-03-15 15:28:03 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 64335f909e..eaf8e32fe1 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> --- a/src/backend/access/transam/xlogfuncs.c
> +++ b/src/backend/access/transam/xlogfuncs.c
> diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
> index 104ee7dd5e..d4abf94862 100644
> --- a/src/include/access/xlog.h
> +++ b/src/include/access/xlog.h
> @@ -288,8 +288,26 @@ extern void assign_max_wal_size(int newval, void *extra);
>  extern void assign_checkpoint_completion_target(double newval, void *extra);

This seems like something easy enough to exercise in a tap test, could
we get one?

Greetings,

Andres Freund



Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Michael Paquier
Date:
On Thu, Mar 16, 2017 at 12:46 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-03-15 15:28:03 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index 64335f909e..eaf8e32fe1 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> --- a/src/backend/access/transam/xlogfuncs.c
>> +++ b/src/backend/access/transam/xlogfuncs.c
>> diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
>> index 104ee7dd5e..d4abf94862 100644
>> --- a/src/include/access/xlog.h
>> +++ b/src/include/access/xlog.h
>> @@ -288,8 +288,26 @@ extern void assign_max_wal_size(int newval, void *extra);
>>  extern void assign_checkpoint_completion_target(double newval, void *extra);
>
> This seems like something easy enough to exercise in a tap test, could
> we get one?

The first problem needs to have a cancel request sent when
pg_stop_backup() is running, the second needs to have a session held
to see an the inconsistent status of the session lock, which are two
concepts foreign to the TAP tests without being able to run the
queries asynchronously and keep the sessions alive.
-- 
Michael



Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Teodor Sigaev
Date:
Hi!

I believe patch looks good and it's ready to commit. As I understand, it fixes 
bug introduced by
commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200
    Implement backup API functions for non-exclusive backups

And, suppose, it should be backpatched to 9.6?

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Michael Paquier
Date:
On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> I believe patch looks good and it's ready to commit.

Thanks for the review!

> As I understand, it fixes bug introduced by
> commit 7117685461af50f50c03f43e6a622284c8d54694
> Date:   Tue Apr 5 20:03:49 2016 +0200
>
>     Implement backup API functions for non-exclusive backups

Indeed.

> And, suppose, it should be backpatched to 9.6?

Yes, that's where non-exclusive backups have been introduced.
-- 
Michael



Re: [HACKERS] Backend crash on non-exclusive backup cancel

From
Teodor Sigaev
Date:
Thank you all, pushed

Michael Paquier wrote:
> On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> I believe patch looks good and it's ready to commit.
>
> Thanks for the review!
>
>> As I understand, it fixes bug introduced by
>> commit 7117685461af50f50c03f43e6a622284c8d54694
>> Date:   Tue Apr 5 20:03:49 2016 +0200
>>
>>     Implement backup API functions for non-exclusive backups
>
> Indeed.
>
>> And, suppose, it should be backpatched to 9.6?
>
> Yes, that's where non-exclusive backups have been introduced.
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Backend crash on non-exclusive backup cancel

From
Michael Paquier
Date:
On Fri, Mar 24, 2017 at 7:58 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Thank you all, pushed.

Thanks.
-- 
Michael