Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
Date
Msg-id CAD21AoBo=zaiWT02xp5+EcdDUuok0ONEm4g-rG7DG1oPw8_nmA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Wed, Nov 15, 2017 at 10:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 9:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Nov 15, 2017 2:59 AM, "Fujii Masao" <masao.fujii@gmail.com> wrote:
>>> + /* Quick exit if we have done the backup */
>>> + if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
>>> + return;
>>>
>>> This quick exit seems to cause another problem. Please imagine the
>>> case where there is no exclusive backup running, a backend starts
>>> non-exclusive backup and is terminated before calling pg_stop_backup().
>>> In this case, do_pg_abort_backup() should decrement
>>> XLogCtl->Insert.nonExclusiveBackups, but, with the patch, because of
>>> the above quick exit, do_pg_abort_backup() fails to decrement that.
>>
>> Hmm, I think that in this case because  exclusiveBackupState is not
>> EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups is surely decremented. Am I
>> missing something?
>
> Nah. Fujii-san is right here as exclusiveBackupState is never updated
> for non-exclusive backups.

Thank you, I was wrong.

> You need an extra check on
> sessionBackupState to make sure that even for non-exclusive backups
> the counter is correctly decremented if a non-exclusive session lock
> is hold. For an exclusive backup, the session lock can be either
> SESSION_BACKUP_EXCLUSIVE if an exclusive backup is stopped on the same
> session as the start phase, or SESSION_BACKUP_NONE if the exclusive
> backup is stopped from a different session. So you'd basically need
> that:
> +   /*
> +    * Quick exit if we have done the exclusive backup or if session is
> +    * not keeping around a started non-exclusive backup.
> +    */
> +   if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE &&
> +       sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE)
> +       return;
>
> At the same time it would be safer at startup phase to update
> sessionBackupState when holding the WAL insert lock to prevent other
> CHECK_FOR_INTERRUPTS hazards. Suggestion of patch attached.

I think we need to check only sessionBackupState and don't need to
check XLogCtl->Insert.exclusiveBackupState in do_pg_abort_backup(). We
can quickly return if sessionBackupState !=
SESSION_BACKUP_NON_EXCLUSIVE. In your suggestion, I think we can still
get an assertion failure when pg_stop_backup(false) waiting for
archiving is terminated while concurrent an exclusive backup is in
progress.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Steve Singer
Date:
Subject: Re: [HACKERS] pgbench regression test failure
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.