Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Date
Msg-id CAB7nPqSq0YEnUfB1wcXdv75ZEKXuYf+4JpSozR2TVMUfQjOakQ@mail.gmail.com
Whole thread Raw
In response to Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, Aug 29, 2016 at 11:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> You seem to add another entry for this patch into CommitFest.
> Either of them needs to be removed.
> https://commitfest.postgresql.org/10/698/

Indeed. I just removed this one.

> This patch prevents pg_stop_backup from starting while pg_start_backup
> is running. OTOH, we also should prevent pg_start_backup from starting
> until "previous" pg_stop_backup has completed? What happens if
> pg_start_backup starts while pg_stop_backup is running?
> As far as I read the current code, ISTM that there is no need to do that,
> but it's better to do the double check.

I don't agree about doing that. The on-disk presence of the
backup_label file is what prevents pg_start_backup to run should
pg_stop_backup have already decremented its counters but not unlinked
yet the backup_label, so this insurance is really enough IMO. One good
reason to keep pg_stop_backup as-is in its failure handling is that if
for example it fails to remove the backup_label file, it is still
possible to do again a backup by just removing manually the existing
backup_label file *without* restarting the server. If you have an
in-memory state it would not be possible to fallback to this method
and you'd need a method to clean up the in-memory state.

Now, well, with the patch as well as HEAD, it could be possible that
the backup counters are decremented, but pg_stop_backup *fails* to
remove the backup_label file which would prevent any subsequent
pg_start_backup to run, because there is still a backup_label file, as
well as any other pg_stop_backup to run, because there is no backup
running per the in-memory state. We could improve the in-memory state
by:
- Having an extra exclusive backup status to mark an exclusive backup
as stopping, say EXCLUSIVE_BACKUP_STOPPING. Then mark the exclusive
backup status as such when do_pg_stop_backup starts.
- delaying counter decrementation in pg_stop_backup to the moment when
the backup_label file has been removed.
- Taking an extra time the exclusive WAL insert lock after
backup_label is removed, and decrement the counters.
- During the time the backup_label file is removed, we need an error
callback to switch back to BACKUP_RUNNING in case of error, similarly
to do_pg_start_backup.
Which is more or less the attached patch. Now, if pg_stop_backup
fails, this has the disadvantage to force a server restart to clean up
the in-memory state, instead of just removing manually the existing
backup_file.

For those reasons I still strongly recommend using v3, and not meddle
with the error handling of pg_stop_backup. Because, well, that's
actually not necessary, and this would just hurt the error handling of
exclusive backups.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Comment on GatherPath.single_copy
Next
From: Fujii Masao
Date:
Subject: Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)