Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Date
Msg-id 20160805.173148.122310681.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hello,

While the exact cause of the situation is not known yet but we
have apparently forgot that pg_stop_backup can be called
simultaneously with pg_start_backup. It seems anyway to me that
pg_stop_backup should be called before the end of pg_start_backup
from their definition and what they do. And it should fix the
assertion failure.

On solution is exclusiveBackupStarted (I'd like to rename the
variable) on shared memory as the patch does.

Another place where we can have something with the same effect is
file system. We can create 'backup_label.tmp" at very early in
pg_start_backup and rename it to backup_label at the end of the
function. Anyway exclusive pg_stop_backup needs that. A defect of
that would be the remaining backup_label.tmp file after crash
during pg_start_backup. Renaming tmp to (none) is atomic enough
for this usage but I'm not sure if it's in a network file system.
exclusiveBackup is also used this kind of exclusion, this also is
replasable with the .tmp file.

But after some thoughts, it found to need to add a bunch of error
handling code for the file operations and it should become too
complex. So I abandoned it.


At Fri, 5 Aug 2016 12:16:13 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqQeMQ8K3Vwh+T7Gq0O3i-3kiRyNPue9WRgqJQVuFAbZrQ@mail.gmail.com>
> On Thu, Aug 4, 2016 at 4:38 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > Andreas, with the patch attached is the assertion still triggered?
> >
> > Thoughts from others are welcome as well.
> 
> Self review:
>       * of streaming base backups currently in progress. forcePageWrites is set
>       * to true when either of these is non-zero. lastBackupStart is the latest
>       * checkpoint redo location used as a starting point for an online backup.
> +     * exclusiveBackupStarted is true while pg_start_backup() is being called
> +     * during an exclusive backup.
>       */
>      bool        exclusiveBackup;
>      int            nonExclusiveBackups;
>      XLogRecPtr    lastBackupStart;
> +    bool        exclusiveBackupStarted;
> It would be better to just use one variable that uses an enum to track
> the following states of an exclusive backup: none, starting and
> in-progress.

What I thought when I looked the first patch is that. Addition to
that I don't see the name exclusiveBackupStated is
appropriate.

One more argument is the necessity of the WALInsertLock at the
end of pg_start_backup. No other backend cannot reach there
concurrently and possible latency from cache coherency won't be a
matter for the purpose, I suppose. What do you think it is needed
for?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Refactoring of heapam code.
Next
From: Etsuro Fujita
Date:
Subject: Re: Oddity in EXPLAIN for foreign/custom join pushdown plans