Thread: subscriptionCheck failures on nightjar

subscriptionCheck failures on nightjar

From
Tom Lane
Date:
nightjar just did this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-02-11%2004%3A33%3A07

The critical bit seems to be that the publisher side of the
010_truncate.pl test failed like so:

2019-02-10 23:55:58.765 EST [40771] sub3 LOG:  statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2019-02-10 23:55:58.765 EST [40771] sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory
2019-02-10 23:55:58.800 EST [40771] sub3 LOG:  logical decoding found consistent point at 0/160B578
2019-02-10 23:55:58.800 EST [40771] sub3 DETAIL:  There are no running transactions.

I'm not sure what to make of that, but I notice that nightjar has
failed subscriptionCheck seven times since mid-December, and every one
of those shows this same PANIC.  Meanwhile, no other buildfarm member
has produced such a failure.  It smells like a race condition with
a rather tight window, but that's just a guess.

So: (1) what's causing the failure?  (2) could we respond with
something less than take-down-the-whole-database when a failure
happens in this area?

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Thomas Munro
Date:
On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

<pokes at totally unfamiliar code>

They get atomically renamed into place, which seems kosher even if
snapshots for the same LSN are created concurrently by different
backends (and tracing syscalls confirms that that does occasionally
happen).  It's hard to believe that nightjar's rename() ceased to be
atomic a couple of months ago.  It looks like the only way for files
to get unlinked after that is by CheckPointSnapBuild() deciding they
are too old.
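
(For reference, CheckPointSnapBuild() boils down to roughly the following paraphrase, written from memory rather than copied from snapbuild.c: compute an LSN cutoff, then unlink any serialized snapshot older than it.)

/* paraphrased sketch of CheckPointSnapBuild(), not the real code */
#include "postgres.h"

#include <dirent.h>
#include <unistd.h>

#include "access/xlog.h"
#include "replication/slot.h"
#include "storage/fd.h"

static void
checkpoint_snapbuild_sketch(void)
{
    XLogRecPtr  cutoff = ReplicationSlotsComputeLogicalRestartLSN();
    XLogRecPtr  redo = GetRedoRecPtr();
    DIR        *snap_dir;
    struct dirent *snap_de;

    /* nothing older than the last redo pointer is needed either */
    if (redo < cutoff)
        cutoff = redo;

    snap_dir = AllocateDir("pg_logical/snapshots");
    while ((snap_de = ReadDir(snap_dir, "pg_logical/snapshots")) != NULL)
    {
        uint32      hi;
        uint32      lo;
        char        path[MAXPGPATH];

        /* only consider things named like serialized snapshots */
        if (sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2)
            continue;

        /* unlink snapshots serialized at an LSN older than the cutoff */
        if ((((XLogRecPtr) hi) << 32 | lo) < cutoff)
        {
            snprintf(path, sizeof(path), "pg_logical/snapshots/%s",
                     snap_de->d_name);
            unlink(path);
        }
    }
    FreeDir(snap_dir);
}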

Hmm.  Could this be relevant, and cause a well timed checkpoint to
unlink files too soon?

2019-02-12 21:52:58.304 EST [22922] WARNING:  out of logical
replication worker slots

-- 
Thomas Munro
http://www.enterprisedb.com


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such file or directory

> <pokes at totally unfamiliar code>

> They get atomically renamed into place, which seems kosher even if
> snapshots for the same LSN are created concurrently by different
> backends (and tracing syscalls confirms that that does occasionally
> happen).  It's hard to believe that nightjar's rename() ceased to be
> atomic a couple of months ago.  It looks like the only way for files
> to get unlinked after that is by CheckPointSnapBuild() deciding they
> are too old.

> Hmm.  Could this be relevant, and cause a well timed checkpoint to
> unlink files too soon?
> 2019-02-12 21:52:58.304 EST [22922] WARNING:  out of logical
> replication worker slots

I've managed to reproduce this locally, and obtained this PANIC:

log/010_truncate_publisher.log:2019-02-13 11:29:04.759 EST [9973] sub3 PANIC:  could not open file "pg_logical/snapshots/0-16067B0.snap": No such file or directory

with this stack trace:

#0  0x0000000801ab610c in kill () from /lib/libc.so.7
#1  0x0000000801ab4d3b in abort () from /lib/libc.so.7
#2  0x000000000089202e in errfinish (dummy=Variable "dummy" is not available.
) at elog.c:552
#3  0x000000000075d339 in fsync_fname_ext (
    fname=0x7fffffffba20 "pg_logical/snapshots/0-16067B0.snap", isdir=Variable "isdir" is not available.
)
    at fd.c:3372
#4  0x0000000000730c75 in SnapBuildSerialize (builder=0x80243c118,
    lsn=23095216) at snapbuild.c:1669
#5  0x0000000000731297 in SnapBuildProcessRunningXacts (builder=0x80243c118,
    lsn=23095216, running=0x8024424f0) at snapbuild.c:1110
#6  0x0000000000722eac in LogicalDecodingProcessRecord (ctx=0x802414118,
    record=0x8024143d8) at decode.c:318
#7  0x0000000000736f30 in XLogSendLogical () at walsender.c:2826
#8  0x00000000007389c7 in WalSndLoop (send_data=0x736ed0 <XLogSendLogical>)
    at walsender.c:2184
#9  0x0000000000738c3d in StartLogicalReplication (cmd=Variable "cmd" is not available.
) at walsender.c:1118
#10 0x0000000000739895 in exec_replication_command (
    cmd_string=0x80240e118 "START_REPLICATION SLOT \"sub3\" LOGICAL 0/0 (proto_version '1', publication_names '\"pub3\"')") at walsender.c:1536
#11 0x000000000078b272 in PostgresMain (argc=-14464, argv=Variable "argv" is not available.
) at postgres.c:4252
#12 0x00000000007113fa in PostmasterMain (argc=-14256, argv=0x7fffffffcc60)
    at postmaster.c:4382

So the problem seems to boil down to "somebody removed the snapshot
file between the time we renamed it into place and the time we tried
to fsync it".

I do find messages like the one you mention, but they are on the
subscriber not the publisher side, so I'm not sure if this matches
the scenario you have in mind?

log/010_truncate_subscriber.log:2019-02-13 11:29:02.343 EST [9970] WARNING:  out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:02.344 EST [9970] WARNING:  out of logical replication worker slots
log/010_truncate_subscriber.log:2019-02-13 11:29:03.401 EST [9970] WARNING:  out of logical replication worker slots

Anyway, I think we might be able to fix this along the lines of

    CloseTransientFile(fd);

+   /* ensure snapshot file is down to stable storage */
+   fsync_fname(tmppath, false);
    fsync_fname("pg_logical/snapshots", true);

    /*
     * We may overwrite the work from some other backend, but that's ok, our
     * snapshot is valid as well, we'll just have done some superfluous work.
     */
    if (rename(tmppath, path) != 0)
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        tmppath, path)));
    }

-   /* make sure we persist */
-   fsync_fname(path, false);
+   /* ensure we persist the file rename */
    fsync_fname("pg_logical/snapshots", true);

The existing code here seems simply wacky/unsafe to me regardless
of this race condition: couldn't it potentially result in a corrupt
snapshot file appearing with a valid name, if the system crashes
after persisting the rename but before it's pushed the data out?

I also wonder why bother with the directory sync just before the
rename.

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
> 
>     CloseTransientFile(fd);
> 
> +   /* ensure snapshot file is down to stable storage */
> +   fsync_fname(tmppath, false);
>     fsync_fname("pg_logical/snapshots", true);
> 
>     /*
>      * We may overwrite the work from some other backend, but that's ok, our
>      * snapshot is valid as well, we'll just have done some superfluous work.
>      */
>     if (rename(tmppath, path) != 0)
>     {
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         tmppath, path)));
>     }
> 
> -   /* make sure we persist */
> -   fsync_fname(path, false);
> +   /* ensure we persist the file rename */
>     fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename.  Then a concurrent rename ought not
to hurt us?
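
In standalone form, that idea would look roughly like this (an untested
sketch using plain POSIX calls rather than the fd.c wrappers; the helper
name is made up):

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* hypothetical helper: fsync the file through an fd taken before the rename */
static bool
rename_and_fsync_by_fd(const char *tmppath, const char *path)
{
    int         fd = open(tmppath, O_RDWR); /* grab the fd while the temp name still exists */

    if (fd < 0)
        return false;

    if (rename(tmppath, path) != 0)
    {
        close(fd);
        return false;
    }

    /* flush the data through the already-open fd; the name no longer matters */
    if (fsync(fd) != 0)
    {
        close(fd);
        return false;
    }

    close(fd);
    return true;
}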


> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What do you mean precisely with "before it's pushed the data out"?


> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise there's the possibility of incomplete data if we
crash after renaming, but before fsyncing.
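
Spelled out end to end, the ordering being described is roughly the
following (a simplified standalone sketch with plain POSIX calls and
collapsed error handling, not the snapbuild.c code; whether a directory
fd can be fsynced this way is itself platform-dependent):

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool
fsync_path(const char *name, int flags)
{
    int         fd = open(name, flags);
    bool        ok;

    if (fd < 0)
        return false;
    ok = (fsync(fd) == 0);
    close(fd);
    return ok;
}

static bool
durable_rename_sketch(const char *tmppath, const char *path, const char *dir)
{
    /* 1. get the file contents down to disk under the temporary name */
    if (!fsync_path(tmppath, O_RDWR))
        return false;
    /* 2. make the temp entry (and, on some FS, the file size) durable */
    if (!fsync_path(dir, O_RDONLY))
        return false;
    /* 3. atomically move the file into place */
    if (rename(tmppath, path) != 0)
        return false;
    /* 4. make the rename itself durable */
    return fsync_path(dir, O_RDONLY);
}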

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
>> I've managed to reproduce this locally, and obtained this PANIC:

> Cool. How exactly?

Andrew told me that nightjar is actually running in a qemu VM,
so I set up freebsd 9.0 in a qemu VM, and boom.  It took a bit
of fiddling with qemu parameters, but for such a timing-sensitive
problem, that's not surprising.

>> Anyway, I think we might be able to fix this along the lines of
>> [ fsync the data before renaming not after ]

> Hm, but that's not the same? On some filesystems one needs the directory
> fsync, on some the file fsync, and I think both in some cases.

Now that I look at it, there's a pg_fsync() just above this, so
I wonder why we need a second fsync on the file at all.  fsync'ing
the directory is needed to ensure the directory entry is on disk;
but the file data should be out already, or else the kernel is
simply failing to honor fsync.

>> The existing code here seems simply wacky/unsafe to me regardless
>> of this race condition: couldn't it potentially result in a corrupt
>> snapshot file appearing with a valid name, if the system crashes
>> after persisting the rename but before it's pushed the data out?

> What do you mean precisely with "before it's pushed the data out"?

Given the previous pg_fsync, this isn't an issue.

>> I also wonder why bother with the directory sync just before the
>> rename.

> Because on some FS/OS combinations the size of the renamed-into-place
> file isn't guaranteed to be durable unless the directory was
> fsynced.

Bleah.  But in any case, the rename should not create a situation
in which we need to fsync the file data again.

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> >> I've managed to reproduce this locally, and obtained this PANIC:
> 
> > Cool. How exactly?
> 
> Andrew told me that nightjar is actually running in a qemu VM,
> so I set up freebsd 9.0 in a qemu VM, and boom.  It took a bit
> of fiddling with qemu parameters, but for such a timing-sensitive
> problem, that's not surprising.

Ah.


> >> I also wonder why bother with the directory sync just before the
> >> rename.
> 
> > Because on some FS/OS combinations the size of the renamed-into-place
> > file isn't guaranteed to be durable unless the directory was
> > fsynced.
> 
> Bleah.  But in any case, the rename should not create a situation
> in which we need to fsync the file data again.

Well, it's not super well defined which of either you need to make the
rename durable, and it appears to differ between OSs. Any argument
against fixing it up like I suggested, by using an fd from before the
rename?

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>> Bleah.  But in any case, the rename should not create a situation
>> in which we need to fsync the file data again.

> Well, it's not super well defined which of either you need to make the
> rename durable, and it appears to differ between OSs. Any argument
> against fixing it up like I suggested, by using an fd from before the
> rename?

I'm unimpressed.  You're speculating about the filesystem having random
deviations from POSIX behavior, and using that weak argument to justify a
totally untested technique having its own obvious portability hazards.
Who's to say that an fsync on a file opened before a rename is going to do
anything good after the rename?  (On, eg, NFS there are obvious reasons
why it might not.)

Also, I wondered why this is coming out as a PANIC.  I thought originally
that somebody must be causing this code to run in a critical section,
but it looks like the real issue is just that fsync_fname() uses
data_sync_elevel, which is

int
data_sync_elevel(int elevel)
{
    return data_sync_retry ? elevel : PANIC;
}

I really really don't want us doing questionably-necessary fsyncs with a
PANIC as the result.  Perhaps more to the point, the way this was coded,
the PANIC applies to open() failures in fsync_fname_ext() not just fsync()
failures; that's painting with too broad a brush isn't it?

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
> >> Bleah.  But in any case, the rename should not create a situation
> >> in which we need to fsync the file data again.
> 
> > Well, it's not super well defined which of either you need to make the
> > rename durable, and it appears to differ between OSs. Any argument
> > against fixing it up like I suggested, by using an fd from before the
> > rename?
> 
> I'm unimpressed.  You're speculating about the filesystem having random
> deviations from POSIX behavior, and using that weak argument to justify a
> totally untested technique having its own obvious portability hazards.

Uhm, we've reproduced failures due to the lack of such fsyncs at some
point. And not some fringe OS, but ext4 (albeit with data=writeback).

I don't think POSIX has yet figured out what they actually think is
required:
http://austingroupbugs.net/view.php?id=672

I guess we could just ignore ENOENT in this case, that ought to be just
as safe as using the old fd.
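
Something along these lines, i.e. fsync-by-name that treats a vanished
file as success (untested sketch, made-up helper name, plain POSIX calls
rather than fsync_fname_ext):

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

static bool
fsync_missing_ok(const char *fname)
{
    int         fd = open(fname, O_RDWR);

    if (fd < 0)
        return errno == ENOENT; /* already removed: nothing left for us to flush */

    if (fsync(fd) != 0)
    {
        close(fd);
        return false;
    }

    close(fd);
    return true;
}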


> Also, I wondered why this is coming out as a PANIC.  I thought originally
> that somebody must be causing this code to run in a critical section,
> but it looks like the real issue is just that fsync_fname() uses
> data_sync_elevel, which is
> 
> int
> data_sync_elevel(int elevel)
> {
>     return data_sync_retry ? elevel : PANIC;
> }
> 
> I really really don't want us doing questionably-necessary fsyncs with a
> PANIC as the result.

Well, given the 'failed fsync throws dirty data away' issue, I don't
quite see what we can do otherwise. But:


> Perhaps more to the point, the way this was coded, the PANIC applies
> to open() failures in fsync_fname_ext() not just fsync() failures;
> that's painting with too broad a brush isn't it?

That indeed seems wrong. Thomas?  I'm not quite sure how to best fix
this though - I guess we could rename fsync_fname_ext's elevel parameter
to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
too bad?

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?

> That indeed seems wrong. Thomas?  I'm not quite sure how to best fix
> this though - I guess we could rename fsync_fname_ext's elevel parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> too bad?

Perhaps fsync_fname() should pass the elevel through as-is, and
then fsync_fname_ext() apply the data_sync_elevel() promotion only
to the fsync() call not the open()?  I'm not sure.
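
Roughly like this, perhaps (a paraphrased sketch of where the promotion
would move, not the actual fd.c code, and with the isdir handling
simplified):

#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>

#include "storage/fd.h"

static void
fsync_fname_ext_sketch(const char *fname, bool isdir, int elevel)
{
    int         flags = isdir ? O_RDONLY : (O_RDWR | PG_BINARY);
    int         fd = OpenTransientFile(fname, flags);

    if (fd < 0)
    {
        /* open() failure keeps the caller's severity, no PANIC promotion */
        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", fname)));
        return;
    }

    if (pg_fsync(fd) != 0)
    {
        /* only a failed fsync risks losing dirty data, so only it is promoted */
        ereport(data_sync_elevel(elevel),
                (errcode_for_file_access(),
                 errmsg("could not fsync file \"%s\": %m", fname)));
    }

    CloseTransientFile(fd);
}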

The bigger picture here is that there are probably some call sites where
PANIC on open() failure is appropriate too.  So having fsync_fname[_ext]
deciding what to do on its own is likely to be inadequate.

If we fix this by allowing ENOENT to not be an error in this particular
call case, that's going to involve an fsync_fname_ext API change anyway...

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-02-13 13:24:03 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
> >> Perhaps more to the point, the way this was coded, the PANIC applies
> >> to open() failures in fsync_fname_ext() not just fsync() failures;
> >> that's painting with too broad a brush isn't it?
> 
> > That indeed seems wrong. Thomas?  I'm not quite sure how to best fix
> > this though - I guess we could rename fsync_fname_ext's elevel parameter
> > to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> > too bad?
> 
> Perhaps fsync_fname() should pass the elevel through as-is, and
> then fsync_fname_ext() apply the data_sync_elevel() promotion only
> to the fsync() call not the open()?  I'm not sure.

Yea, that's probably not a bad plan. It'd address your:

> The bigger picture here is that there are probably some call sites where
> PANIC on open() failure is appropriate too.  So having fsync_fname[_ext]
> deciding what to do on its own is likely to be inadequate.

Seems to me we ought to do this regardless of the bug discussed
here. But we'd need to be careful that we take the "more severe" error
between the passed-in elevel and data_sync_elevel(). Otherwise we'd end
up downgrading errors...


> If we fix this by allowing ENOENT to not be an error in this particular
> call case, that's going to involve an fsync_fname_ext API change anyway...

I was kinda pondering just open coding it.  I am not yet convinced that
my idea of just using an open FD isn't the least bad approach for the
issue at hand.  What precisely is the NFS issue you're concerned about?

Right now fsync_fname_ext isn't exposed outside fd.c...

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I was kinda pondering just open coding it.  I am not yet convinced that
> my idea of just using an open FD isn't the least bad approach for the
> issue at hand.  What precisely is the NFS issue you're concerned about?

I'm not sure that fsync-on-FD after the rename will work, considering that
the issue here is that somebody might've unlinked the file altogether
before we get to doing the fsync.  I don't have a hard time believing that
that might result in a failure report on NFS or similar.  Yeah, it's
hypothetical, but the argument that we need a repeat fsync at all seems
equally hypothetical.

> Right now fsync_fname_ext isn't exposed outside fd.c...

Mmm.  That makes it easier to consider changing its API.

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Thomas Munro
Date:
On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I was kinda pondering just open coding it.  I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand.  What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync.  I don't have a hard time believing that
> that might result in a failure report on NFS or similar.  Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm.  That makes it easier to consider changing its API.

Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore?  In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?

I found 3 examples of this failing with an ERROR (though not turning
the BF red, so nobody noticed) before the PANIC patch went in:


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-09-10%2020%3A54%3A21&stg=subscription-check
2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-31%2023%3A25%3A41&stg=subscription-check
2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-22%2021%3A49%3A18&stg=subscription-check
2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

-- 
Thomas Munro
http://www.enterprisedb.com


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> I found 3 examples of this failing with an ERROR (though not turning
> the BF red, so nobody noticed) before the PANIC patch went in:

Yeah, I suspected that had happened before with less-obvious consequences.
Now that we know where the problem is, you could probably make it highly
reproducible by inserting a sleep of a few msec between the rename and the
second fsync.
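
For anyone attempting that, the change would be roughly this, inserted
into the SnapBuildSerialize() code quoted upthread (a throwaway,
untested repro aid; the 5 ms figure is arbitrary):

    if (rename(tmppath, path) != 0)
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        tmppath, path)));
    }

    /* throwaway repro aid: widen the window before the by-name fsync */
    pg_usleep(5000L);           /* 5 ms */

    /* make sure we persist */
    fsync_fname(path, false);
    fsync_fname("pg_logical/snapshots", true);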

            regards, tom lane


Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
> On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I was kinda pondering just open coding it.  I am not yet convinced that
> > > my idea of just using an open FD isn't the least bad approach for the
> > > issue at hand.  What precisely is the NFS issue you're concerned about?
> >
> > I'm not sure that fsync-on-FD after the rename will work, considering that
> > the issue here is that somebody might've unlinked the file altogether
> > before we get to doing the fsync.  I don't have a hard time believing that
> > that might result in a failure report on NFS or similar.  Yeah, it's
> > hypothetical, but the argument that we need a repeat fsync at all seems
> > equally hypothetical.
> >
> > > Right now fsync_fname_ext isn't exposed outside fd.c...
> >
> > Mmm.  That makes it easier to consider changing its API.
> 
> Just to make sure I understand: it's OK for the file not to be there
> when we try to fsync it by name, because a concurrent checkpoint can
> remove it, having determined that we don't need it anymore?  In other
> words, we really needed either missing_ok=true semantics, or to use
> the fd we already had instead of the name?

I'm not yet sure that that's actually something that's supposed to
happen; I've got to spend some time analysing how this actually
happens. Normally the contents of the slot should actually prevent it
from being removed (as they're newer than
ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
a bug in the drop logic in newer releases.

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Andrew Dunstan
Date:
On 2/13/19 1:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-02-13 12:59:19 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>>>> Bleah.  But in any case, the rename should not create a situation
>>>> in which we need to fsync the file data again.
>>> Well, it's not super well defined which of either you need to make the
>>> rename durable, and it appears to differ between OSs. Any argument
>>> against fixing it up like I suggested, by using an fd from before the
>>> rename?
>> I'm unimpressed.  You're speculating about the filesystem having random
>> deviations from POSIX behavior, and using that weak argument to justify a
>> totally untested technique having its own obvious portability hazards.
> Uhm, we've reproduced failures due to the lack of such fsyncs at some
> point. And not some fringe OS, but ext4 (albeit with data=writeback).
>
> I don't think POSIX has yet figured out what they actually think is
> required:
> http://austingroupbugs.net/view.php?id=672
>
> I guess we could just ignore ENOENT in this case, that ought to be just
> as safe as using the old fd.
>
>
>> Also, I wondered why this is coming out as a PANIC.  I thought originally
>> that somebody must be causing this code to run in a critical section,
>> but it looks like the real issue is just that fsync_fname() uses
>> data_sync_elevel, which is
>>
>> int
>> data_sync_elevel(int elevel)
>> {
>>     return data_sync_retry ? elevel : PANIC;
>> }
>>
>> I really really don't want us doing questionably-necessary fsyncs with a
>> PANIC as the result.
> Well, given the 'failed fsync throws dirty data away' issue, I don't
> quite see what we can do otherwise. But:
>
>
>> Perhaps more to the point, the way this was coded, the PANIC applies
>> to open() failures in fsync_fname_ext() not just fsync() failures;
>> that's painting with too broad a brush isn't it?
> That indeed seems wrong. Thomas?  I'm not quite sure how to best fix
> this though - I guess we could rename fsync_fname_ext's elevel parameter
> to fsync_failure_elevel? It's not visible outside fd.c, so that'd not be
> too bad?
>

Thread seems to have gone quiet ...


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore?  In other
>> words, we really needed either missing_ok=true semantics, or to use
>> the fd we already had instead of the name?

> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

My animal dromedary just reproduced this failure, which we've previously
only seen on nightjar.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
I wrote:
> My animal dromedary just reproduced this failure, which we've previously
> only seen on nightjar.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

Twice:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-28%2006%3A50%3A41

Since this is a live (if old) system, not some weird qemu emulation,
we can no longer pretend that it might not be reachable in the field.
I've added an open item.

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Michael Paquier
Date:
On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

In the same context, could it be a consequence of 9915de6c which has
introduced a conditional variable to control slot operations?  This
could have exposed more easily a pre-existing race condition.
--
Michael


Re: subscriptionCheck failures on nightjar

From
Tomas Vondra
Date:
On Tue, Aug 13, 2019 at 05:04:35PM +0900, Michael Paquier wrote:
>On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
>> I'm not yet sure that that's actually something that's supposed to
>> happen, I got to spend some time analysing how this actually
>> happens. Normally the contents of the slot should actually prevent it
>> from being removed (as they're newer than
>> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
>> a bug in the drop logic in newer releases.
>
>In the same context, could it be a consequence of 9915de6c which has
>introduced a conditional variable to control slot operations?  This
>could have exposed more easily a pre-existing race condition.
>--

This is one of the remaining open items, and we don't seem to be moving
forward with it :-(

I'm willing to take a stab at it, but to do that I need a way to
reproduce it. Tom, you mentioned you've managed to reproduce it in a
qemu instance, but that it took some fiddling with qemu parameters or
something. Can you share what exactly was necessary?

An observation about the issue - while we started to notice this after
December, that's mostly because the PANIC patch went in shortly before.
We had, however, seen the issue before, as Thomas Munro mentioned in [1].

Those reports are from August, so it's quite possible something in the
first CF upset the code. And there's only a single commit in 2018-07
that seems related to logical decoding / snapshots [2], i.e. f49a80c:

commit f49a80c481f74fa81407dce8e51dea6956cb64f8
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   Tue Jun 26 16:38:34 2018 -0400

    Fix "base" snapshot handling in logical decoding

    ...

The other reason to suspect this is related is that the fix also made it
to REL_11_STABLE at that time, and if you check the buildfarm data [3],
you'll see 11 fails on nightjar too, from time to time.

This means it's not a 12+ only issue, it's a live issue on 11. I don't
know if f49a80c is the culprit, or if it simply uncovered a pre-existing
bug (e.g. due to timing).


[1] https://www.postgresql.org/message-id/CAEepm%3D0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP%2BFfq-WT3g8VA%40mail.gmail.com

[2] https://commitfest.postgresql.org/18/1650/

[3] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=nightjar&br=REL_11_STABLE

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I'm willing to take a stab at it, but to do that I need a way to
> reproduce it. Tom, you mentioned you've managed to reproduce it in a
> qemu instance, but that it took some fiddling with qemu parameters or
> something. Can you share what exactly was necessary?

I don't recall exactly what I did anymore, and it was pretty fiddly
anyway.  Upthread I suggested

>> Now that we know where the problem is, you could probably make it highly
>> reproducible by inserting a sleep of a few msec between the rename and the
>> second fsync.

so why not try that first?

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Tomas Vondra
Date:
On Mon, Aug 26, 2019 at 11:01:20AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I'm willing to take a stab at it, but to do that I need a way to
>> reproduce it. Tom, you mentioned you've managed to reproduce it in a
>> qemu instance, but that it took some fiddling with qemu parameters or
>> something. Can you share what exactly was necessary?
>
>I don't recall exactly what I did anymore, and it was pretty fiddly
>anyway.  Upthread I suggested
>
>>> Now that we know where the problem is, you could probably make it highly
>>> reproducible by inserting a sleep of a few msec between the rename and the
>>> second fsync.
>
>so why not try that first?
>

Ah, right. I'll give that a try.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: subscriptionCheck failures on nightjar

From
Robert Haas
Date:
On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> This is one of the remaining open items, and we don't seem to be moving
> forward with it :-(

Why exactly is this an open item, anyway?

I don't find any discussion on the thread which makes a clear argument
that this problem originated with v12. If it didn't, it's still a bug
and it still ought to be fixed at some point, but it's not a
release-blocker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> This is one of the remaining open items, and we don't seem to be moving
>> forward with it :-(

> Why exactly is this an open item, anyway?

The reason it's still here is that Andres expressed a concern that
there might be more than meets the eye in this.  What meets the eye
is that PANICing on file-not-found is not appropriate here, but Andres
seemed to think that the file not being present might reflect an
actual bug not just an expectable race condition [1].

Personally I'd be happy just to treat it as an expectable case and
fix the code to not PANIC on file-not-found.

In either case, it probably belongs in the "older bugs" section;
nightjar is showing the same failure on v11 from time to time.

            regards, tom lane

[1] https://www.postgresql.org/message-id/20190213215147.cjbymfojf6xndr4t%40alap3.anarazel.de



Re: subscriptionCheck failures on nightjar

From
Tomas Vondra
Date:
On Tue, Sep 17, 2019 at 12:39:33PM -0400, Tom Lane wrote:
>Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Aug 26, 2019 at 9:29 AM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> This is one of the remaining open items, and we don't seem to be moving
>>> forward with it :-(
>
>> Why exactly is this an open item, anyway?
>
>The reason it's still here is that Andres expressed a concern that
>there might be more than meets the eye in this.  What meets the eye
>is that PANICing on file-not-found is not appropriate here, but Andres
>seemed to think that the file not being present might reflect an
>actual bug not just an expectable race condition [1].
>
>Personally I'd be happy just to treat it as an expectable case and
>fix the code to not PANIC on file-not-found.
>

FWIW I agree with Andres that there probably is an actual bug. The file
should not just disappear like this; it's clearly unexpected, so the
PANIC does not seem entirely inappropriate.

I've tried reproducing the issue on my local systems, with the extra
sleeps between fsyncs and so on, but I haven't managed to trigger it so
far :-(

>In either case, it probably belongs in the "older bugs" section;
>nightjar is showing the same failure on v11 from time to time.
>

Yes, it should be moved to the older section - it's clearly a v11 bug.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: subscriptionCheck failures on nightjar

From
Michael Paquier
Date:
On Tue, Sep 17, 2019 at 09:45:10PM +0200, Tomas Vondra wrote:
> FWIW I agree with Andres that there probably is an actual bug. The file
> should not just disappear like this, it's clearly unexpected so the
> PANIC does not seem entirely inappropriate.

Agreed.

> I've tried reproducing the  issue on my local systems, with the extra
> sleeps between fsyncs and so on, but I haven't managed to trigger it so
> far :-(

On my side, I have let this thing run for a couple of hours with a
patched version to include a sleep between the rename and the sync but
I could not reproduce it either:
#!/bin/bash
attempt=0
while true; do
    attempt=$((attempt+1))
    echo "Attempt $attempt"
    cd $HOME/postgres/src/test/recovery/
    PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
    ERRNUM=$?
    if [ $ERRNUM != 0 ]; then
        echo "Failed at attempt $attempt"
        exit $ERRNUM
    fi
done
> Yes, it should be moved to the older section - it's clearly a v11 bug.

And agreed.
--
Michael


Re: subscriptionCheck failures on nightjar

From
Kuntal Ghosh
Date:
Hello Michael,

On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On my side, I have let this thing run for a couple of hours with a
> patched version to include a sleep between the rename and the sync but
> I could not reproduce it either:
> #!/bin/bash
> attempt=0
> while true; do
>         attempt=$((attempt+1))
>         echo "Attempt $attempt"
>         cd $HOME/postgres/src/test/recovery/
>         PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
>         ERRNUM=$?
>         if [ $ERRNUM != 0 ]; then
>                 echo "Failed at attempt $attempt"
>                 exit $ERRNUM
>         fi
> done
I think the failing test is src/test/subscription/t/010_truncate.pl.
I've tried to reproduce the same failure using your script on OS X
10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
couldn't reproduce it.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: subscriptionCheck failures on nightjar

From
Tomas Vondra
Date:
On Wed, Sep 18, 2019 at 04:25:14PM +0530, Kuntal Ghosh wrote:
>Hello Michael,
>
>On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On my side, I have let this thing run for a couple of hours with a
>> patched version to include a sleep between the rename and the sync but
>> I could not reproduce it either:
>> #!/bin/bash
>> attempt=0
>> while true; do
>>         attempt=$((attempt+1))
>>         echo "Attempt $attempt"
>>         cd $HOME/postgres/src/test/recovery/
>>         PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
>>         ERRNUM=$?
>>         if [ $ERRNUM != 0 ]; then
>>                 echo "Failed at attempt $attempt"
>>                 exit $ERRNUM
>>         fi
>> done
>I think the failing test is src/test/subscription/t/010_truncate.pl.
>I've tried to reproduce the same failure using your script in OS X
>10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
>couldn't reproduce the same.
>

I kinda suspect it might be just a coincidence that it fails during that
particular test. What likely plays a role here is checkpoint timing
(AFAICS that's the thing removing the file).  On most systems the tests
complete before any checkpoint is triggered, hence no issue.

Maybe aggressively triggering checkpoints on the running cluster from
another session would do the trick ...
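
For example, a small libpq loop run next to the test could do it (an
untested sketch; the connection string is a placeholder for wherever the
publisher instance is listening):

#include <stdio.h>
#include <unistd.h>

#include "libpq-fe.h"

int
main(void)
{
    /* placeholder conninfo; point it at the publisher started by the TAP test */
    PGconn     *conn = PQconnectdb("host=/tmp port=65366 dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (;;)
    {
        PGresult   *res = PQexec(conn, "CHECKPOINT");

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "CHECKPOINT failed: %s", PQerrorMessage(conn));
        PQclear(res);
        usleep(100000);         /* roughly ten checkpoint requests per second */
    }
}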

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: subscriptionCheck failures on nightjar

From
Michael Paquier
Date:
On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
> I kinda suspect it might be just a coincidence that it fails during that
> particular test. What likely plays a role here is a checkpoint timing
> (AFAICS that's the thing removing the file).  On most systems the tests
> complete before any checkpoint is triggered, hence no issue.
>
> Maybe aggressively triggering checkpoints on the running cluster from
> another session would do the trick ...

Now that I recall, another thing I forgot to mention on this thread is
that I patched guc.c to reduce the minimum of checkpoint_timeout to
1s.
--
Michael


Re: subscriptionCheck failures on nightjar

From
Kuntal Ghosh
Date:
Hello hackers,

It seems there is a pattern in how the error occurs on different
systems. Following are the relevant log snippets:

nightjar:
sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT
"sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
sub3 LOG:  logical decoding found consistent point at 0/160B578
sub1 PANIC:  could not open file
"pg_logical/snapshots/0-160B578.snap": No such file or directory

dromedary scenario 1:
sub3_16414_sync_16399 LOG:  received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
sub2 PANIC:  could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory


dromedary scenario 2:
sub3_16414_sync_16399 LOG:  received replication command:
CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
sub1 PANIC:  could not open file
"pg_logical/snapshots/0-15EA694.snap": No such file or directory

While subscription 3 is being created, it eventually reaches a
consistent snapshot point and prints the corresponding WAL location. It
seems sub1/sub2 then immediately fails to serialize its snapshot to the
.snap file with that same WAL location.

Is this helpful?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: subscriptionCheck failures on nightjar

From
Michael Paquier
Date:
On Thu, Sep 19, 2019 at 05:20:15PM +0530, Kuntal Ghosh wrote:
> While subscription 3 is created, it eventually reaches to a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fails to serialize the snapshot to the
> .snap file having the same WAL location.
>
> Is this helpful?

It looks like you are pointing to something here.
--
Michael


Re: subscriptionCheck failures on nightjar

From
Tomas Vondra
Date:
On Thu, Sep 19, 2019 at 01:23:05PM +0900, Michael Paquier wrote:
>On Wed, Sep 18, 2019 at 11:58:08PM +0200, Tomas Vondra wrote:
>> I kinda suspect it might be just a coincidence that it fails during that
>> particular test. What likely plays a role here is a checkpoint timing
>> (AFAICS that's the thing removing the file).  On most systems the tests
>> complete before any checkpoint is triggered, hence no issue.
>>
>> Maybe aggressively triggering checkpoints on the running cluster from
>> another session would do the trick ...
>
>Now that I recall, another thing I forgot to mention on this thread is
>that I patched guc.c to reduce the minimum of checkpoint_timeout to
>1s.

But even with that change you haven't managed to reproduce the issue,
right? Or am I misunderstanding?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote:
> It seems there is a pattern how the error is occurring in different
> systems. Following are the relevant log snippets:
> 
> nightjar:
> sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT
> "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> sub3 LOG:  logical decoding found consistent point at 0/160B578
> sub1 PANIC:  could not open file
> "pg_logical/snapshots/0-160B578.snap": No such file or directory
> 
> dromedary scenario 1:
> sub3_16414_sync_16399 LOG:  received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
> sub2 PANIC:  could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
> 
> 
> dromedary scenario 2:
> sub3_16414_sync_16399 LOG:  received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
> sub1 PANIC:  could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
> 
> While subscription 3 is created, it eventually reaches to a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 immediately fails to serialize the snapshot to the
> .snap file having the same WAL location.

Since a number of people (I tried as well) have now failed to reproduce
this locally, I propose that we increase the log level during this test
on master, and perhaps expand the set of debugging information, with the
hope that the additional information on the cases encountered on the bf
helps us build a reproducer or, even better, diagnose the issue
directly.  If people agree, I'll come up with a patch.

Greetings,

Andres Freund



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Since now a number of people (I tried as well), failed to reproduce this
> locally, I propose that we increase the log-level during this test on
> master. And perhaps expand the set of debugging information. With the
> hope that the additional information on the cases encountered on the bf
> helps us build a reproducer or, even better, diagnose the issue
> directly.  If people agree, I'll come up with a patch.

I recreated my freebsd-9-under-qemu setup and I can still reproduce
the problem, though not with high reliability (order of 1 time in 10).
Anything particular you want logged?

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Since now a number of people (I tried as well), failed to reproduce this
> > locally, I propose that we increase the log-level during this test on
> > master. And perhaps expand the set of debugging information. With the
> > hope that the additional information on the cases encountered on the bf
> > helps us build a reproducer or, even better, diagnose the issue
> > directly.  If people agree, I'll come up with a patch.
> 
> I recreated my freebsd-9-under-qemu setup and I can still reproduce
> the problem, though not with high reliability (order of 1 time in 10).
> Anything particular you want logged?

A DEBUG2 log would help a fair bit, because it'd log some information
about what changes the "horizons" determining when data may be removed.

Perhaps with the additional elogs attached? I lowered some messages to
DEBUG2 so we don't have to suffer the noise of the ipc.c DEBUG3
messages.

If I use a TEMP_CONFIG setting log_min_messages=DEBUG2 with the patches
applied, the subscription tests still pass.

I hope they still fail on your setup, even though the increased logging
volume probably changes timing somewhat.

Greetings,

Andres Freund


Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
>> I recreated my freebsd-9-under-qemu setup and I can still reproduce
>> the problem, though not with high reliability (order of 1 time in 10).
>> Anything particular you want logged?

> A DEBUG2 log would help a fair bit, because it'd log some information
> about what changes the "horizons" determining when data may be removed.

Actually, what I did was as attached [1], and I am getting traces like
[2].  The problem seems to occur only when there are two or three
processes concurrently creating the same snapshot file.  It's not
obvious from the debug trace, but the snapshot file *does* exist
after the music stops.

It is very hard to look at this trace and conclude anything other
than "rename(2) is broken, it's not atomic".  Nothing in our code
has deleted the file: no checkpoint has started, nor do we see
the DEBUG1 output that CheckPointSnapBuild ought to produce.
But fsync_fname momentarily can't see it (and then later another
process does see it).

It is now apparent why we're only seeing this on specific ancient
platforms.  I looked around for info about rename(2) not being
atomic, and I found this info about FreeBSD:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=94849

The reported symptom there isn't quite the same, so probably there
is another issue, but there is plenty of reason to be suspicious
that UFS rename(2) is buggy in this release.  As for dromedary's
ancient version of macOS, Apple is exceedingly untransparent about
their bugs, but I found

http://www.weirdnet.nl/apple/rename.html

In short, what we've got here are OS bugs that have probably been
resolved years ago.

The question is what to do next.  Should we just retire these
specific buildfarm critters, or do we want to push ahead with
getting rid of the PANIC here?

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Sigh, forgot about attaching the attachments ...

            regards, tom lane

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0bd1d0f..53fd33c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1670,11 +1670,14 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
                  errmsg("could not rename file \"%s\" to \"%s\": %m",
                         tmppath, path)));
     }
+    elog(DEBUG1, "renamed serialized snapshot %s to %s", tmppath, path);

     /* make sure we persist */
     fsync_fname(path, false);
     fsync_fname("pg_logical/snapshots", true);

+    elog(DEBUG1, "fsynced %s", path);
+
     /*
      * Now there's no way we can loose the dumped state anymore, remember this
      * as a serialization point.
diff --git a/src/test/subscription/t/010_truncate.pl b/src/test/subscription/t/010_truncate.pl
index be2c0bd..2986582 100644
--- a/src/test/subscription/t/010_truncate.pl
+++ b/src/test/subscription/t/010_truncate.pl
@@ -9,6 +9,11 @@ use Test::More tests => 9;

 my $node_publisher = get_new_node('publisher');
 $node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', q{
+log_checkpoints = on
+log_min_messages = 'debug1'
+log_error_verbosity = verbose
+});
 $node_publisher->start;

 my $node_subscriber = get_new_node('subscriber');
2019-09-20 17:08:54.361 EDT [34418] DEBUG:  00000: registering background worker "logical replication launcher"
2019-09-20 17:08:54.361 EDT [34418] LOCATION:  RegisterBackgroundWorker, bgworker.c:855
2019-09-20 17:08:54.362 EDT [34418] LOG:  00000: starting PostgreSQL 13devel on x86_64-unknown-freebsd9.0, compiled by gcc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit
2019-09-20 17:08:54.362 EDT [34418] LOCATION:  PostmasterMain, postmaster.c:1104
2019-09-20 17:08:54.362 EDT [34418] LOG:  00000: listening on Unix socket "/tmp/2lehMLoBNn/.s.PGSQL.65366"
2019-09-20 17:08:54.362 EDT [34418] LOCATION:  StreamServerPort, pqcomm.c:587
2019-09-20 17:08:54.363 EDT [34419] LOG:  00000: database system was shut down at 2019-09-20 17:08:54 EDT
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6241
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: checkpoint record is at 0/15D07F0
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6531
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: redo record is at 0/15D07F0; shutdown true
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6609
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: next transaction ID: 490; next OID: 12674
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6613
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: next MultiXactId: 1; next MultiXactOffset: 0
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6616
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: oldest unfrozen transaction ID: 483, in database 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6619
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: oldest MultiXactId: 1, in database 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6622
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: commit timestamp Xid oldest/newest: 0/0
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupXLOG, xlog.c:6626
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: transaction ID wrap limit is 2147484130, limited by database with OID 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  SetTransactionIdLimit, varsup.c:410
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  SetMultiXactIdLimit, multixact.c:2271
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: starting up replication slots
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  StartupReplicationSlots, slot.c:1114
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  SetMultiXactIdLimit, multixact.c:2271
2019-09-20 17:08:54.363 EDT [34419] DEBUG:  00000: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2019-09-20 17:08:54.363 EDT [34419] LOCATION:  SetOffsetVacuumLimit, multixact.c:2634
2019-09-20 17:08:54.364 EDT [34418] DEBUG:  00000: starting background worker process "logical replication launcher"
2019-09-20 17:08:54.364 EDT [34418] LOCATION:  do_start_bgworker, postmaster.c:5749
2019-09-20 17:08:54.364 EDT [34418] LOG:  00000: database system is ready to accept connections
2019-09-20 17:08:54.364 EDT [34418] LOCATION:  reaper, postmaster.c:3017
2019-09-20 17:08:54.365 EDT [34423] DEBUG:  00000: autovacuum launcher started
2019-09-20 17:08:54.365 EDT [34423] LOCATION:  AutoVacLauncherMain, autovacuum.c:441
2019-09-20 17:08:54.365 EDT [34425] DEBUG:  00000: logical replication launcher started
2019-09-20 17:08:54.365 EDT [34425] LOCATION:  ApplyLauncherMain, launcher.c:977
2019-09-20 17:08:55.294 EDT [34444] 010_truncate.pl LOG:  00000: statement: CREATE TABLE tab1 (a int PRIMARY KEY)
2019-09-20 17:08:55.294 EDT [34444] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.297 EDT [34444] 010_truncate.pl DEBUG:  00000: CREATE TABLE / PRIMARY KEY will create implicit index "tab1_pkey" for table "tab1"
2019-09-20 17:08:55.297 EDT [34444] 010_truncate.pl LOCATION:  DefineIndex, indexcmds.c:938
2019-09-20 17:08:55.298 EDT [34444] 010_truncate.pl DEBUG:  00000: building index "tab1_pkey" on table "tab1" serially
2019-09-20 17:08:55.298 EDT [34444] 010_truncate.pl LOCATION:  index_build, index.c:2724
2019-09-20 17:08:55.313 EDT [34448] 010_truncate.pl LOG:  00000: statement: CREATE TABLE tab2 (a int PRIMARY KEY)
2019-09-20 17:08:55.313 EDT [34448] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.315 EDT [34448] 010_truncate.pl DEBUG:  00000: CREATE TABLE / PRIMARY KEY will create implicit index "tab2_pkey" for table "tab2"
2019-09-20 17:08:55.315 EDT [34448] 010_truncate.pl LOCATION:  DefineIndex, indexcmds.c:938
2019-09-20 17:08:55.315 EDT [34448] 010_truncate.pl DEBUG:  00000: building index "tab2_pkey" on table "tab2" serially
2019-09-20 17:08:55.315 EDT [34448] 010_truncate.pl LOCATION:  index_build, index.c:2724
2019-09-20 17:08:55.327 EDT [34452] 010_truncate.pl LOG:  00000: statement: CREATE TABLE tab3 (a int PRIMARY KEY)
2019-09-20 17:08:55.327 EDT [34452] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.329 EDT [34452] 010_truncate.pl DEBUG:  00000: CREATE TABLE / PRIMARY KEY will create implicit index "tab3_pkey" for table "tab3"
2019-09-20 17:08:55.329 EDT [34452] 010_truncate.pl LOCATION:  DefineIndex, indexcmds.c:938
2019-09-20 17:08:55.329 EDT [34452] 010_truncate.pl DEBUG:  00000: building index "tab3_pkey" on table "tab3" serially
2019-09-20 17:08:55.329 EDT [34452] 010_truncate.pl LOCATION:  index_build, index.c:2724
2019-09-20 17:08:55.341 EDT [34456] 010_truncate.pl LOG:  00000: statement: CREATE TABLE tab4 (x int PRIMARY KEY, y int
REFERENCES tab3)
2019-09-20 17:08:55.341 EDT [34456] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.343 EDT [34456] 010_truncate.pl DEBUG:  00000: CREATE TABLE / PRIMARY KEY will create implicit
index"tab4_pkey" for table "tab4" 
2019-09-20 17:08:55.343 EDT [34456] 010_truncate.pl LOCATION:  DefineIndex, indexcmds.c:938
2019-09-20 17:08:55.343 EDT [34456] 010_truncate.pl DEBUG:  00000: building index "tab4_pkey" on table "tab4" serially
2019-09-20 17:08:55.343 EDT [34456] 010_truncate.pl LOCATION:  index_build, index.c:2724
2019-09-20 17:08:55.369 EDT [34464] 010_truncate.pl LOG:  00000: statement: CREATE PUBLICATION pub1 FOR TABLE tab1
2019-09-20 17:08:55.369 EDT [34464] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.375 EDT [34466] 010_truncate.pl LOG:  00000: statement: CREATE PUBLICATION pub2 FOR TABLE tab2 WITH
(publish = insert)
2019-09-20 17:08:55.375 EDT [34466] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.380 EDT [34468] 010_truncate.pl LOG:  00000: statement: CREATE PUBLICATION pub3 FOR TABLE tab3,
tab4
2019-09-20 17:08:55.380 EDT [34468] 010_truncate.pl LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.388 EDT [34471] sub1 LOG:  00000: statement: SELECT DISTINCT t.schemaname, t.tablename
      FROM pg_catalog.pg_publication_tables t
     WHERE t.pubname IN ('pub1')
2019-09-20 17:08:55.388 EDT [34471] sub1 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.392 EDT [34471] sub1 LOG:  00000: received replication command: CREATE_REPLICATION_SLOT "sub1"
LOGICAL pgoutput NOEXPORT_SNAPSHOT
2019-09-20 17:08:55.392 EDT [34471] sub1 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.392 EDT [34471] sub1 DEBUG:  00000: searching for logical decoding starting point, starting at
0/15FA830
2019-09-20 17:08:55.392 EDT [34471] sub1 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:55.392 EDT [34471] sub1 LOG:  00000: logical decoding found consistent point at 0/15FA830
2019-09-20 17:08:55.392 EDT [34471] sub1 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.392 EDT [34471] sub1 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.397 EDT [34474] sub1 LOG:  00000: received replication command: IDENTIFY_SYSTEM
2019-09-20 17:08:55.397 EDT [34474] sub1 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.397 EDT [34474] sub1 LOG:  00000: received replication command: START_REPLICATION SLOT "sub1"
LOGICAL 0/0 (proto_version '1', publication_names '"pub1"')
2019-09-20 17:08:55.397 EDT [34474] sub1 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.397 EDT [34474] sub1 LOG:  00000: starting logical decoding for slot "sub1"
2019-09-20 17:08:55.397 EDT [34474] sub1 DETAIL:  Streaming transactions committing after 0/15FA868, reading WAL from
0/15FA830.
2019-09-20 17:08:55.397 EDT [34474] sub1 LOCATION:  CreateDecodingContext, logical.c:448
2019-09-20 17:08:55.397 EDT [34474] sub1 LOG:  00000: logical decoding found consistent point at 0/15FA830
2019-09-20 17:08:55.397 EDT [34474] sub1 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.397 EDT [34474] sub1 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.397 EDT [34474] sub1 DEBUG:  00000: "sub1" has now caught up with upstream server
2019-09-20 17:08:55.397 EDT [34474] sub1 LOCATION:  WalSndLoop, walsender.c:2224
2019-09-20 17:08:55.400 EDT [34477] sub2 LOG:  00000: statement: SELECT DISTINCT t.schemaname, t.tablename
      FROM pg_catalog.pg_publication_tables t
     WHERE t.pubname IN ('pub2')
2019-09-20 17:08:55.400 EDT [34477] sub2 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.401 EDT [34478] sub1_16412_sync_16384 LOG:  00000: statement: BEGIN READ ONLY ISOLATION LEVEL
REPEATABLE READ
2019-09-20 17:08:55.401 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.401 EDT [34478] sub1_16412_sync_16384 LOG:  00000: received replication command:
CREATE_REPLICATION_SLOT"sub1_16412_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 
2019-09-20 17:08:55.401 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.402 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA868.snap
2019-09-20 17:08:55.402 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:55.402 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA868.snap.34474.tmp to pg_logical/snapshots/0-15FA868.snap
2019-09-20 17:08:55.402 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:55.402 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA868.snap
2019-09-20 17:08:55.402 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:55.402 EDT [34474] sub1 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA868, after
0/15FA868, current candidate 0/15FA868, current after 0/15FA868, flushed up to 0/15FA868
2019-09-20 17:08:55.402 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:55.402 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.402 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 DEBUG:  00000: searching for logical decoding starting point,
starting at 0/15FA868
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 LOG:  00000: logical decoding found consistent point at
0/15FA868
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 LOG:  00000: statement: SELECT c.oid, c.relreplident  FROM
pg_catalog.pg_class c  INNER JOIN pg_catalog.pg_namespace n        ON (c.relnamespace = n.oid) WHERE n.nspname =
'public'  AND c.relname = 'tab1'   AND c.relkind = 'r' 
2019-09-20 17:08:55.402 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.403 EDT [34478] sub1_16412_sync_16384 LOG:  00000: statement: SELECT a.attname,       a.atttypid,
   a.atttypmod,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i
ON (i.indexrelid = pg_get_replica_identity_index(16384)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.attisdropped
AND a.attgenerated = ''   AND a.attrelid = 16384 ORDER BY a.attnum
2019-09-20 17:08:55.403 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.403 EDT [34477] sub2 LOG:  00000: received replication command: CREATE_REPLICATION_SLOT "sub2"
LOGICAL pgoutput NOEXPORT_SNAPSHOT
2019-09-20 17:08:55.403 EDT [34477] sub2 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.403 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA8A0.snap
2019-09-20 17:08:55.403 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:55.403 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA8A0.snap.34474.tmp to pg_logical/snapshots/0-15FA8A0.snap
2019-09-20 17:08:55.403 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:55.404 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA8A0.snap
2019-09-20 17:08:55.404 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:55.404 EDT [34474] sub1 DEBUG:  00000: got new restart lsn 0/15FA8A0 at 0/15FA8A0
2019-09-20 17:08:55.404 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:979
2019-09-20 17:08:55.404 EDT [34477] sub2 DEBUG:  00000: searching for logical decoding starting point, starting at
0/15FA8A0
2019-09-20 17:08:55.404 EDT [34477] sub2 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:55.404 EDT [34477] sub2 LOG:  00000: logical decoding found consistent point at 0/15FA8A0
2019-09-20 17:08:55.404 EDT [34477] sub2 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.404 EDT [34477] sub2 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.405 EDT [34478] sub1_16412_sync_16384 LOG:  00000: statement: COPY public.tab1 TO STDOUT
2019-09-20 17:08:55.405 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.406 EDT [34478] sub1_16412_sync_16384 LOG:  00000: statement: COMMIT
2019-09-20 17:08:55.406 EDT [34478] sub1_16412_sync_16384 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.406 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.406 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.411 EDT [34481] sub3 LOG:  00000: statement: SELECT DISTINCT t.schemaname, t.tablename
      FROM pg_catalog.pg_publication_tables t
     WHERE t.pubname IN ('pub3')
2019-09-20 17:08:55.411 EDT [34481] sub3 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.414 EDT [34481] sub3 LOG:  00000: received replication command: CREATE_REPLICATION_SLOT "sub3"
LOGICAL pgoutput NOEXPORT_SNAPSHOT
2019-09-20 17:08:55.414 EDT [34481] sub3 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.414 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA8D8.snap
2019-09-20 17:08:55.414 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:55.414 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA8D8.snap.34474.tmp to pg_logical/snapshots/0-15FA8D8.snap
2019-09-20 17:08:55.414 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:55.414 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA8D8.snap
2019-09-20 17:08:55.414 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:55.414 EDT [34474] sub1 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA8D8, after
0/15FA8D8, current candidate 0/15FA8D8, current after 0/15FA8D8, flushed up to 0/15FA8D8
2019-09-20 17:08:55.414 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:55.414 EDT [34481] sub3 DEBUG:  00000: searching for logical decoding starting point, starting at
0/15FA8D8
2019-09-20 17:08:55.414 EDT [34481] sub3 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:55.414 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.414 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.414 EDT [34481] sub3 LOG:  00000: logical decoding found consistent point at 0/15FA8D8
2019-09-20 17:08:55.414 EDT [34481] sub3 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.414 EDT [34481] sub3 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.924 EDT [34493] sub2 LOG:  00000: received replication command: IDENTIFY_SYSTEM
2019-09-20 17:08:55.924 EDT [34493] sub2 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.925 EDT [34493] sub2 LOG:  00000: received replication command: START_REPLICATION SLOT "sub2"
LOGICAL 0/0 (proto_version '1', publication_names '"pub2"')
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.925 EDT [34493] sub2 LOG:  00000: starting logical decoding for slot "sub2"
2019-09-20 17:08:55.925 EDT [34493] sub2 DETAIL:  Streaming transactions committing after 0/15FA8D8, reading WAL from
0/15FA8A0.
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  CreateDecodingContext, logical.c:448
2019-09-20 17:08:55.925 EDT [34493] sub2 LOG:  00000: logical decoding found consistent point at 0/15FA8A0
2019-09-20 17:08:55.925 EDT [34493] sub2 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.925 EDT [34493] sub2 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA8D8, after
0/15FA8D8, current candidate 0/15FA8D8, current after 0/15FA8D8, flushed up to 0/15FA8D8
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:55.925 EDT [34493] sub2 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.925 EDT [34493] sub2 DEBUG:  00000: "sub2" has now caught up with upstream server
2019-09-20 17:08:55.925 EDT [34493] sub2 LOCATION:  WalSndLoop, walsender.c:2224
2019-09-20 17:08:55.929 EDT [34495] sub2_16413_sync_16389 LOG:  00000: statement: BEGIN READ ONLY ISOLATION LEVEL
REPEATABLE READ
2019-09-20 17:08:55.929 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOG:  00000: received replication command:
CREATE_REPLICATION_SLOT"sub2_16413_sync_16389" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.930 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:55.930 EDT [34493] sub2 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:55.930 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA910.snap.34474.tmp to pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:55.930 EDT [34493] sub2 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA910.snap.34493.tmp to pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:55.930 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:55.930 EDT [34474] sub1 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA910, after
0/15FA910, current candidate 0/15FA910, current after 0/15FA910, flushed up to 0/15FA910
2019-09-20 17:08:55.930 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:55.930 EDT [34493] sub2 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA910.snap
2019-09-20 17:08:55.930 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:55.930 EDT [34493] sub2 DEBUG:  00000: got new restart lsn 0/15FA910 at 0/15FA910
2019-09-20 17:08:55.930 EDT [34493] sub2 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:979
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 DEBUG:  00000: searching for logical decoding starting point,
starting at 0/15FA910
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:55.930 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.930 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOG:  00000: logical decoding found consistent point at
0/15FA910
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOG:  00000: statement: SELECT c.oid, c.relreplident  FROM
pg_catalog.pg_class c  INNER JOIN pg_catalog.pg_namespace n        ON (c.relnamespace = n.oid) WHERE n.nspname =
'public'  AND c.relname = 'tab2'   AND c.relkind = 'r' 
2019-09-20 17:08:55.930 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.931 EDT [34497] sub3 LOG:  00000: received replication command: IDENTIFY_SYSTEM
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.931 EDT [34497] sub3 LOG:  00000: received replication command: START_REPLICATION SLOT "sub3"
LOGICAL 0/0 (proto_version '1', publication_names '"pub3"')
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:55.931 EDT [34497] sub3 LOG:  00000: starting logical decoding for slot "sub3"
2019-09-20 17:08:55.931 EDT [34497] sub3 DETAIL:  Streaming transactions committing after 0/15FA910, reading WAL from
0/15FA8D8.
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  CreateDecodingContext, logical.c:448
2019-09-20 17:08:55.931 EDT [34497] sub3 LOG:  00000: logical decoding found consistent point at 0/15FA8D8
2019-09-20 17:08:55.931 EDT [34497] sub3 DETAIL:  There are no running transactions.
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:55.931 EDT [34497] sub3 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA910, after
0/15FA910, current candidate 0/15FA910, current after 0/15FA910, flushed up to 0/15FA910
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:55.931 EDT [34497] sub3 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:55.931 EDT [34497] sub3 DEBUG:  00000: "sub3" has now caught up with upstream server
2019-09-20 17:08:55.931 EDT [34497] sub3 LOCATION:  WalSndLoop, walsender.c:2224
2019-09-20 17:08:55.932 EDT [34495] sub2_16413_sync_16389 LOG:  00000: statement: SELECT a.attname,       a.atttypid,
   a.atttypmod,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i
ON (i.indexrelid = pg_get_replica_identity_index(16389)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.attisdropped
AND a.attgenerated = ''   AND a.attrelid = 16389 ORDER BY a.attnum
2019-09-20 17:08:55.932 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.933 EDT [34495] sub2_16413_sync_16389 LOG:  00000: statement: COPY public.tab2 TO STDOUT
2019-09-20 17:08:55.933 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.934 EDT [34495] sub2_16413_sync_16389 LOG:  00000: statement: COMMIT
2019-09-20 17:08:55.934 EDT [34495] sub2_16413_sync_16389 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:55.934 EDT [34493] sub2 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:55.934 EDT [34493] sub2 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:56.941 EDT [34517] sub3_16414_sync_16394 LOG:  00000: statement: BEGIN READ ONLY ISOLATION LEVEL
REPEATABLE READ
2019-09-20 17:08:56.941 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOG:  00000: received replication command:
CREATE_REPLICATION_SLOT"sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:56.942 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:56.942 EDT [34493] sub2 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:56.942 EDT [34497] sub3 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:56.942 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA948.snap.34474.tmp to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:56.942 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:56.942 EDT [34474] sub1 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA948, after
0/15FA948, current candidate 0/15FA948, current after 0/15FA948, flushed up to 0/15FA948
2019-09-20 17:08:56.942 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:56.942 EDT [34493] sub2 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA948.snap.34493.tmp to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 DEBUG:  00000: searching for logical decoding starting point,
starting at 0/15FA948
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:56.942 EDT [34493] sub2 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:56.942 EDT [34497] sub3 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA948.snap.34497.tmp to pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:56.942 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:56.942 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:56.942 EDT [34493] sub2 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA948, after
0/15FA948, current candidate 0/15FA948, current after 0/15FA948, flushed up to 0/15FA948
2019-09-20 17:08:56.942 EDT [34493] sub2 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOG:  00000: logical decoding found consistent point at
0/15FA948
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 DETAIL:  There are no running transactions.
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:56.942 EDT [34497] sub3 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA948.snap
2019-09-20 17:08:56.942 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:56.942 EDT [34497] sub3 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA948, after
0/15FA948, current candidate 0/15FA948, current after 0/15FA948, flushed up to 0/15FA948
2019-09-20 17:08:56.942 EDT [34497] sub3 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:56.942 EDT [34493] sub2 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:56.942 EDT [34493] sub2 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:56.942 EDT [34497] sub3 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:56.942 EDT [34497] sub3 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOG:  00000: statement: SELECT c.oid, c.relreplident  FROM
pg_catalog.pg_class c  INNER JOIN pg_catalog.pg_namespace n        ON (c.relnamespace = n.oid) WHERE n.nspname =
'public'  AND c.relname = 'tab3'   AND c.relkind = 'r' 
2019-09-20 17:08:56.942 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:56.944 EDT [34517] sub3_16414_sync_16394 LOG:  00000: statement: SELECT a.attname,       a.atttypid,
   a.atttypmod,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i
ON (i.indexrelid = pg_get_replica_identity_index(16394)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.attisdropped
AND a.attgenerated = ''   AND a.attrelid = 16394 ORDER BY a.attnum
2019-09-20 17:08:56.944 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:56.946 EDT [34517] sub3_16414_sync_16394 LOG:  00000: statement: COPY public.tab3 TO STDOUT
2019-09-20 17:08:56.946 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:56.946 EDT [34517] sub3_16414_sync_16394 LOG:  00000: statement: COMMIT
2019-09-20 17:08:56.946 EDT [34517] sub3_16414_sync_16394 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.957 EDT [34537] sub3_16414_sync_16399 LOG:  00000: statement: BEGIN READ ONLY ISOLATION LEVEL
REPEATABLE READ
2019-09-20 17:08:57.957 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.957 EDT [34537] sub3_16414_sync_16399 LOG:  00000: received replication command:
CREATE_REPLICATION_SLOT"sub3_16414_sync_16399" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 
2019-09-20 17:08:57.957 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:57.957 EDT [34493] sub2 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:57.957 EDT [34474] sub1 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:57.957 EDT [34497] sub3 DEBUG:  00000: serializing snapshot to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1554
2019-09-20 17:08:57.957 EDT [34493] sub2 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA980.snap.34493.tmp to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34493] sub2 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:57.957 EDT [34474] sub1 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA980.snap.34474.tmp to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:57.957 EDT [34474] sub1 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.957 EDT [34474] sub1 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:57.957 EDT [34493] sub2 PANIC:  58P01: could not open file "pg_logical/snapshots/0-15FA980.snap": No
such file or directory
2019-09-20 17:08:57.957 EDT [34493] sub2 LOCATION:  fsync_fname_ext, fd.c:3376
2019-09-20 17:08:57.957 EDT [34474] sub1 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA980, after
0/15FA980, current candidate 0/15FA980, current after 0/15FA980, flushed up to 0/15FA980
2019-09-20 17:08:57.957 EDT [34474] sub1 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 DEBUG:  00000: searching for logical decoding starting point,
starting at 0/15FA980
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 LOCATION:  DecodingContextFindStartpoint, logical.c:476
2019-09-20 17:08:57.958 EDT [34497] sub3 DEBUG:  00000: renamed serialized snapshot
pg_logical/snapshots/0-15FA980.snap.34497.tmp to pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.958 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1673
2019-09-20 17:08:57.958 EDT [34497] sub3 DEBUG:  00000: fsynced pg_logical/snapshots/0-15FA980.snap
2019-09-20 17:08:57.958 EDT [34497] sub3 LOCATION:  SnapBuildSerialize, snapbuild.c:1679
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 LOG:  00000: logical decoding found consistent point at
0/15FA980
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 DETAIL:  There are no running transactions.
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1272
2019-09-20 17:08:57.958 EDT [34497] sub3 DEBUG:  00000: failed to increase restart lsn: proposed 0/15FA980, after
0/15FA980, current candidate 0/15FA980, current after 0/15FA980, flushed up to 0/15FA980
2019-09-20 17:08:57.958 EDT [34497] sub3 LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:992
2019-09-20 17:08:57.958 EDT [34474] sub1 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:57.958 EDT [34474] sub1 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:57.958 EDT [34497] sub3 DEBUG:  00000: updated xmin: 0 restart: 1
2019-09-20 17:08:57.958 EDT [34497] sub3 LOCATION:  LogicalConfirmReceivedLocation, logical.c:1061
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 LOG:  00000: statement: SELECT c.oid, c.relreplident  FROM
pg_catalog.pg_class c  INNER JOIN pg_catalog.pg_namespace n        ON (c.relnamespace = n.oid) WHERE n.nspname =
'public'  AND c.relname = 'tab4'   AND c.relkind = 'r' 
2019-09-20 17:08:57.958 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.962 EDT [34537] sub3_16414_sync_16399 LOG:  00000: statement: SELECT a.attname,       a.atttypid,
   a.atttypmod,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i
ON (i.indexrelid = pg_get_replica_identity_index(16399)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.attisdropped
AND a.attgenerated = ''   AND a.attrelid = 16399 ORDER BY a.attnum
2019-09-20 17:08:57.962 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.966 EDT [34537] sub3_16414_sync_16399 LOG:  00000: statement: COPY public.tab4 TO STDOUT
2019-09-20 17:08:57.966 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.966 EDT [34540] sub2 LOG:  00000: received replication command: IDENTIFY_SYSTEM
2019-09-20 17:08:57.966 EDT [34540] sub2 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:57.966 EDT [34537] sub3_16414_sync_16399 LOG:  00000: statement: COMMIT
2019-09-20 17:08:57.966 EDT [34537] sub3_16414_sync_16399 LOCATION:  exec_simple_query, postgres.c:1045
2019-09-20 17:08:57.966 EDT [34540] sub2 LOG:  00000: received replication command: START_REPLICATION SLOT "sub2"
LOGICAL 0/0 (proto_version '1', publication_names '"pub2"')
2019-09-20 17:08:57.966 EDT [34540] sub2 LOCATION:  exec_replication_command, walsender.c:1487
2019-09-20 17:08:57.966 EDT [34540] sub2 ERROR:  55006: replication slot "sub2" is active for PID 34493
2019-09-20 17:08:57.966 EDT [34540] sub2 LOCATION:  ReplicationSlotAcquire, slot.c:399
2019-09-20 17:08:57.977 EDT [34418] LOG:  00000: server process (PID 34493) was terminated by signal 6: Abort trap: 6
2019-09-20 17:08:57.977 EDT [34418] LOCATION:  LogChildExit, postmaster.c:3708
2019-09-20 17:08:57.977 EDT [34418] LOG:  00000: terminating any other active server processes
2019-09-20 17:08:57.977 EDT [34418] LOCATION:  HandleChildCrash, postmaster.c:3428
2019-09-20 17:08:57.977 EDT [34474] sub1 WARNING:  57P02: terminating connection because of crash of another server
process
2019-09-20 17:08:57.977 EDT [34474] sub1 DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-09-20 17:08:57.977 EDT [34474] sub1 HINT:  In a moment you should be able to reconnect to the database and repeat
your command.
2019-09-20 17:08:57.977 EDT [34474] sub1 LOCATION:  quickdie, postgres.c:2737
2019-09-20 17:08:57.977 EDT [34497] sub3 WARNING:  57P02: terminating connection because of crash of another server
process
2019-09-20 17:08:57.977 EDT [34497] sub3 DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-09-20 17:08:57.977 EDT [34497] sub3 HINT:  In a moment you should be able to reconnect to the database and repeat
your command.
2019-09-20 17:08:57.977 EDT [34497] sub3 LOCATION:  quickdie, postgres.c:2737
2019-09-20 17:08:57.978 EDT [34423] WARNING:  57P02: terminating connection because of crash of another server process
2019-09-20 17:08:57.978 EDT [34423] DETAIL:  The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-09-20 17:08:57.978 EDT [34423] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2019-09-20 17:08:57.978 EDT [34423] LOCATION:  quickdie, postgres.c:2737
2019-09-20 17:08:57.979 EDT [34418] LOG:  00000: database system is shut down
2019-09-20 17:08:57.979 EDT [34418] LOCATION:  UnlinkLockFiles, miscinit.c:860

Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On 2019-09-20 17:49:27 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-09-20 16:25:21 -0400, Tom Lane wrote:
> >> I recreated my freebsd-9-under-qemu setup and I can still reproduce
> >> the problem, though not with high reliability (order of 1 time in 10).
> >> Anything particular you want logged?
> 
> > A DEBUG2 log would help a fair bit, because it'd log some information
> > about what changes the "horizons" determining when data may be removed.
> 
> Actually, what I did was as attached [1], and I am getting traces like
> [2].  The problem seems to occur only when there are two or three
> processes concurrently creating the same snapshot file.  It's not
> obvious from the debug trace, but the snapshot file *does* exist
> after the music stops.
> 
> It is very hard to look at this trace and conclude anything other
> than "rename(2) is broken, it's not atomic". Nothing in our code
> has deleted the file: no checkpoint has started, nor do we see
> the DEBUG1 output that CheckPointSnapBuild ought to produce.
> But fsync_fname momentarily can't see it (and then later another
> process does see it).

Yikes. No wonder most of us weren't able to reproduce the
problem, and that staring at our code didn't point to a bug.

Nice catch.


> In short, what we got here is OS bugs that have probably been
> resolved years ago.
> 
> The question is what to do next.  Should we just retire these
> specific buildfarm critters, or do we want to push ahead with
> getting rid of the PANIC here?

Hm. Given that the fsync failing is actually an issue, I'm somewhat
disinclined to remove the PANIC. It's not like only raising an ERROR
actually solves anything, except making the problem even harder to
diagnose? Or that we otherwise are ok, with renames not being atomic?

So I'd be tentatively in favor of either upgrading, replacing the
filesystem (perhaps ZFS isn't buggy in the same way?), or retiring
those animals.
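
FWIW, if anyone wants to double-check the OS-level behaviour on those
boxes independently of Postgres first, something along these lines ought
to be enough (just a rough, untested sketch; the file names are
arbitrary):

/*
 * Hypothetical standalone check of rename(2) atomicity.  The child keeps
 * renaming freshly written temp files over "rename_target"; the parent
 * keeps opening "rename_target".  Since the file exists before the child
 * starts, the parent should never see ENOENT if rename() is atomic.
 */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	const char *target = "rename_target";
	pid_t		pid;
	int			fd;

	/* make sure the target exists before the reader starts */
	fd = open(target, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd < 0 || write(fd, "x", 1) != 1 || close(fd) != 0)
	{
		perror("initial write");
		return 1;
	}

	pid = fork();
	if (pid < 0)
	{
		perror("fork");
		return 1;
	}
	if (pid == 0)
	{
		/* writer: rename a new temp file into place, many times over */
		int			i;

		for (i = 0; i < 1000000; i++)
		{
			char		tmp[64];

			snprintf(tmp, sizeof(tmp), "%s.%d.tmp", target, (int) getpid());
			fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0600);
			if (fd < 0 || write(fd, "x", 1) != 1 || close(fd) != 0 ||
				rename(tmp, target) != 0)
			{
				perror("writer");
				_exit(1);
			}
		}
		_exit(0);
	}

	/* reader: the target must be openable at every moment */
	while (waitpid(pid, NULL, WNOHANG) == 0)
	{
		fd = open(target, O_RDONLY);
		if (fd < 0)
		{
			perror("open");		/* ENOENT here means rename isn't atomic */
			kill(pid, SIGKILL);
			waitpid(pid, NULL, 0);
			return 1;
		}
		close(fd);
	}
	return 0;
}

If the reader ever reports ENOENT, the filesystem's rename simply isn't
atomic, and no amount of rearranging our code will really fix that.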

Greetings,

Andres Freund



Re: subscriptionCheck failures on nightjar

From
Alvaro Herrera
Date:
On 2019-Sep-20, Tom Lane wrote:

> Actually, what I did was as attached [1], and I am getting traces like
> [2].  The problem seems to occur only when there are two or three
> processes concurrently creating the same snapshot file.  It's not
> obvious from the debug trace, but the snapshot file *does* exist
> after the music stops.

Uh .. I didn't think it was possible that we would build the same
snapshot file more than once.  Isn't that a waste of time anyway?  Maybe
we can fix the symptom by just not doing that in the first place?
I don't have a strategy to do that, but seems worth considering before
retiring the bf animals.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: subscriptionCheck failures on nightjar

From
Andres Freund
Date:
Hi,

On September 20, 2019 3:06:20 PM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2019-Sep-20, Tom Lane wrote:
>
>> Actually, what I did was as attached [1], and I am getting traces
>like
>> [2].  The problem seems to occur only when there are two or three
>> processes concurrently creating the same snapshot file.  It's not
>> obvious from the debug trace, but the snapshot file *does* exist
>> after the music stops.
>
>Uh .. I didn't think it was possible that we would build the same
>snapshot file more than once.  Isn't that a waste of time anyway?
>Maybe
>we can fix the symptom by just not doing that in the first place?
>I don't have a strategy to do that, but seems worth considering before
>retiring the bf animals.

We try to avoid it, but the check is racy. Check the comments in SnapBuildSerialize. We could introduce locking etc. to
avoid that, but that seems overkill, given that we're really just dealing with a broken OS.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Uh .. I didn't think it was possible that we would build the same
> snapshot file more than once.  Isn't that a waste of time anyway?  Maybe
> we can fix the symptom by just not doing that in the first place?
> I don't have a strategy to do that, but seems worth considering before
> retiring the bf animals.

The comment near the head of SnapBuildSerialize says

     * there is an obvious race condition here between the time we stat(2) the
     * file and us writing the file. But we rename the file into place
     * atomically and all files created need to contain the same data anyway,
     * so this is perfectly fine, although a bit of a resource waste. Locking
     * seems like pointless complication.

which seems like a reasonable argument.  Also, this is hardly the only
place where we expect rename(2) to be atomic.  So I tend to agree with
Andres that we should consider OSes with such a bug to be unsupported.
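
For anyone following along without the source at hand, the pattern the
comment describes boils down to roughly this (a heavily simplified
sketch, not the actual snapbuild.c code; error handling elided and the
function name is illustrative):

/*
 * Simplified sketch of the serialize-and-rename pattern discussed above;
 * NOT the real snapbuild.c code, and error handling is elided.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void
serialize_snapshot(const char *path, const void *data, size_t len)
{
	struct stat st;
	char		tmppath[256];
	int			fd;

	/* racy check: if another backend already wrote the file, we're done */
	if (stat(path, &st) == 0)
		return;

	/* write the data into a backend-private temp file and flush it */
	snprintf(tmppath, sizeof(tmppath), "%s.%d.tmp", path, (int) getpid());
	fd = open(tmppath, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	write(fd, data, len);
	fsync(fd);
	close(fd);

	/* move it into place; this is the step we rely on being atomic */
	rename(tmppath, path);

	/*
	 * Make the rename durable by fsyncing the file under its final name.
	 * With an atomic rename() the file must exist here, no matter how
	 * many backends raced through this same sequence.
	 */
	fd = open(path, O_RDONLY);
	fsync(fd);
	close(fd);
}

The ENOENT that nightjar reports comes from that last open-and-fsync
step, which is only reachable after the process's own rename() of the
very same file name has succeeded.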

Dromedary is running the last release of macOS that supports 32-bit
hardware, so if we decide to kick that to the curb, I'd either shut
down the box or put some newer Linux or BSD variant on it.

            regards, tom lane



Re: subscriptionCheck failures on nightjar

From
Michael Paquier
Date:
On Fri, Sep 20, 2019 at 05:30:48PM +0200, Tomas Vondra wrote:
> But even with that change you haven't managed to reproduce the issue,
> right? Or am I misunderstanding?

No, I was not able to see it on my laptop running Debian.
--
Michael


Re: subscriptionCheck failures on nightjar

From
Andrew Dunstan
Date:
On 9/20/19 6:17 PM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Uh .. I didn't think it was possible that we would build the same
>> snapshot file more than once.  Isn't that a waste of time anyway?  Maybe
>> we can fix the symptom by just not doing that in the first place?
>> I don't have a strategy to do that, but seems worth considering before
>> retiring the bf animals.
> The comment near the head of SnapBuildSerialize says
>
>      * there is an obvious race condition here between the time we stat(2) the
>      * file and us writing the file. But we rename the file into place
>      * atomically and all files created need to contain the same data anyway,
>      * so this is perfectly fine, although a bit of a resource waste. Locking
>      * seems like pointless complication.
>
> which seems like a reasonable argument.  Also, this is hardly the only
> place where we expect rename(2) to be atomic.  So I tend to agree with
> Andres that we should consider OSes with such a bug to be unsupported.
>
> Dromedary is running the last release of macOS that supports 32-bit
> hardware, so if we decide to kick that to the curb, I'd either shut
> down the box or put some newer Linux or BSD variant on it.
>
>             



Well, nightjar is on FreeBSD 9.0, which is oldish. I can replace it before
long with an 11-stable instance if that's appropriate.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: subscriptionCheck failures on nightjar

From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 9/20/19 6:17 PM, Tom Lane wrote:
>> Dromedary is running the last release of macOS that supports 32-bit
>> hardware, so if we decide to kick that to the curb, I'd either shut
>> down the box or put some newer Linux or BSD variant on it.

> Well, nightjar is on FBSD 9.0 which is oldish. I can replace it before
> long with an 11-stable instance if that's appropriate.

FYI, I've installed FreeBSD 12.0/i386 on that machine and it's
now running buildfarm member florican, using clang with -msse2
(a configuration we had no buildfarm coverage of before, AFAIK).

I can still boot the macOS installation if anyone is interested
in specific tests in that environment, but I don't intend to run
dromedary on a regular basis anymore.

            regards, tom lane