Re: subscriptionCheck failures on nightjar - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: subscriptionCheck failures on nightjar
Date
Msg-id CAEepm=1pbie9C_PtojGum7qXAAU1hB8JtA6v_9dQFPgay3PcZg@mail.gmail.com
Whole thread Raw
In response to subscriptionCheck failures on nightjar  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: subscriptionCheck failures on nightjar  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Feb 11, 2019 at 7:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2019-02-10 23:55:58.798 EST [40728] sub1 PANIC:  could not open file "pg_logical/snapshots/0-160B578.snap": No such
fileor directory
 

<pokes at totally unfamiliar code>

They get atomically renamed into place, which seems kosher even if
snapshots for the same LSN are created concurrently by different
backends (and tracing syscalls confirms that that does occasionally
happen).  It's hard to believe that nightjar's rename() ceased to be
atomic a couple of months ago.  It looks like the only way for files
to get unlinked after that is by CheckPointSnapBuild() deciding they
are too old.

Hmm.  Could this be relevant, and cause a well timed checkpoint to
unlink files too soon?

2019-02-12 21:52:58.304 EST [22922] WARNING:  out of logical
replication worker slots

-- 
Thomas Munro
http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: more unconstify use
Next
From: Michael Meskes
Date:
Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG