Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot
Date
Msg-id 20150427144447.GT4369@alvh.no-ip.org
In response to Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Andres Freund wrote:

> On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:

> > 2015-04-24 04:47:12 EDT LOG:  database system was shut down at
> > 2015-04-24 04:44:37 EDT
> > 2015-04-24 04:47:12 EDT PANIC:  could not fsync file
> > "pg_replslot/node_win2012sec/state": Bad file descriptor
> > 2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with
> > exit code 3
> > 2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process
> > failure
> >
> > To restart the server, I have to manually delete the folder in pg_replslot.
> > But then I need to rebuild the slave. Not very practical for a multi
> > gigabyte database.
>
> Obviously that's not how it's supposed to be. I don't have access to a
> Windows system, much less a French one, unfortunately.

I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE).  Notice it's in a critical section (hence PANIC) and
isdir=false.  This happens just after the rename() from tmppath to path;
maybe the file is "busy" and could not be renamed?  Anyway the rename
itself didn't fail, and the file (under the new name) could be opened by
fd.c, otherwise the error would say "could not open" instead of "could
not fsync".
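
To make the sequence above concrete, here is a minimal standalone sketch of
"rename, then open, then fsync" with a distinct message for each failure
point.  It is not the slot.c or fd.c code: rename_then_fsync() and the
DurableResult codes are hypothetical names made up for illustration, it uses
plain POSIX calls rather than the fd.c wrappers, and opening the file
read-write for the isdir=false case is my assumption about what fd.c does.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of PostgreSQL. */
typedef enum
{
    DURABLE_OK,
    DURABLE_RENAME_FAILED,
    DURABLE_OPEN_FAILED,
    DURABLE_FSYNC_FAILED
} DurableResult;

/*
 * Rename tmppath to path, then fsync the file under its new name.
 * Each step reports its own message, so the log text tells us which
 * step failed.
 */
DurableResult
rename_then_fsync(const char *tmppath, const char *path)
{
    int         fd;

    if (rename(tmppath, path) != 0)
    {
        fprintf(stderr, "could not rename file \"%s\" to \"%s\": %s\n",
                tmppath, path, strerror(errno));
        return DURABLE_RENAME_FAILED;
    }

    /* Regular file (isdir=false): assumed to be opened read-write. */
    fd = open(path, O_RDWR);
    if (fd < 0)
    {
        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        return DURABLE_OPEN_FAILED;
    }

    if (fsync(fd) != 0)
    {
        int         save_errno = errno; /* close() may clobber errno */

        close(fd);
        fprintf(stderr, "could not fsync file \"%s\": %s\n",
                path, strerror(save_errno));
        return DURABLE_FSYNC_FAILED;
    }

    close(fd);
    return DURABLE_OK;
}
```

In this sketch, a "could not fsync" message can only be reached after both
rename() and open() have succeeded, which is the reasoning applied to the
reported log.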

There are many other callers of rename() and none of them seem to have
special cases for WIN32 specifically; they all assume it works.  (Some
of them are in turn special cases related to link/unlink).

The vast majority of callers of fsync_fname() are related to logical
decoding, so it seems fair game to assume that that code is missing a
trick or two.

> 2) Check that it's unrelated to any anti-virus software running?

It seems likely that something like anti-virus software is involved here.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
