Re: BUG #13287: Database corruption - PANIC: could not fsync file "pg_replslot/[Slot]/state": Bad file descriptor - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #13287: Database corruption - PANIC: could not fsync file "pg_replslot/[Slot]/state": Bad file descriptor
Msg-id 20150514172746.GJ9584@alap3.anarazel.de
In response to BUG #13287: Database corruption - PANIC: could not fsync file "pg_replslot/[Slot]/state": Bad file descriptor  (Hillel.Eilat@attunity.com)
List pgsql-bugs
Hi,

On 2015-05-14 10:55:14 +0000, Hillel.Eilat@attunity.com wrote:
> I am in the process of developing a REPLICATION application using PostgreSQL
> 9.4.1 on a "Windows 7" platform.
>
> "Logical Decoding" feature is used for extracting database changes in
> real-time.
> A dedicated replication slot - say [Replication Slot Name] - is used for
> controlling this flow.
> Everything works fine until PostgreSQL service is stopped.
> While an active replication slot exists and PostgreSQL service is neatly
> stopped, further attempt to restart it is responded by:
>
> PostgreSQL - PANIC:  could not fsync file "pg_replslot/[Replication Slot
> Name]/state": Bad file descriptor

> Actually - the database in question is now corrupted / not operational
> anymore.
> Tonight this occurred also after a spontaneous machine shutdown.
>
> This misbehavior now occurs very systematically upon stopping PostgreSQL
> service while active replication slots are defined  there.

This is a known and fixed problem that only occurs on Windows
(unfortunately, Windows does not allow fsync on a read-only file
handle). The next 9.4 release will be made public on the 21st.
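
As a rough illustration of the underlying restriction (not the PostgreSQL code itself), the following Python sketch opens a file read-only and tries to fsync it. The helper name fsync_readonly_handle is made up for this example. On Windows the call is expected to fail with EBADF, which matches the "Bad file descriptor" in the PANIC message; on most Unix-like systems it is expected to succeed.

```python
import errno
import os
import tempfile


def fsync_readonly_handle(path):
    """Open path read-only and try to fsync it.

    Returns None if fsync succeeded, otherwise the errno of the failure.
    (Hypothetical helper, only meant to demonstrate the platform difference.)
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Create a small scratch file standing in for a slot's state file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"slot state")
    try:
        err = fsync_readonly_handle(tmp.name)
    finally:
        os.unlink(tmp.name)
    if err is None:
        print("fsync on a read-only descriptor succeeded")
    else:
        print("fsync failed:", errno.errorcode.get(err, err))
```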

Until then there are basically three ways to circumvent the problem:
1) Start the database with fsync turned off, and immediately after start
   turn it back on and reload the config. If you're concerned about the
   data loss window that could theoretically occur, you can use initdb -S
   to flush all database files to disk.
2) Remove the slot by simply removing the directory. Obviously that's
   problematic because the slot's gone in that case.
3) Apply the fix
   (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dfbaed459754e71e01bb0cc90a12802bba3f9786)
   and recompile PostgreSQL. Unfortunately that's not that easy on Windows.
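
Workaround 1 could be scripted roughly as follows. This is a sketch under several assumptions, not a tested procedure: the server is started with pg_ctl rather than as a Windows service, pg_ctl and initdb are on the PATH, fsync is set in postgresql.conf using the "fsync = ..." form, and initdb -S is run as the last step. The function names set_fsync and run_workaround are invented for the example.

```python
import re
import subprocess
from pathlib import Path


def set_fsync(conf_text, enabled):
    """Return conf_text with exactly one active 'fsync = on|off' line.

    Existing fsync lines (commented out or not) are dropped and the new
    setting is appended at the end of the file.
    """
    value = "on" if enabled else "off"
    kept = [line for line in conf_text.splitlines()
            if not re.match(r"\s*#?\s*fsync\s*=", line)]
    kept.append("fsync = " + value)
    return "\n".join(kept) + "\n"


def run_workaround(data_dir):
    """Start with fsync off, switch it back on, reload, then sync to disk."""
    conf = Path(data_dir) / "postgresql.conf"

    # 1. Disable fsync in the config file and start the server.
    conf.write_text(set_fsync(conf.read_text(), enabled=False))
    subprocess.run(["pg_ctl", "-D", str(data_dir), "-w", "start"], check=True)

    # 2. Re-enable fsync and have the running server re-read its config.
    conf.write_text(set_fsync(conf.read_text(), enabled=True))
    subprocess.run(["pg_ctl", "-D", str(data_dir), "reload"], check=True)

    # 3. Flush all database files to disk to close the data loss window.
    subprocess.run(["initdb", "-S", "-D", str(data_dir)], check=True)
```

The setting is changed in postgresql.conf rather than passed on the command line so that the later reload can actually switch fsync back on.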

Hope that helps?

Andres
