Re: PostgreSQL with BDR - PANIC: could not create replication identifier checkpoint - Mailing list pgsql-general

From Cameron Smith
Subject Re: PostgreSQL with BDR - PANIC: could not create replication identifier checkpoint
Date
Msg-id CO2PR0801MB22144A06118F31215B6023AEA04A0@CO2PR0801MB2214.namprd08.prod.outlook.com
In response to Re: PostgreSQL with BDR - PANIC: could not create replication identifier checkpoint  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: PostgreSQL with BDR - PANIC: could not create replication identifier checkpoint  (Martín Marqués <martin@2ndquadrant.com>)
List pgsql-general
I'd agree:  most likely a file system problem.  Is there any hope that this file could be re-built?

My current plan is to use bdr_part_by_node_names to remove the failing node and then rebuild it from a fresh backup
(and probably on a new server).

Thank you for your help!

Cameron Smith


________________________________________
From: Alvaro Herrera <alvherre@2ndquadrant.com>
Sent: May 19, 2016 2:56 PM
To: Cameron Smith
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL with BDR - PANIC:  could not create replication identifier checkpoint

Cameron Smith wrote:

> t:2016-05-19 01:14:51.668 UTC d= p=144 a=PANIC:  could not create replication identifier checkpoint
> "pg_logical/checkpoints/8-F3923F98.ckpt.tmp": Invalid argument

This line corresponds to the following code in BDR's 9.4.4
src/backend/replication/logical/replication_identifier.c:

    /*
     * no other backend can perform this at the same time, we're protected by
     * CheckpointLock.
     */
    tmpfd = OpenTransientFile(tmppath,
                              O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
                              S_IRUSR | S_IWUSR);
    if (tmpfd < 0)
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not create replication identifier checkpoint \"%s\": %m",
                        tmppath)));

This file does not exist in 9.5, but instead we have
src/backend/replication/logical/origin.c which has identical code.

OpenTransientFile calls BasicOpenFile, which in turn calls open() and
propagates the errno.  My manpage doesn't list any possible reasons for
open() to return EINVAL, so I'm at a loss about what is happening here.
Maybe this is a filesystem problem?
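
One way to test the filesystem theory outside PostgreSQL is to issue the same kind of open() call by hand and look at the errno it returns. The following is only a minimal sketch, not code from BDR or PostgreSQL: the helper name try_create_excl is hypothetical, and PG_BINARY is left out on the assumption of a non-Windows build, where it is defined as 0.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL or BDR).
 *
 * Tries to create `path` with the same flags and mode as the quoted
 * OpenTransientFile() call: exclusive create, write-only, mode 0600.
 * Returns 0 on success, or the errno value set by open() on failure.
 */
int
try_create_excl(const char *path)
{
    int fd;
    int saved_errno;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        /* Save errno before calling anything that might overwrite it. */
        saved_errno = errno;
        fprintf(stderr, "could not create \"%s\": %s\n",
                path, strerror(saved_errno));
        return saved_errno;
    }

    close(fd);
    return 0;
}
```

Calling this as the postgres user with a scratch file name inside the data directory's pg_logical/checkpoints/ (and removing the file afterwards) would show whether the filesystem itself answers EINVAL for this flag combination. If it does, the problem lies below PostgreSQL; if it succeeds, the cause is more likely specific to the server's environment.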

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

