BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres - Mailing list pgsql-general

From Sylvain Marechal
Subject BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres
Msg-id CAJu=pHTd1YiYJAa20Du+M6+9tB6s329Nbmj58v-jSUM_SbyWKg@mail.gmail.com
List pgsql-general
Hello all,

After uninstalling a BDR node, it is no longer possible to rejoin it.
The following lines appear in the log in a loop:
<<<
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14620-1] LOG:  starting background worker process "bdr (6287997142852742670,1,19526,)->bdr (6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14621-1] LOG:  starting background worker process "bdr (6287997142852742670,1,18365,)->bdr (6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14622-1] LOG:  starting background worker process "bdr db: mydb" #local4,support
2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:  mismatch in worker state, got 0, expected 1 #error,local4,support
2016-08-25 10:17:08 [ll101] postgres error [6486]: [14622-1] FATAL:  mismatch in worker state, got 0, expected 1 #error,local4,support

>>>
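To spot this loop in the logs automatically, something like the following minimal Python sketch could be used. It assumes the syslog line format shown above, and `find_state_mismatches` is a hypothetical helper name. It only extracts the PID and the "got"/"expected" worker-state values from each FATAL line and does nothing to fix the underlying BDR problem.

```python
import re

# Matches the FATAL lines quoted above; captures the PID in brackets
# and the "got"/"expected" worker-state values.
# Assumption: lines follow the "postgres error [PID]: [SEQ] FATAL: ..." syslog layout.
MISMATCH_RE = re.compile(
    r"postgres error \[(\d+)\]: \[[^\]]*\] FATAL:\s+mismatch in worker state, "
    r"got (\d+), expected (\d+)"
)


def find_state_mismatches(lines):
    """Return a (pid, got, expected) tuple for each worker-state mismatch line."""
    hits = []
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            pid, got, expected = (int(g) for g in m.groups())
            hits.append((pid, got, expected))
    return hits
```

Fed the five log lines above, it would return one tuple per FATAL line (PIDs 6484 and 6486, each with got 0, expected 1) and ignore the LOG lines. A steadily growing count of hits over time would indicate that the workers are still restarting in a loop.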
I cannot tell how this happened. Before removing the node, one of the nodes was in the 'catchup' state and the data lag between the two nodes kept growing; that is why I removed it (the idea was to clean up the lagging node and then reattach it).


Questions:
* Is it possible to recover from this error without restarting postgres?
* If a restart is necessary, how can I be sure that the postgres restart will work? My fear is that the restart fails, leaving the service completely down.

Thanks and regards,
Sylvain
