Thread: BDR: Recover from "FATAL: mismatch in worker state" without restarting postgres

From: Sylvain Marechal

Hello all,

After uninstalling a BDR node, it is no longer possible to join it again.
The following log messages repeat in a loop:
<<<
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14620-1] LOG:  starting background worker process "bdr (6287997142852742670,1,19526,)->bdr (6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14621-1] LOG:  starting background worker process "bdr (6287997142852742670,1,18365,)->bdr (6223672436788445259,2," #local4,support
2016-08-25 10:17:08 [ll101] postgres info [11709]: [14622-1] LOG:  starting background worker process "bdr db: mydb" #local4,support
2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:  mismatch in worker state, got 0, expected 1 #error,local4,support
2016-08-25 10:17:08 [ll101] postgres error [6486]: [14622-1] FATAL:  mismatch in worker state, got 0, expected 1 #error,local4,support

>>>
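To check whether every failing worker reports the same state values, the log can be scanned with a small script. The following is only a minimal sketch in Python: the function name `count_mismatches` is made up for illustration, and the regular expression assumes the exact message format shown in the excerpt above.

```python
import re
from collections import Counter

# Assumption: the FATAL message always has the form seen in the excerpt above.
MISMATCH_RE = re.compile(
    r"FATAL:\s+mismatch in worker state, got (\d+), expected (\d+)"
)

def count_mismatches(lines):
    """Count FATAL worker-state mismatches per (got, expected) pair."""
    counts = Counter()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            got, expected = int(match.group(1)), int(match.group(2))
            counts[(got, expected)] += 1
    return counts

# Two lines taken from the excerpt above: one FATAL, one ordinary LOG line.
sample = [
    '2016-08-25 10:17:08 [ll101] postgres error [6484]: [14621-1] FATAL:  '
    'mismatch in worker state, got 0, expected 1 #error,local4,support',
    '2016-08-25 10:17:08 [ll101] postgres info [11709]: [14622-1] LOG:  '
    'starting background worker process "bdr db: mydb" #local4,support',
]

for (got, expected), n in count_mismatches(sample).items():
    print(f"got {got}, expected {expected}: {n} occurrence(s)")
```

The function accepts any iterable of lines, so an open log file can be passed to it directly.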
I cannot tell how this happens: before I removed the node, one of the nodes was in the 'catchup' state and the data lag between the two nodes was growing, which is why I removed it (the idea was to clean the lagging node and then reattach it).


Questions:
* Is it possible to recover from this error without restarting postgres?
* If a restart is necessary, how can I be sure that the postgres restart will work? My fear is that the restart fails, which would mean the service is completely down.

Thanks and regards,
Sylvain