Thread: BDR: no free replication state could be found

BDR: no free replication state could be found

From

Selim Tuvi

Date:

08 October 2015, 22:54:15

Hi, I am testing BDR functionality with Postgres 9.4. I went through the bdrdemo example with a 3-node cluster and then tried to set up my own db.

My "max_replication_slots" is set to 6. After removing the bdrdemo db, I am having trouble starting up the postgres instance unless I increase the value of "max_replication_slots". I get the following error in the log:

"starting up replication identifier with ckpt at 0/28E8250",,,,,,,,,""
"recovered replication state of node 1 to 0/54DDCD0",,,,,,,,,""
"recovered replication state of node 2 to 0/1ECBEA0",,,,,,,,,""
"recovered replication state of node 3 to 0/59FB1C0",,,,,,,,,""
"recovered replication state of node 4 to 0/2AA5320",,,,,,,,,""
"recovered replication state of node 5 to 0/27F2F98",,,,,,,,,""
"recovered replication state of node 6 to 0/59F35A8",,,,,,,,,""
"no free replication state could be found, increase max_replication_slots",,,,,,,,,""

pg_replication_slots is only reporting two slots:

postgres=# SELECT * FROM pg_catalog.pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_19685_6199712740068695651_1_18817__ | bdr    | logical   |  19685 | deliver  | t      |      |         2280 | 0/28EA5E0
 bdr_19685_6197393155020108291_1_48609__ | bdr    | logical   |  19685 | deliver  | t      |      |         2280 | 0/28EA5E0

How can I get rid of the stale node recovery on startup?

Thanks
-Selim

Re: BDR: no free replication state could be found

From

Craig Ringer

Date:

09 October 2015, 06:05:19

On 9 October 2015 at 06:54, Selim Tuvi <stuvi@ilm.com> wrote:
> Hi I am testing BDR functionality with Postgres 9.4. I had went through the
> bdrdemo example with a 3 node cluster and then tried to set up my own db.
>
> My "max_replication_slots" is set to 6. After getting removing the bdrdemo
> db I am having trouble starting up the postgres instance unless I increase
> the value of "max_replication_slots". I get the following error in the log:
>
> "starting up replication identifier with ckpt at 0/28E8250",,,,,,,,,""
> "recovered replication state of node 1 to 0/54DDCD0",,,,,,,,,""
> "recovered replication state of node 2 to 0/1ECBEA0",,,,,,,,,""
> "recovered replication state of node 3 to 0/59FB1C0",,,,,,,,,""
> "recovered replication state of node 4 to 0/2AA5320",,,,,,,,,""
> "recovered replication state of node 5 to 0/27F2F98",,,,,,,,,""
> "recovered replication state of node 6 to 0/59F35A8",,,,,,,,,""
> "no free replication state could be found, increase max_replication_slots",,,,,,,,,""
>
> pg_replication_slots is only reporting two slots:
>
> postgres=# SELECT * FROM pg_catalog.pg_replication_slots;
>                 slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
> -----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
>  bdr_19685_6199712740068695651_1_18817__ | bdr    | logical   |  19685 | deliver  | t      |      |         2280 | 0/28EA5E0
>  bdr_19685_6197393155020108291_1_48609__ | bdr    | logical   |  19685 | deliver  | t      |      |         2280 | 0/28EA5E0
>
> How can I get rid of the stale node recovery on startup?


Can you show the output of

select * from pg_replication_identifier;

please? On all nodes. Also pg_catalog.pg_replication_slots on the other nodes.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR: no free replication state could be found

From

Selim Tuvi

Date:

09 October 2015, 18:54:07

node: deliver_sing (the problem node):

postgres=# SELECT * FROM pg_catalog.pg_replication_identifier;
 riident |                 riname
---------+----------------------------------------
       1 | bdr_6197393155020108291_1_47458_16385_
       2 | bdr_6199712740068695651_1_16385_16385_
       3 | bdr_6197393155020108291_1_47458_17167_
       4 | bdr_6199712740068695651_1_16385_17167_
       5 | bdr_6199712740068695651_1_18817_17951_
       6 | bdr_6197393155020108291_1_48609_17951_
       7 | bdr_6197393155020108291_1_48609_19685_
       8 | bdr_6199712740068695651_1_18817_19685_
(8 rows)

postgres=# SELECT * FROM pg_catalog.pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_19685_6199712740068695651_1_18817__ | bdr    | logical   |  19685 | deliver  | t      |      |         2299 | 0/290AB88
 bdr_19685_6197393155020108291_1_48609__ | bdr    | logical   |  19685 | deliver  | t      |      |         2299 | 0/290AB88
(2 rows)


node: deliver_sf:

postgres=# SELECT * FROM pg_catalog.pg_replication_identifier;
 riident |                 riname
---------+----------------------------------------
       1 | bdr_6199712740068695651_1_16385_47458_
       2 | bdr_6199711219508308907_1_17167_47458_
       3 | bdr_6199712740068695651_1_18817_48609_
       4 | bdr_6199711219508308907_1_17951_48609_
       5 | bdr_6199711219508308907_1_19685_48609_
(5 rows)

postgres=# SELECT * FROM pg_catalog.pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_48609_6199712740068695651_1_18817__ | bdr    | logical   |  48609 | deliver  | t      |      |         4744 | 0/5BC0DF0
 bdr_48609_6199711219508308907_1_19685__ | bdr    | logical   |  48609 | deliver  | t      |      |         4744 | 0/5BC0DF0
(2 rows)


node: deliver_lon:

postgres=# SELECT * FROM pg_catalog.pg_replication_identifier;
 riident |                 riname
---------+----------------------------------------
       1 | bdr_6197393155020108291_1_47458_16385_
       2 | bdr_6199711219508308907_1_17167_16385_
       3 | bdr_6199711219508308907_1_17951_17173_
       4 | bdr_6199711219508308907_1_17951_18817_
       5 | bdr_6197393155020108291_1_48609_18817_
       6 | bdr_6199711219508308907_1_19685_18817_
(6 rows)

postgres=# SELECT * FROM pg_catalog.pg_replication_slots;
                slot_name                | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_18817_6199711219508308907_1_19685__ | bdr    | logical   |  18817 | deliver  | t      |      |         2217 | 0/2B04738
 bdr_18817_6197393155020108291_1_48609__ | bdr    | logical   |  18817 | deliver  | t      |      |         2217 | 0/2B04738
(2 rows)

Thanks
-Selim

Re: BDR: no free replication state could be found

From

Craig Ringer

Date:

13 October 2015, 04:51:53

On 10 October 2015 at 02:53, Selim Tuvi <stuvi@ilm.com> wrote:
> node: deliver_sing (the problem node):
>
> postgres=# SELECT * FROM pg_catalog.pg_replication_identifier;
>  riident |                 riname
> ---------+----------------------------------------
>        1 | bdr_6197393155020108291_1_47458_16385_
>        2 | bdr_6199712740068695651_1_16385_16385_
>        3 | bdr_6197393155020108291_1_47458_17167_
>        4 | bdr_6199712740068695651_1_16385_17167_
>        5 | bdr_6199712740068695651_1_18817_17951_
>        6 | bdr_6197393155020108291_1_48609_17951_
>        7 | bdr_6197393155020108291_1_48609_19685_
>        8 | bdr_6199712740068695651_1_18817_19685_
> (8 rows)


> On 9 October 2015 at 06:54, Selim Tuvi <stuvi@ilm.com> wrote:

>> "recovered replication state of node 6 to 0/59F35A8",,,,,,,,,""
>> "no free replication state could be found, increase max_replication_slots",,,,,,,,,""

The number of supported replication identifiers (in bdr 9.4) is
controlled by max_replication_slots, hence the error message. This
should be documented; I'll amend the docs appropriately.

https://github.com/2ndQuadrant/bdr/issues/133
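
To make the limit concrete, here is a minimal Python sketch of the arithmetic behind the startup failure. It is an illustrative model, not BDR's code: restore_states and NoFreeReplicationState are hypothetical names, and the sketch assumes that every identifier on deliver_sing has a checkpointed state to restore. The numbers (8 identifiers, max_replication_slots = 6) are taken from the output posted earlier in the thread.

```python
class NoFreeReplicationState(RuntimeError):
    """Hypothetical stand-in for the startup error quoted in the thread."""


def restore_states(node_ids, max_replication_slots):
    """Model of startup: each restored identifier occupies one replication state.

    Returns the list of restored node ids, or raises once the states run out.
    ASSUMPTION: every identifier has a checkpointed state that must be restored.
    """
    restored = []
    for node_id in node_ids:
        if len(restored) == max_replication_slots:
            raise NoFreeReplicationState(
                f"no free replication state for node {node_id}: "
                f"{len(node_ids)} identifiers, "
                f"max_replication_slots = {max_replication_slots}"
            )
        restored.append(node_id)
    return restored


# riident values 1..8 from pg_replication_identifier on deliver_sing.
DELIVER_SING_IDENTS = list(range(1, 9))

for limit in (6, 8):
    try:
        restored = restore_states(DELIVER_SING_IDENTS, limit)
        print(f"max_replication_slots={limit}: restored {len(restored)} states")
    except NoFreeReplicationState as exc:
        print(f"max_replication_slots={limit}: {exc}")
```

With the limit at 6 the model stops at node 7, which matches the log, where six states are recovered before the error. With a limit of 8 or more, every identifier fits.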

The identifiers aren't currently dropped during node part, which
should be changed. It hasn't come up until now because frequent node
addition and removal is something to be avoided, and because most
deployments configure room for more slots than needed to avoid future
restarts.

https://github.com/2ndQuadrant/bdr/issues/134
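
As a rough way to see which identifiers were left behind on deliver_sing, the Python sketch below parses the riname values posted in the thread. It rests on an assumption that is inferred from comparing the identifier names with the slot names above, not confirmed in this thread: that a riname is laid out as bdr_<remote sysid>_<remote timeline>_<remote db oid>_<local db oid>_<name>. The helpers parse_riname and stale_identifiers are hypothetical names. The only live local database OID used is 19685, the datoid reported by pg_replication_slots on that node.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    remote_sysid: str
    remote_timeline: int
    remote_dboid: int
    local_dboid: int


def parse_riname(riname: str) -> Identifier:
    """Split a BDR identifier name into its parts.

    ASSUMED layout (inferred, not documented in the thread):
    bdr_<remote sysid>_<remote timeline>_<remote db oid>_<local db oid>_<name>
    """
    prefix, sysid, timeline, remote_dboid, local_dboid, _name = riname.split("_", 5)
    if prefix != "bdr":
        raise ValueError(f"not a BDR identifier name: {riname!r}")
    return Identifier(sysid, int(timeline), int(remote_dboid), int(local_dboid))


def stale_identifiers(rows, live_dboids):
    """Return riident values whose local database OID is not a live database."""
    return [
        riident
        for riident, riname in rows
        if parse_riname(riname).local_dboid not in live_dboids
    ]


# Rows copied from pg_replication_identifier on deliver_sing.
DELIVER_SING = [
    (1, "bdr_6197393155020108291_1_47458_16385_"),
    (2, "bdr_6199712740068695651_1_16385_16385_"),
    (3, "bdr_6197393155020108291_1_47458_17167_"),
    (4, "bdr_6199712740068695651_1_16385_17167_"),
    (5, "bdr_6199712740068695651_1_18817_17951_"),
    (6, "bdr_6197393155020108291_1_48609_17951_"),
    (7, "bdr_6197393155020108291_1_48609_19685_"),
    (8, "bdr_6199712740068695651_1_18817_19685_"),
]

# 19685 is the datoid shown by pg_replication_slots on deliver_sing.
print(stale_identifiers(DELIVER_SING, live_dboids={19685}))
```

Under that assumed layout, identifiers 1 through 6 point at local databases that no longer appear in pg_replication_slots, and only 7 and 8 belong to the current "deliver" database. The sketch only lists candidates; it does not drop anything.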

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services