Thread: BDR: ALTER statement hanging

BDR: ALTER statement hanging

From
Selim Tuvi
Date:
Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

Re: ALTER statement hanging

From
Selim Tuvi
Date:
And I tried running the same statement on another node, while one node was running it and I got the following:

ERROR:  database is locked against ddl by another node
HINT:  Node (6223770712502831127,1,16389) in the cluster is already performing DDL

Terminating the statement in one node and running it on another results in a hang as well.

-Selim


From: Selim Tuvi
Sent: Thursday, December 03, 2015 4:03 PM
To: pgsql-general@postgresql.org
Subject: BDR: ALTER statement hanging

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

Re: ALTER statement hanging

From
Selim Tuvi
Date:
I stopped the other two nodes and restarted the instance and pg_locks shows the following.

deliver=# select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |      mode       | granted |
fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+-
---------
 relation   |   533136 |    11189 |      |       |            |               |         |       |          | 5/68               | 10229 | AccessShareLock | t       |
t
 virtualxid |          |          |      |       | 5/68       |               |         |       |          | 5/68               | 10229 | ExclusiveLock   | t       |
t
(2 rows)

-Selim


From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Selim Tuvi [stuvi@ilm.com]
Sent: Thursday, December 03, 2015 4:31 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ALTER statement hanging

And I tried running the same statement on another node, while one node was running it and I got the following:

ERROR:  database is locked against ddl by another node
HINT:  Node (6223770712502831127,1,16389) in the cluster is already performing DDL

Terminating the statement in one node and running it on another results in a hang as well.

-Selim


From: Selim Tuvi
Sent: Thursday, December 03, 2015 4:03 PM
To: pgsql-general@postgresql.org
Subject: BDR: ALTER statement hanging

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

Re: BDR: ALTER statement hanging

From
Andreas Kretschmer
Date:
Selim Tuvi <stuvi@ilm.com> wrote:

> Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres
> version is 9.4.5.
>
> With 0.9.2, I used to be able to issue ALTER statements using psql and it would
> go through. This time it is just hanging. The statement is this:

for ddl-commands all nodes MUST be active in replication, so have you
checked that in pg_replication_slots?



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: BDR: ALTER statement hanging

From
Selim Tuvi
Date:
Yes they seem to be active:

deliver=# select * from pg_replication_slots;
                slot_name                 | plugin | slot_type | datoid | database | active | xmin | catalog_xmin |
restart_lsn 

------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
 bdr_533136_6223770712502831127_1_16389__ | bdr    | logical   | 533136 | deliver  | t      |      |       182302 |
0/9C8A5598  
 bdr_533136_6223800735012265413_1_16389__ | bdr    | logical   | 533136 | deliver  | t      |      |       182302 |
0/9C8A5598  
(2 rows)

Although when I look at bdr.bdr_nodes I see the status as still initializing for the other two nodes, I don't know if
thatcould cause this problem: 

deliver=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status |              node_name              |
                           node_local_dsn 
                                         |                                          node_init_from_dsn

---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------

-----------------------------------------+------------------------------------------------------------------------------------------------------
 6212648563684174798 |             1 |     533136 | r           | pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v
port=5432dbname=deliver user=deliver_admin password=xxxxx   | 
 6223770712502831127 |             1 |      16389 | i           | pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v
port=5432dbname=deliver user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
user=deliver_adminpassword=xxxxx 
 6223800735012265413 |             1 |      16389 | i           | pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v
port=5432dbname=deliver user=deliver_admin password=xxxxx  | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
user=deliver_adminpassword=xxxxx 

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Andreas Kretschmer
[akretschmer@spamfence.net]
Sent: Thursday, December 03, 2015 10:49 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

Selim Tuvi <stuvi@ilm.com> wrote:

> Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres
> version is 9.4.5.
>
> With 0.9.2, I used to be able to issue ALTER statements using psql and it would
> go through. This time it is just hanging. The statement is this:

for ddl-commands all nodes MUST be active in replication, so have you
checked that in pg_replication_slots?



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: BDR: ALTER statement hanging

From
Andreas Kretschmer
Date:

> Selim Tuvi <stuvi@ilm.com> hat am 4. Dezember 2015 um 18:46 geschrieben:
>
>
> Yes they seem to be active:
>
> deliver=# select * from pg_replication_slots;
>                 slot_name                 | plugin | slot_type | datoid |
> database | active | xmin | catalog_xmin | restart_lsn
>
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
>  bdr_533136_6223770712502831127_1_16389__ | bdr    | logical   | 533136 |
> deliver  | t      |      |       182302 | 0/9C8A5598
>  bdr_533136_6223800735012265413_1_16389__ | bdr    | logical   | 533136 |
> deliver  | t      |      |       182302 | 0/9C8A5598
> (2 rows)
>
> Although when I look at bdr.bdr_nodes I see the status as still initializing
> for the other two nodes, I don't know if that could cause this problem:
>
> deliver=# select * from bdr.bdr_nodes;
>      node_sysid      | node_timeline | node_dboid | node_status |
>              node_name              |
>                                             node_local_dsn
>                                          |
>                                          node_init_from_dsn
>
---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------
>
-----------------------------------------+------------------------------------------------------------------------------------------------------
>  6212648563684174798 |             1 |     533136 | r           |
> pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx   |
>  6223770712502831127 |             1 |      16389 | i           |
> pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
> dbname=deliver user=deliver_admin password=xxxxx
>  6223800735012265413 |             1 |      16389 | i           |
> pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx  | host=pe-deliverdb-sf-01v port=5432
> dbname=deliver user=deliver_admin password=xxxxx
>
> -Selim
>


I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.


Regards, Andreas


Re: BDR: ALTER statement hanging

From
Sylvain MARECHAL
Date:
Le 04/12/2015 18:59, Andreas Kretschmer a écrit :
>
>
> I think, the state 'i' is the main reason for your problem, because of: "i-
> Joining: The node is doing initial slot creation or an initial dump and load".
>
> But i can't tell you why this nodes are in this state.
>
>
> Regards, Andreas
>
>
Did-you check the bdr.bdr_connections table?
It should have as many lines as the bdr.bdr_nodes tables.

Sylvain


Re: BDR: ALTER statement hanging

From
Selim Tuvi
Date:
Thanks, I removed the other nodes from bdr.bdr_nodes table, deleted all the bdr_connections and
pg_replication_identifierentries, dropped the pg_replication_slots restarted the instance and then trying the ALTER
statementresulted in: 

ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL lock
HINT:  BDR is probably still starting up, wait a while

The only way I could issue the statement is run the following to convert the node to a standalone instance:

BEGIN;
SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
security label for 'bdr' on database deliver is '{"bdr": false}';
COMMIT;

I am still puzzled as to why the bdr_nodes node_status was reporting "i" when there were no errors in the logs.

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Andreas Kretschmer
[andreas@a-kretschmer.de]
Sent: Friday, December 04, 2015 9:59 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

> Selim Tuvi <stuvi@ilm.com> hat am 4. Dezember 2015 um 18:46 geschrieben:
>
>
> Yes they seem to be active:
>
> deliver=# select * from pg_replication_slots;
>                 slot_name                 | plugin | slot_type | datoid |
> database | active | xmin | catalog_xmin | restart_lsn
>
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
>  bdr_533136_6223770712502831127_1_16389__ | bdr    | logical   | 533136 |
> deliver  | t      |      |       182302 | 0/9C8A5598
>  bdr_533136_6223800735012265413_1_16389__ | bdr    | logical   | 533136 |
> deliver  | t      |      |       182302 | 0/9C8A5598
> (2 rows)
>
> Although when I look at bdr.bdr_nodes I see the status as still initializing
> for the other two nodes, I don't know if that could cause this problem:
>
> deliver=# select * from bdr.bdr_nodes;
>      node_sysid      | node_timeline | node_dboid | node_status |
>              node_name              |
>                                             node_local_dsn
>                                          |
>                                          node_init_from_dsn
>
---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------
>
-----------------------------------------+------------------------------------------------------------------------------------------------------
>  6212648563684174798 |             1 |     533136 | r           |
> pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx   |
>  6223770712502831127 |             1 |      16389 | i           |
> pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
> dbname=deliver user=deliver_admin password=xxxxx
>  6223800735012265413 |             1 |      16389 | i           |
> pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v port=5432 dbname=deliver
> user=deliver_admin password=xxxxx  | host=pe-deliverdb-sf-01v port=5432
> dbname=deliver user=deliver_admin password=xxxxx
>
> -Selim
>


I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.


Regards, Andreas


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: BDR: ALTER statement hanging

From
Selim Tuvi
Date:
Yes, bdr_connections had the same number of rows:

deliver=# select * from bdr.bdr_connections;
     conn_sysid      | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid |
conn_is_unidirectional|                            
                     conn_dsn                                                | conn_apply_delay | conn_replication_sets
                                              

---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+---------------------------

-----------------------------------------------------------------------------+------------------+-----------------------
                                             
 6212648563684174798 |             1 |     533136 | 0                 |                    0 |                 0 | f
                 | host=pe-deliverdb-sf-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx   |
   | {default}                                                            
 6223770712502831127 |             1 |      16389 | 0                 |                    0 |                 0 | f
                 | host=pe-deliverdb-sing-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx |
   | {default}                                                            
 6223800735012265413 |             1 |      16389 | 0                 |                    0 |                 0 | f
                 | host=pe-deliverdb-lon-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx  |
   | {default}                                                            
(3 rows)

One other thing I noticed is that the conn_dboid is the same for two of the nodes. Is that normal?

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Sylvain MARECHAL
[marechal.sylvain2@gmail.com]
Sent: Friday, December 04, 2015 10:14 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

Le 04/12/2015 18:59, Andreas Kretschmer a écrit :
>
>
> I think, the state 'i' is the main reason for your problem, because of: "i-
> Joining: The node is doing initial slot creation or an initial dump and load".
>
> But i can't tell you why this nodes are in this state.
>
>
> Regards, Andreas
>
>
Did-you check the bdr.bdr_connections table?
It should have as many lines as the bdr.bdr_nodes tables.

Sylvain


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: BDR: ALTER statement hanging

From
Craig Ringer
Date:

​If you're not sure what's going on on a node, look at its logs.

The background worker API and PostgreSQL's lack of autonomous transactions makes it quite challenging for BDR workers to capture logs and expose them to users at the SQL level. So always, if in doubt, examine the log files.

Re: BDR: ALTER statement hanging

From
Selim Tuvi
Date:
Thanks Craig, the problem was that (if I remember correctly) there were absolutely no errors or warnings logged when I issued the ALTER statement. Everything seemed to operate normally except that the execution never completed. Even the fact that the node_status was set to 'i' didn't result in any log messages and the replication was working as it should.

-Selim


From: Craig Ringer [craig@2ndquadrant.com]
Sent: Sunday, December 06, 2015 7:05 PM
To: Selim Tuvi
Cc: Sylvain MARECHAL; pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging


​If you're not sure what's going on on a node, look at its logs.

The background worker API and PostgreSQL's lack of autonomous transactions makes it quite challenging for BDR workers to capture logs and expose them to users at the SQL level. So always, if in doubt, examine the log files.