Thread: BDR - DDL Locking
First-time user here, and new to PostgreSQL and BDR, so I hope I have the right place.
I attempted to issue a TRUNCATE TABLE without the CASCADE option on a parent table that had a child FK constraint. After this I received:
ERROR:  Database is locked against DDL operations
HINT:  Node (6203352813534641995,1,16387) in the cluster is already performing DDL
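For context, the sequence that produced this was roughly the following (a sketch only; the table names are assumed from the "conflict"/"conflict_child" pair that shows up in the replay logs further down):

```sql
-- Assumed reproduction, table names taken from the apply-worker logs:
TRUNCATE TABLE conflict;          -- rejected: conflict_child has an FK to conflict
TRUNCATE TABLE conflict CASCADE;  -- this is the statement that then hung
```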
I'm trying to remove the lock, so I ran:
bms=# select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |      mode       | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+-----------------+---------+----------
 relation   |    16387 |    11189 |      |       |            |               |         |       |          | 6/58468            | 3049 | AccessShareLock | t       | t
 virtualxid |          |          |      |       | 6/58468    |               |         |       |          | 6/58468            | 3049 | ExclusiveLock   | t       | t
(2 rows)
bms=# SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
bms-#   ON pl.virtualtransaction = '-1/' || ppx.transaction;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |      mode       | granted | fastpath | transaction | gid | prepared | owner | database
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+----------+-------------+-----+----------+-------+----------
 virtualxid |          |          |      |       | 5/861      |               |         |       |          | 5/861              | 20010 | ExclusiveLock   | t       | t        |             |     |          |       |
 relation   |    16387 |    11201 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | t        |             |     |          |       |
 relation   |    16387 |    11189 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | t        |             |     |          |       |
 virtualxid |          |          |      |       | 6/58456    |               |         |       |          | 6/58456            |  3049 | ExclusiveLock   | t       | t        |             |     |          |       |
 relation   |        0 |     2671 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
 relation   |        0 |     1262 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
 relation   |        0 |     1260 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
 relation   |        0 |     2672 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
 relation   |        0 |     2677 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
 relation   |        0 |     2676 |      |       |            |               |         |       |          | 6/58456            |  3049 | AccessShareLock | t       | f        |             |     |          |       |
(10 rows)
bms=# select pg_cancel_backend(20010);
 pg_cancel_backend
-------------------
 t
(1 row)

bms=# select pg_cancel_backend(3049);
ERROR:  canceling statement due to user request
PID 3049 never dies. If I restart PostgreSQL, the lock persists.
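One hedged way to see what that stubborn backend is actually doing is to look it up in pg_stat_activity (the `waiting` column is the pre-9.6 name; in 9.4 it is a boolean):

```sql
-- Inspect the backend holding the locks; the pid comes from the pg_locks output above.
SELECT pid, usename, state, waiting, query
FROM pg_stat_activity
WHERE pid = 3049;
```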
Server details:

[postgres@klw1129 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                         node_local_dsn                          |                       node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------
 6203352813534641995 |             1 |      16387 | r           | KLW1128   | port=5432 dbname=bms user=bms password=bms host=192.168.180.228 |
 6203352897032163158 |             1 |      16387 | r           | KLW1129   | port=5432 dbname=bms user=bms password=bms host=192.168.180.229 | port=5432 dbname=bms user=bms password=bms host=192.168.180.228
(2 rows)

[postgres@klw1128 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from bdr.bdr_nodes;
     node_sysid      | node_timeline | node_dboid | node_status | node_name |                         node_local_dsn                          |                       node_init_from_dsn
---------------------+---------------+------------+-------------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------
 6203352813534641995 |             1 |      16387 | r           | KLW1128   | port=5432 dbname=bms user=bms password=bms host=192.168.180.228 |
 6203352897032163158 |             1 |      16387 | r           | KLW1129   | port=5432 dbname=bms user=bms password=bms host=192.168.180.229 | port=5432 dbname=bms user=bms password=bms host=192.168.180.228
(2 rows)
My log file on klw1129 (target of the DDL):

2015-10-20 15:28:11 GMT LOG:  starting background worker process "bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1,"
2015-10-20 15:28:11 GMT ERROR:  cannot truncate a table referenced in a foreign key constraint
2015-10-20 15:28:11 GMT DETAIL:  Table "conflict_child" references "conflict".
2015-10-20 15:28:11 GMT HINT:  Truncate table "conflict_child" at the same time, or use TRUNCATE ... CASCADE.
2015-10-20 15:28:11 GMT CONTEXT:  during DDL replay of ddl statement: TRUNCATE TABLE ONLY bms.conflict
2015-10-20 15:28:11 GMT LOG:  worker process: bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1, (PID 28543) exited with exit code 1
2015-10-20 15:28:16 GMT LOG:  starting background worker process "bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1,"
2015-10-20 15:28:16 GMT ERROR:  cannot truncate a table referenced in a foreign key constraint
2015-10-20 15:28:16 GMT DETAIL:  Table "conflict_child" references "conflict".
2015-10-20 15:28:16 GMT HINT:  Truncate table "conflict_child" at the same time, or use TRUNCATE ... CASCADE.
2015-10-20 15:28:16 GMT CONTEXT:  during DDL replay of ddl statement: TRUNCATE TABLE ONLY bms.conflict
2015-10-20 15:28:16 GMT LOG:  worker process: bdr (6203352897032163158,1,16387,)->bdr (6203352813534641995,1, (PID 28546) exited with exit code 1
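The HINT in that log describes the two forms that plain PostgreSQL accepts when a referencing table exists; as a sketch, with the same table names:

```sql
-- Either truncate parent and child in the same statement...
TRUNCATE TABLE conflict, conflict_child;
-- ...or let PostgreSQL cascade to referencing tables:
TRUNCATE TABLE conflict CASCADE;
```

Note that in this thread even the CASCADE form did not help, because the apply worker replays the captured statement as `TRUNCATE TABLE ONLY bms.conflict`, which is what fails on the remote node.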
My log file on klw1128 (source of the DDL):

2015-10-20 15:26:55 GMT LOG:  starting logical decoding for slot "bdr_16387_6203352897032163158_1_16387__"
2015-10-20 15:26:55 GMT DETAIL:  streaming transactions committing after 0/808BBC8, reading WAL from 0/808BB00
2015-10-20 15:26:55 GMT LOG:  logical decoding found consistent point at 0/808BB00
2015-10-20 15:26:55 GMT DETAIL:  There are no running transactions.
2015-10-20 15:26:55 GMT LOG:  could not receive data from client: Connection reset by peer
2015-10-20 15:26:55 GMT LOG:  unexpected EOF on standby connection
[postgres@klw1129 ~/product/9.4.4/logs] psql -U bms -d bms
Password for user bms:
psql (9.4.4)
Type "help" for help.

bms=# select * from pg_extension;
  extname   | extowner | extnamespace | extrelocatable | extversion |                           extconfig                           |          extcondition
------------+----------+--------------+----------------+------------+---------------------------------------------------------------+---------------------------------
 plpgsql    |       10 |           11 | f              | 1.0        |                                                               |
 btree_gist |    16385 |         2200 | t              | 1.0        |                                                               |
 bdr        |    16385 |           11 | f              | 0.9.2.0    | {16919,16934,16947,16983,17013,17023,17032,17039,17052,17121} | {"","","","","","","","","",""}
(3 rows)
How do I recover from this scenario without a rebuild? And if I can't recover from it, what caused it in the first place?
I'm currently testing PostgreSQL and BDR for productization.
Thanks,
Will McCormick
What's the *exact* BDR version?

When you say you "attempted to" - what was the outcome? Presumably an ERROR from the TRUNCATE, right? That would roll back the transaction, and in the process abort the DDL lock acquisition attempt.

Are you sure replication was working normally prior to this point, with no issues?

The global DDL lock isn't a true lock in the sense that it appears in pg_locks, etc. If you roll back the transaction trying to acquire it, or terminate the PostgreSQL backend attempting to acquire it - such as your TRUNCATE - using pg_terminate_backend(...) then it will be removed automatically. If for any reason that is not the case (which it shouldn't be) then restarting the nodes will clear it.
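Concretely, that suggestion looks like this (pid taken from the pg_locks output earlier in the thread; pg_terminate_backend sends SIGTERM, unlike the query-cancel that pg_cancel_backend sends, which is why the earlier cancel attempts had no effect):

```sql
-- Terminate the backend stuck acquiring the global DDL lock:
SELECT pg_terminate_backend(3049);
```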
Hey Craig thank you very much for your response.
> When you say you "attempted to" - what was the outcome?
I tried a TRUNCATE without the CASCADE option. After that I tried it with the CASCADE option. The session just hung indefinitely at that point. There was no rollback, and I was testing on an empty table.
Replication was in a ready state on both nodes, and both DDL and DML were replicating.
BDR 0.9.2.0.
When you say restarting the nodes: I did restart Postgres, and this didn't help.
On Wed, Oct 21, 2015 at 8:31 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> What's the *exact* BDR version?
>
> When you say you "attempted to" - what was the outcome? Presumably an
> ERROR from the TRUNCATE, right? That would roll back the transaction,
> and in the process abort the DDL lock acquisition attempt.
>
> Are you sure replication was working normally prior to this point,
> with no issues?
>
> The global DDL lock isn't a true lock in the sense that it appears in
> pg_locks, etc. If you roll back the transaction trying to acquire it,
> or terminate the PostgreSQL backend attempting to acquire it - such as
> your TRUNCATE - using pg_terminate_backend(...) then it will be
> removed automatically. If for any reason that is not the case (which
> it shouldn't be) then restarting the nodes will clear it.
Will, I saw after replying that there's more detail I missed in your mail, so please see the more detailed reply inline below.

On 20 October 2015 at 23:31, Will McCormick <wmccormick@gmail.com> wrote:

> First time user here and new to PostgreSQL and BDR so I hope I have the
> right place.

You do.

> I attempted to issue a TRUNCATE TABLE without the cascade option on a
> parent table that had a child FK constraint.

I've looked at your logs, and it looks like the TRUNCATE succeeded on the node that was doing the DDL and was queued for replication. Then, when applying to another node, it failed because there was a foreign key relationship referencing the target table.

This is odd, because the way BDR captures TRUNCATEs should prevent that from happening: it uses triggers to capture TRUNCATEs and enqueues them for execution. However, I can see upon inspection that the approach used just isn't sufficient to handle FK relationships, and that the current test suite doesn't cover this.

I'm going to write a test to confirm what I think is going on, then follow up.

--
Craig Ringer    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services