Thread: BDR: cannot drop database even after parting the node

BDR: cannot drop database even after parting the node

From
Florin Andrei
Date:
postgres=# SELECT bdr.bdr_version();
     bdr_version
-------------------
  0.9.2-2015-07-24-
(1 row)


I've tested BDR with one database on two nodes and it worked well. I've
created the group on node1 like this:

SELECT bdr.bdr_group_create(
       local_node_name := 'node1',
       node_external_dsn := 'node1 dbname=bdrdemo'
);
SELECT bdr.bdr_node_join_wait_for_ready();

I've then joined node2 like this:

SELECT bdr.bdr_group_join(
       local_node_name := 'node2',
       node_external_dsn := 'host=node2 dbname=bdrdemo',
       join_using_dsn := 'host=node1 dbname=bdrdemo'
);
SELECT bdr.bdr_node_join_wait_for_ready();

I did a variety of transactions both ways, cross-checked the nodes,
everything was fine.

Then, from node1, I've parted node2 like this:

SELECT bdr.bdr_part_by_node_names('{node2}');

And then also on node1 I've parted node1 like this:

SELECT bdr.bdr_part_by_node_names('{node1}');

Now I want to start over with a clean slate, so I want to drop the
bdrdemo database on node1. But I can't:

postgres=# DROP DATABASE bdrdemo;
ERROR:  database "bdrdemo" is being accessed by other users
DETAIL:  There is 1 other session using the database.
postgres=# SELECT pid FROM pg_stat_activity where pid <>
pg_backend_pid();
   pid
-------
  10259
  10260
(2 rows)

# ps ax | grep -e 10259 -e 10260 | grep -v grep
10259 ?        Ss     0:00 postgres: bgworker: bdr supervisor
10260 ?        Ss     0:00 postgres: bgworker: bdr db: bdrdemo

If I kill those workers and then drop the database, the workers get
respawned, and then the logs fill up with complaints from the workers
that they can't find the bdrdemo database.

Is there a way to stop BDR completely, so that those workers are laid to
rest and never respawn?

Basically, how do I reset BDR completely? It seems to retain the memory
of the bdrdemo database somewhere.

--
Florin Andrei
http://florin.myip.org/


Re: BDR: cannot drop database even after parting the node

From
Florin Andrei
Date:
With all nodes parted from the group, I've tried to remove the bdr
extension, or the table I've used for tests, but neither works:

bdrdemo=# DROP EXTENSION bdr;
ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL
lock
HINT:  BDR is probably still starting up, wait a while

bdrdemo=# DROP TABLE t1bdr;
ERROR:  No peer nodes or peer node count unknown, cannot acquire DDL
lock
HINT:  BDR is probably still starting up, wait a while


--
Florin Andrei
http://florin.myip.org/


Re: BDR: cannot drop database even after parting the node

From
Florin Andrei
Date:
This procedure seems to work for dismantling the replication cluster
after I'm done and cleaning up test databases. If there might be any
issues with these steps, please let me know. Thanks.


-- Disconnect node2 from cluster
--
-- on node1 run:
SELECT bdr.bdr_part_by_node_names('{node2}');
-- on node2 run:
BEGIN;
SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
SECURITY LABEL FOR 'bdr' ON DATABASE bdrdemo IS '{"bdr": false}';
COMMIT;
-- On node2 restart PG service.
-- Now it's disconnected.
-- To drop the test database on node2, run:
DROP DATABASE bdrdemo;


-- Convert node1 to standalone, then clean slate.
--
-- Show replication slots:
SELECT * FROM pg_catalog.pg_replication_slots;
-- Drop any slots listed there:
-- SELECT pg_drop_replication_slot('slot_name');
TRUNCATE TABLE bdr.bdr_nodes;
TRUNCATE TABLE bdr.bdr_connections;
BEGIN;
SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
SECURITY LABEL FOR 'bdr' ON DATABASE bdrdemo IS '{"bdr": false}';
COMMIT;
-- Restart PG service.
-- To drop test database:
DROP DATABASE bdrdemo;


--
Florin Andrei
http://florin.myip.org/


Re: BDR: cannot drop database even after parting the node

From
Craig Ringer
Date:
On 17 September 2015 at 06:15, Florin Andrei <florin@andrei.myip.org> wrote:

> Then, from node1, I've parted node2 like this:
>
> SELECT bdr.bdr_part_by_node_names('{node2}');
>
> And then also on node1 I've parted node1 like this:
>
> SELECT bdr.bdr_part_by_node_names('{node1}');

The second step is not necessary. In fact we should detect that case
and ERROR, since it doesn't make sense to issue bdr_part_by_node_names
from a node that's already left, it can't have any effect.

> Now I want to start over with a clean slate, so I want to drop the bdrdemo
> database on node1.

That's because it's still a live BDR instance, just with one node.

> But I can't:
>
> postgres=# DROP DATABASE bdrdemo;
> ERROR:  database "bdrdemo" is being accessed by other users
> DETAIL:  There is 1 other session using the database.
> postgres=# SELECT pid FROM pg_stat_activity where pid <> pg_backend_pid();
>   pid
> -------
>  10259
>  10260
> (2 rows)
>
> # ps ax | grep -e 10259 -e 10260 | grep -v grep
> 10259 ?        Ss     0:00 postgres: bgworker: bdr supervisor
> 10260 ?        Ss     0:00 postgres: bgworker: bdr db: bdrdemo
>
> If I kill those workers and then drop the database, the workers get
> respawned, and then the logs fill up with complaints from the workers that
> they can't find the bdrdemo database.
>
> Is there a way to stop BDR completely, so that those workers are laid to
> rest and never respawn?

You've made a good point. We address shutdown and removal on some
nodes, but not the case where you want to shut down and remove BDR on
a single remaining node.

I've been meaning to write a helper function for this for some time,
but other priorities keep intervening.

Here's the manual process:

BEGIN;
SET LOCAL bdr.skip_ddl_locking = on;
SET LOCAL bdr.permit_unsafe_ddl_commands = on;
SET LOCAL bdr.skip_ddl_replication = on;
SECURITY LABEL FOR bdr ON DATABASE mydb IS NULL;
DELETE FROM bdr.bdr_connections;
DELETE FROM bdr.bdr_nodes;
SELECT bdr.bdr_connections_changed();
COMMIT;

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database() AND application_name LIKE '%): perdb';

DROP EXTENSION bdr;

... then remove 'bdr' from shared_preload_libraries .

> Basically, how do I reset BDR completely? It seems to retain the memory of
> the bdrdemo database somewhere.

Sort-of. What happens in your example is that when you part the nodes,
they're separated and stop communicating. So your second part command
never reaches the remaining node. That's expected and normal, but we
should issue an error when it's attempted to make it clearer to the
user what's going on.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services