Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers

From Steve Singer
Subject Re: pglogical - logical replication contrib module
Date
Msg-id 569151D5.3060003@ssinger.info
Whole thread Raw
In response to pglogical - logical replication contrib module  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: pglogical - logical replication contrib module
Re: pglogical - logical replication contrib module
List pgsql-hackers
On 12/31/2015 06:34 PM, Petr Jelinek wrote:
> Hi,
>
> I'd like to submit the replication solution which is based on the 
> pglogical_output [1] module (which is obviously needed for this to 
> compile).
>
> The pglogical contrib module provides extension which does the 
> master-slave logical replication based on the logical decoding.
>
> The basic documentation is in README.md, I didn't bother making sgml 
> docs yet since I expect that there will be ongoing changes happening 
> and it's easier for me to update the markdown docs than sgml. I will 
> do the conversion once we start approaching committable state.
I am going to send my comments/issues out in batches as I find them 
instead of waiting till I look over everything.


I find this part of the documentation a bit unclear


+Once the provider node is setup, subscribers can be subscribed to it. 
First the
+subscriber node must be created:
+
+    SELECT pglogical.create_node(
+        node_name := 'subscriber1',
+        dsn := 'host=thishost port=5432 dbname=db'
+    );
+

My initial reading was that I should execute this on the provider node.  
Perhaps instead
-----------------
Once the provider node is setup you can then create subscriber nodes.  
Create the subscriber nodes and
then execute the following commands on each subscriber node

create extension pglogical

select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost 
dbname=db port=5432');

-------------------

Also the documentation for create_subscription talks about

+  - `synchronize_structure` - specifies if to synchronize structure from
+    provider to the subscriber, default true



I did the following

test2=# select pglogical.create_subscription(subscription_name:='default 
sub',provider_dsn:='host=localhost dbname=test1 port=5436'); create_subscription
---------------------           247109879


Which then resulted in the following showing up in my  PG log

LOG:  worker process: pglogical apply 16542:247109879 (PID 4079) exited 
with exit code 1
ERROR:  replication slot name "pgl_test2_test1_default sub" contains 
invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT 
"pgl_test2_test1_default sub" LOGICAL pglogical_output": status 
PGRES_FATAL_ERROR: ERROR:  replication slot name 
"pgl_test2_test1_default sub" contains invalid character    HINT:  Replication slot names may only contain lower case
letters,
 
numbers, and the underscore character.


The create_subscription command should check if the subscription name is 
valid (meets the rules that will be applied against the slot command).

I wondered how I could fix my mistake.

The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name, 
ifexists bool)`
+  Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR:  function pglogical.pglogical_drop_subscription(unknown, boolean) 
does not exist


The command is actually called pglogical.drop_subscription the docs 
should be fixed to show the actual command name


I then wanted to add a second table to my database. ('b').

select pglogical.replication_set_add_table('default','public.b',true); replication_set_add_table
--------------------------- t
(1 row)

In my pglog I then got

LOG:  starting sync of table public.b for subscriber defaultsub
ERROR:  replication slot name "pgl_test2_test1_defaultsub_public.b" 
contains invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT 
"pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status 
PGRES_FATAL_ERROR: ERROR:  replication slot name 
"pgl_test2_test1_defaultsub_public.b" contains invalid character    HINT:  Replication slot names may only contain
lowercase letters, 
 
numbers, and the underscore character.


I then did

test1=# select pglogical.replication_set_remove_table('default','public.b'); replication_set_remove_table
------------------------------ t
(1 row)


but my log still keep repeating the error, so I tried connecting to the 
replica and did the same

test2=# select pglogical.replication_set_remove_table('default','public.b');
ERROR:  replication set mapping -303842815:16726 not found

Is there any way to recover from this situation?

The documenation says I can drop a replication set, maybe that will let 
replication continue.

+- `pglogical.delete_replication_set(set_name text)`
+  Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR:  function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');               ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.

The function is actually pglogical.drop_replication_set , the docs 
should be updated.
(note that didn't fix my problem either but then dropping the 
subscription did seem to work).



I then re-added the default set to the origin and resubscribed my replica



test2=#  select 
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost 
dbname=test1 port=5436'); create_subscription
---------------------          2974019075


I then saw a bunch of
LOG:  worker process: pglogical apply 16542:2974019075 (PID 26778) 
exited with exit code 1
ERROR:  subscriber defaultsub initialization failed during 
nonrecoverable step (s), please try the setup again
LOG:  worker process: pglogical apply 16542:2974019075 (PID 26779) 
exited with exit code 1

in the log but then those stopped and I see

test2=# select  pglogical.show_subscription_status();
show_subscription_status

--------------------------------------------------------------------------------
-------------------------------------------------- (defaultsub,down,test1,"host=localhost dbname=test1 
port=5436",pgl_test2_test1_
defaultsub,"{default,default_insert_only}",{all})
(1 row)


I'm not really sure what to do to 'recover' my cluster at this point so 
I'll send this off and rebuild my cluster and start over.








pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Speedup twophase transactions
Next
From: Tomas Vondra
Date:
Subject: Re: WIP: bloom filter in Hash Joins with batches